From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 28 Aug 2004 14:43:03 -0700
From: Andrew Morton
Subject: Re: Kernel 2.6.8.1: swap storm of death - nr_requests > 1024 on swap partition
Message-Id: <20040828144303.0ae2bebe.akpm@osdl.org>
In-Reply-To: <4130F55A.90705@pandora.be>
References: <20040824124356.GW2355@suse.de> <412CDE7E.9060307@seagha.com> <20040826144155.GH2912@suse.de> <412E13DB.6040102@seagha.com> <412E31EE.3090102@pandora.be> <41308C62.7030904@seagha.com> <20040828125028.2fa2a12b.akpm@osdl.org> <4130F55A.90705@pandora.be>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Karl Vogel
Cc: Jens Axboe, linux-mm@kvack.org
List-ID:

(Added linux-mm)

Karl Vogel wrote:
>
> Andrew Morton wrote:
> > Karl Vogel wrote:
> >
> >> Further testing shows that all the schedulers exhibit this exact same
> >> problem when run with a nr_requests size of 8192 on the drive hosting
> >> the swap partition.
> >>
> >> I tried noop, deadline, as and CFQ with:
> >>
> >>     echo 8192 >/sys/block/hda/queue/nr_requests
> >
> >
> > That allows up to 2GB of memory to be under writeout at the same time. The
> > VM cannot touch any of that memory.
>
> Well I used that value as it is the default for CFQ.. and it was with
> CFQ that I had the problems. The patch Jens offered to track down the
> problem, commented out this 'q->nr_requests = 8192' in CFQ and it
> helped. Therefor I tried the other schedulers with this value to see if
> it made a difference.
>
> So if I understand you correctly, CFQ shouldn't be using 8192 on 512Mb
> systems?!

Yup. It's asking for trouble to allow that much memory to be unreclaimably
pinned.

Of course, you could have the same problem with just 128 requests per
queue, and lots of queues. I solved all these problems in the dirty memory
writeback paths. But I forgot about swapout!

> With overcommit_memory set to 1, the program can be run again after the
> OOM kill.. but the OOM killing remains.
>
> With overcommit_memory set to 0 a second run fails. I 'think' it's
> because somehow SwapCache is 500Kb after the OOM, so in effect my system
> doesn't have 1Gb to spare anymore. Doing swapoff/swapon frees this and
> then I can do the calloc(1Gb) again.
>
> Another way to free the SwapCached is to generate lots of I/O doing 'dd
> if=/dev/hda of=/dev/null' ... after a while SwapCached is < 1Mb again.
>

urgh. It sounds like the overcommit logic forgot to account swapcache as
reclaimable. It's been a ton of trouble, that code.
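
For reference, a rough back-of-the-envelope sketch of where the 2GB figure
for nr_requests = 8192 comes from, and why 128 requests per queue across many
queues can end up in the same place. The 256 KiB per-request size is an
assumption, obtained only by dividing 2GB by 8192; the thread does not state
it, and the real ceiling depends on the driver's maximum request size. The
function name is made up for illustration.

```python
# Assumption (not stated in the thread): each queued request can carry up to
# 256 KiB of data. This is simply 2 GiB / 8192 requests.
ASSUMED_MAX_REQUEST_KIB = 256


def max_pinned_mib(nr_requests, queues=1, request_kib=ASSUMED_MAX_REQUEST_KIB):
    """Upper bound, in MiB, on memory that can be under writeout at once."""
    return nr_requests * queues * request_kib // 1024


# One queue with nr_requests = 8192, the value tested in the thread.
print(max_pinned_mib(8192))

# One queue with 128 requests.
print(max_pinned_mib(128))

# 64 queues of 128 requests each hold as many requests as one 8192 queue.
print(max_pinned_mib(128, queues=64))
```

Under that assumption, a single 8192-request queue and 64 queues of 128
requests each reach the same bound, which is four times the RAM of a 512MB
machine.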