From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id BFEEA6B003D for ; Wed, 1 Apr 2009 04:37:49 -0400 (EDT) Received: from m3.gw.fujitsu.co.jp ([10.0.50.73]) by fgwmail6.fujitsu.co.jp (Fujitsu Gateway) with ESMTP id n318cMEw006545 for (envelope-from kosaki.motohiro@jp.fujitsu.com); Wed, 1 Apr 2009 17:38:23 +0900 Received: from smail (m3 [127.0.0.1]) by outgoing.m3.gw.fujitsu.co.jp (Postfix) with ESMTP id A234345DD7F for ; Wed, 1 Apr 2009 17:38:22 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (s3.gw.fujitsu.co.jp [10.0.50.93]) by m3.gw.fujitsu.co.jp (Postfix) with ESMTP id 7511D45DD78 for ; Wed, 1 Apr 2009 17:38:22 +0900 (JST) Received: from s3.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id BAAAAE08001 for ; Wed, 1 Apr 2009 17:38:21 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.249.87.104]) by s3.gw.fujitsu.co.jp (Postfix) with ESMTP id 90E561DB803B for ; Wed, 1 Apr 2009 17:38:20 +0900 (JST) From: KOSAKI Motohiro Subject: Re: add_to_swap_cache with GFP_ATOMIC ? In-Reply-To: References: <28c262360903310338k20b8eebbncb86baac9b09e54@mail.gmail.com> Message-Id: <20090401165516.B1EB.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Date: Wed, 1 Apr 2009 17:38:19 +0900 (JST) Sender: owner-linux-mm@kvack.org To: Hugh Dickins Cc: kosaki.motohiro@jp.fujitsu.com, Minchan Kim , linux-mm , lkml , Andrew Morton , Christoph Lameter , Nick Piggin List-ID: (cc to related person) > The questionable one is add_to_swap (when vmscanning), which calls > it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC > plus __GFP_NOMEMALLOC|__GFP_NOWARN. That one I have wondered > about from time to time: GFP_NOIO would be the obvious choice, > that's what swap_writepage will use to allocate bio soon after. > > I've been tempted to change it, but afraid to touch that house > of cards, and afraid of long testing and justification required. > Would it be safe to drop that __GFP_HIGH? What's the effect of the > __GFP_NOMEMALLOC (we've layer on layer of tweak this one way because > we're in the reclaim path so let it eat more, then tweak it the other > way because we don't want it to eat up _too_ much). I just let it stay. firstly, following some patch indicate add_to_swap() parameter history. ac47b003d03c2a4f28aef1d505b66d24ad191c4f(Hugh, Jan 6 2009) reverted 1480a540c98525640174a7eadd712378fcd6fd63(Cristoph, Jan 8 2006). bd53b714d32a29bdf33009f812e295667e92b930(Nick, May 1 2005) added __GFP_NOMEMALLOC. __GFP_NOMEMALLOC mean "please don't eat emergency memory". c4b3efde0744e038d16b33d18110d39e762ef80c(akpm, Jan 6 2003) explained why no using emergency memory is better. it said In the case of adding pages to swapcache we're still using GFP_ATOMIC, so these addition attempts can still fail. That's OK, because the error is handled and, unlike file pages, it will not cause user applicaton failures. IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason was old add_to_page_cache() didn't have gfp_mask parameter and we didn't have the reason of changing add_to_swap() behavior. I think it don't have deeply reason and changing GFP_NOIO don't cause regression. --------------------------------------------- commit ac47b003d03c2a4f28aef1d505b66d24ad191c4f Author: Hugh Dickins Date: Tue Jan 6 14:39:39 2009 -0800 mm: remove gfp_mask from add_to_swap Remove gfp_mask argument from add_to_swap(): it's misleading because its only caller, shrink_page_list(), is not atomic at that point; and in due course (implementing discard) we'll sometimes want to allocate some memory with GFP_NOIO (as is used in swap_writepage) when allocating swap. No change to the gfp_mask passed down to add_to_swap_cache(): still use __GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before): though it's not obvious if that's the best combination to ask for here. -int add_to_swap(struct page * page, gfp_t gfp_mask) +int add_to_swap(struct page *page) { swp_entry_t entry; int err; @@ -153,7 +153,7 @@ int add_to_swap(struct page * page, gfp_t gfp_mask) * Add it to the swap cache and mark it dirty */ err = add_to_swap_cache(page, entry, - gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN); + __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); --------------------------------------------- commit 1480a540c98525640174a7eadd712378fcd6fd63 Author: Christoph Lameter Date: Sun Jan 8 01:00:53 2006 -0800 [PATCH] SwapMig: add_to_swap() avoid atomic allocations Add gfp_mask to add_to_swap add_to_swap does allocations with GFP_ATOMIC in order not to interfere with swapping. During migration we may have use add_to_swap extensively which may lead to out of memory errors. This patch makes add_to_swap take a parameter that specifies the gfp mask. The page migration code can then make add_to_swap use GFP_KERNEL. -int add_to_swap(struct page * page) +int add_to_swap(struct page * page, gfp_t gfp_mask) { swp_entry_t entry; int err; @@ -166,7 +166,7 @@ int add_to_swap(struct page * page) * Add it to the swap cache and mark it dirty */ err = __add_to_swap_cache(page, entry, - GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN); + gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN); --------------------------------------------- commit bd53b714d32a29bdf33009f812e295667e92b930 Author: Nick Piggin Date: Sun May 1 08:58:37 2005 -0700 [PATCH] mm: use __GFP_NOMEMALLOC Use the new __GFP_NOMEMALLOC to simplify the previous handling of PF_MEMALLOC. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds @@ -154,29 +153,19 @@ int add_to_swap(struct page * page) if (!entry.val) return 0; - /* Radix-tree node allocations are performing - * GFP_ATOMIC allocations under PF_MEMALLOC. - * They can completely exhaust the page allocator. - * - * So PF_MEMALLOC is dropped here. This causes the slab - * allocations to fail earlier, so radix-tree nodes will - * then be allocated from the mempool reserves. + /* + * Radix-tree node allocations from PF_MEMALLOC contexts could + * completely exhaust the page allocator. __GFP_NOMEMALLOC + * stops emergency reserves from being allocated. * - * We're still using __GFP_HIGH for radix-tree node - * allocations, so some of the emergency pools are available, - * just not all of them. + * TODO: this could cause a theoretical memory reclaim + * deadlock in the swap out path. */ - - pf_flags = current->flags; - current->flags &= ~PF_MEMALLOC; - /* * Add it to the swap cache and mark it dirty */ - err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN); - - if (pf_flags & PF_MEMALLOC) - current->flags |= PF_MEMALLOC; + err = __add_to_swap_cache(page, entry, + GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN); --------------------------------------------- commit b84a35be0285229b0a8a5e2e04d79360c5b75562 Author: Nick Piggin Date: Sun May 1 08:58:36 2005 -0700 [PATCH] mempool: NOMEMALLOC and NORETRY Mempools have 2 problems. The first is that mempool_alloc can possibly get stuck in __alloc_pages when they should opt to fail, and take an element from their reserved pool. The second is that it will happily eat emergency PF_MEMALLOC reserves instead of going to their reserved pools. Fix the first by passing __GFP_NORETRY in the allocation calls in mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which directs the page allocator not to allocate from the reserve pool. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --------------------------------------------- commit c22eeaf4682941543d1426621ca1e3c6d3833fe2 Author: akpm Date: Fri Sep 3 17:20:58 2004 +0000 [PATCH] add_to_swap(): suppress oom message Page allocation failures are expected when add_to_swap() tries to allocate radix-tree nodes without PF_MEMALLOC. Kill the __alloc_pages() warnings. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds BKrev: 4138a7faauXIloxI3vwZ04xvfa1paQ @@ -171,7 +171,7 @@ int add_to_swap(struct page * page) /* * Add it to the swap cache and mark it dirty */ - err = __add_to_swap_cache(page, entry, GFP_ATOMIC); + err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN); if (pf_flags & PF_MEMALLOC) current->flags |= PF_MEMALLOC; --------------------------------------------- commit c4b3efde0744e038d16b33d18110d39e762ef80c Author: akpm Date: Mon Jan 6 03:51:29 2003 +0000 [PATCH] handle radix_tree_node allocation failures This patch uses the radix_tree_preload() API in add_to_page_cache(). A new gfp_mask argument is added to add_to_page_cache(), which is then passed on to radix_tree_preload(). It's pretty simple. In the case of adding pages to swapcache we're still using GFP_ATOMIC, so these addition attempts can still fail. That's OK, because the error is handled and, unlike file pages, it will not cause user applicaton failures. This codepath (radix-tree node exhaustion on swapout) was well tested in the days when the swapper_space radix tree was fragmented all over the place due to unfortunate swp_entry bit layout. BKrev: 3e18fd41cbfAQY7P7NcjQYCevjYG2g @@ -149,7 +149,8 @@ int add_to_swap(struct page * page) /* * Add it to the swap cache and mark it dirty */ - err = add_to_page_cache(page, &swapper_space, entry.val); + err = add_to_page_cache(page, &swapper_space, + entry.val, GFP_ATOMIC); --------------------------------------------- commit e9cb95284376950d34943acd2ddd33a473ba9ba3 Author: torvalds Date: Tue Dec 3 22:59:18 2002 +0000 Merge master.kernel.org:/home/hch/BK/xfs/linux-2.5 into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux @@ -149,9 +150,15 @@ int add_to_swap(struct page * page) /* * Add it to the swap cache and mark it dirty */ - switch (add_to_page_cache(page, &swapper_space, entry.val)) { + err = add_to_page_cache(page, &swapper_space, entry.val); + + if (!(pf_flags & PF_NOWARN)) + current->flags &= ~PF_NOWARN; + if (pf_flags & PF_MEMALLOC) + current->flags |= PF_MEMALLOC; + + switch (err) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org