* Re: add_to_swap_cache with GFP_ATOMIC ?
2009-03-31 11:17 ` Hugh Dickins
@ 2009-03-31 11:30 ` Minchan Kim
2009-04-01 8:38 ` KOSAKI Motohiro
1 sibling, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2009-03-31 11:30 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-mm, lkml
Thanks for quick reply.
On Tue, Mar 31, 2009 at 8:17 PM, Hugh Dickins <hugh@veritas.com> wrote:
> On Tue, 31 Mar 2009, Minchan Kim wrote:
>>
>> I don't know why we should call add_to_swap_cache with GFP_ATOMIC ?
>> Is there a special something for avoiding blocking?
>
> add_to_swap_cache itself does not need to be called with GFP_ATOMIC.
>
> There are three places from which it is called:
>
> read_swap_cache_async (typically used when faulting) masks the
> gfp_mask coming in (typically GFP_HIGHUSER_MOVABLE for the pages
> themselves) to call add_to_swap_cache typically with GFP_KERNEL.
>
> shmem_writepage does call it with GFP_ATOMIC: that's because it's
> holding the shmem_inode's spin_lock while it switches the page between
> file cache and swap cache - IIRC holding page lock isn't quite enough
> for that, because of other cases; but I've not thought that through
> in a long time, we could re-examine if it troubles you.
Yes. My point was that. but I am not sure what are other cases. :(
Now, It don't hurt me but not sure in future.
> The questionable one is add_to_swap (when vmscanning), which calls
> it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC
> plus __GFP_NOMEMALLOC|__GFP_NOWARN. That one I have wondered
> about from time to time: GFP_NOIO would be the obvious choice,
> that's what swap_writepage will use to allocate bio soon after.
>
> I've been tempted to change it, but afraid to touch that house
> of cards, and afraid of long testing and justification required.
> Would it be safe to drop that __GFP_HIGH? What's the effect of the
> __GFP_NOMEMALLOC (we've layer on layer of tweak this one way because
> we're in the reclaim path so let it eat more, then tweak it the other
> way because we don't want it to eat up _too_ much). I just let it stay.
Sigh.
What a complex thing to change one line.
Thanks for kind explanation.
> Hugh
>
--
Kinds regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: add_to_swap_cache with GFP_ATOMIC ?
2009-03-31 11:17 ` Hugh Dickins
2009-03-31 11:30 ` Minchan Kim
@ 2009-04-01 8:38 ` KOSAKI Motohiro
2009-04-01 9:13 ` KOSAKI Motohiro
2009-04-01 11:46 ` Hugh Dickins
1 sibling, 2 replies; 7+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01 8:38 UTC (permalink / raw)
To: Hugh Dickins
Cc: kosaki.motohiro, Minchan Kim, linux-mm, lkml, Andrew Morton,
Christoph Lameter, Nick Piggin
(cc to related person)
> The questionable one is add_to_swap (when vmscanning), which calls
> it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC
> plus __GFP_NOMEMALLOC|__GFP_NOWARN. That one I have wondered
> about from time to time: GFP_NOIO would be the obvious choice,
> that's what swap_writepage will use to allocate bio soon after.
>
> I've been tempted to change it, but afraid to touch that house
> of cards, and afraid of long testing and justification required.
> Would it be safe to drop that __GFP_HIGH? What's the effect of the
> __GFP_NOMEMALLOC (we've layer on layer of tweak this one way because
> we're in the reclaim path so let it eat more, then tweak it the other
> way because we don't want it to eat up _too_ much). I just let it stay.
firstly, following some patch indicate add_to_swap() parameter history.
ac47b003d03c2a4f28aef1d505b66d24ad191c4f(Hugh, Jan 6 2009) reverted
1480a540c98525640174a7eadd712378fcd6fd63(Cristoph, Jan 8 2006).
bd53b714d32a29bdf33009f812e295667e92b930(Nick, May 1 2005) added
__GFP_NOMEMALLOC. __GFP_NOMEMALLOC mean "please don't eat emergency memory".
c4b3efde0744e038d16b33d18110d39e762ef80c(akpm, Jan 6 2003) explained
why no using emergency memory is better.
it said
In the case of adding pages to swapcache we're still using GFP_ATOMIC, so
these addition attempts can still fail. That's OK, because the error is
handled and, unlike file pages, it will not cause user applicaton failures.
IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason
was old add_to_page_cache() didn't have gfp_mask parameter and we didn't
have the reason of changing add_to_swap() behavior.
I think it don't have deeply reason and changing GFP_NOIO don't cause regression.
---------------------------------------------
commit ac47b003d03c2a4f28aef1d505b66d24ad191c4f
Author: Hugh Dickins <hugh@veritas.com>
Date: Tue Jan 6 14:39:39 2009 -0800
mm: remove gfp_mask from add_to_swap
Remove gfp_mask argument from add_to_swap(): it's misleading because its
only caller, shrink_page_list(), is not atomic at that point; and in due
course (implementing discard) we'll sometimes want to allocate some memory
with GFP_NOIO (as is used in swap_writepage) when allocating swap.
No change to the gfp_mask passed down to add_to_swap_cache(): still use
__GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before):
though it's not obvious if that's the best combination to ask for here.
-int add_to_swap(struct page * page, gfp_t gfp_mask)
+int add_to_swap(struct page *page)
{
swp_entry_t entry;
int err;
@@ -153,7 +153,7 @@ int add_to_swap(struct page * page, gfp_t gfp_mask)
* Add it to the swap cache and mark it dirty
*/
err = add_to_swap_cache(page, entry,
- gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN);
+ __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
---------------------------------------------
commit 1480a540c98525640174a7eadd712378fcd6fd63
Author: Christoph Lameter <clameter@sgi.com>
Date: Sun Jan 8 01:00:53 2006 -0800
[PATCH] SwapMig: add_to_swap() avoid atomic allocations
Add gfp_mask to add_to_swap
add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
swapping. During migration we may have use add_to_swap extensively which may
lead to out of memory errors.
This patch makes add_to_swap take a parameter that specifies the gfp mask.
The page migration code can then make add_to_swap use GFP_KERNEL.
-int add_to_swap(struct page * page)
+int add_to_swap(struct page * page, gfp_t gfp_mask)
{
swp_entry_t entry;
int err;
@@ -166,7 +166,7 @@ int add_to_swap(struct page * page)
* Add it to the swap cache and mark it dirty
*/
err = __add_to_swap_cache(page, entry,
- GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN);
+ gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN);
---------------------------------------------
commit bd53b714d32a29bdf33009f812e295667e92b930
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date: Sun May 1 08:58:37 2005 -0700
[PATCH] mm: use __GFP_NOMEMALLOC
Use the new __GFP_NOMEMALLOC to simplify the previous handling of
PF_MEMALLOC.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
@@ -154,29 +153,19 @@ int add_to_swap(struct page * page)
if (!entry.val)
return 0;
- /* Radix-tree node allocations are performing
- * GFP_ATOMIC allocations under PF_MEMALLOC.
- * They can completely exhaust the page allocator.
- *
- * So PF_MEMALLOC is dropped here. This causes the slab
- * allocations to fail earlier, so radix-tree nodes will
- * then be allocated from the mempool reserves.
+ /*
+ * Radix-tree node allocations from PF_MEMALLOC contexts could
+ * completely exhaust the page allocator. __GFP_NOMEMALLOC
+ * stops emergency reserves from being allocated.
*
- * We're still using __GFP_HIGH for radix-tree node
- * allocations, so some of the emergency pools are available,
- * just not all of them.
+ * TODO: this could cause a theoretical memory reclaim
+ * deadlock in the swap out path.
*/
-
- pf_flags = current->flags;
- current->flags &= ~PF_MEMALLOC;
-
/*
* Add it to the swap cache and mark it dirty
*/
- err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN);
-
- if (pf_flags & PF_MEMALLOC)
- current->flags |= PF_MEMALLOC;
+ err = __add_to_swap_cache(page, entry,
+ GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN);
---------------------------------------------
commit b84a35be0285229b0a8a5e2e04d79360c5b75562
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date: Sun May 1 08:58:36 2005 -0700
[PATCH] mempool: NOMEMALLOC and NORETRY
Mempools have 2 problems.
The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.
The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.
Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---------------------------------------------
commit c22eeaf4682941543d1426621ca1e3c6d3833fe2
Author: akpm <akpm>
Date: Fri Sep 3 17:20:58 2004 +0000
[PATCH] add_to_swap(): suppress oom message
Page allocation failures are expected when add_to_swap() tries to allocate
radix-tree nodes without PF_MEMALLOC. Kill the __alloc_pages() warnings.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
BKrev: 4138a7faauXIloxI3vwZ04xvfa1paQ
@@ -171,7 +171,7 @@ int add_to_swap(struct page * page)
/*
* Add it to the swap cache and mark it dirty
*/
- err = __add_to_swap_cache(page, entry, GFP_ATOMIC);
+ err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN);
if (pf_flags & PF_MEMALLOC)
current->flags |= PF_MEMALLOC;
---------------------------------------------
commit c4b3efde0744e038d16b33d18110d39e762ef80c
Author: akpm <akpm>
Date: Mon Jan 6 03:51:29 2003 +0000
[PATCH] handle radix_tree_node allocation failures
This patch uses the radix_tree_preload() API in add_to_page_cache().
A new gfp_mask argument is added to add_to_page_cache(), which is then passed
on to radix_tree_preload(). It's pretty simple.
In the case of adding pages to swapcache we're still using GFP_ATOMIC, so
these addition attempts can still fail. That's OK, because the error is
handled and, unlike file pages, it will not cause user applicaton failures.
This codepath (radix-tree node exhaustion on swapout) was well tested in the
days when the swapper_space radix tree was fragmented all over the place due
to unfortunate swp_entry bit layout.
BKrev: 3e18fd41cbfAQY7P7NcjQYCevjYG2g
@@ -149,7 +149,8 @@ int add_to_swap(struct page * page)
/*
* Add it to the swap cache and mark it dirty
*/
- err = add_to_page_cache(page, &swapper_space, entry.val);
+ err = add_to_page_cache(page, &swapper_space,
+ entry.val, GFP_ATOMIC);
---------------------------------------------
commit e9cb95284376950d34943acd2ddd33a473ba9ba3
Author: torvalds <torvalds>
Date: Tue Dec 3 22:59:18 2002 +0000
Merge master.kernel.org:/home/hch/BK/xfs/linux-2.5
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
@@ -149,9 +150,15 @@ int add_to_swap(struct page * page)
/*
* Add it to the swap cache and mark it dirty
*/
- switch (add_to_page_cache(page, &swapper_space, entry.val)) {
+ err = add_to_page_cache(page, &swapper_space, entry.val);
+
+ if (!(pf_flags & PF_NOWARN))
+ current->flags &= ~PF_NOWARN;
+ if (pf_flags & PF_MEMALLOC)
+ current->flags |= PF_MEMALLOC;
+
+ switch (err) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread