add_to_swap_cache with GFP

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* add_to_swap_cache with GFP_ATOMIC ?
@ 2009-03-31 10:38 Minchan Kim
  2009-03-31 11:17 ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Minchan Kim @ 2009-03-31 10:38 UTC (permalink / raw)
  To: Hugh Dickens; +Cc: linux-mm, lkml

Hi, all.

Let me have a question.

I don't know why we should call add_to_swap_cache with GFP_ATOMIC ?
Is there a special something for avoiding blocking?

-- 
Kinds regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-03-31 10:38 add_to_swap_cache with GFP_ATOMIC ? Minchan Kim
@ 2009-03-31 11:17 ` Hugh Dickins
  2009-03-31 11:30   ` Minchan Kim
  2009-04-01  8:38   ` KOSAKI Motohiro
  0 siblings, 2 replies; 7+ messages in thread
From: Hugh Dickins @ 2009-03-31 11:17 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm, lkml

On Tue, 31 Mar 2009, Minchan Kim wrote:
> 
> I don't know why we should call add_to_swap_cache with GFP_ATOMIC ?
> Is there a special something for avoiding blocking?

add_to_swap_cache itself does not need to be called with GFP_ATOMIC.

There are three places from which it is called:

read_swap_cache_async (typically used when faulting) masks the
gfp_mask coming in (typically GFP_HIGHUSER_MOVABLE for the pages
themselves) to call add_to_swap_cache typically with GFP_KERNEL.

shmem_writepage does call it with GFP_ATOMIC: that's because it's
holding the shmem_inode's spin_lock while it switches the page between
file cache and swap cache - IIRC holding page lock isn't quite enough
for that, because of other cases; but I've not thought that through
in a long time, we could re-examine if it troubles you.

The questionable one is add_to_swap (when vmscanning), which calls
it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC
plus __GFP_NOMEMALLOC|__GFP_NOWARN.  That one I have wondered
about from time to time: GFP_NOIO would be the obvious choice,
that's what swap_writepage will use to allocate bio soon after.

I've been tempted to change it, but afraid to touch that house
of cards, and afraid of long testing and justification required.
Would it be safe to drop that __GFP_HIGH?  What's the effect of the
__GFP_NOMEMALLOC (we've layer on layer of tweak this one way because
we're in the reclaim path so let it eat more, then tweak it the other
way because we don't want it to eat up _too_ much).  I just let it stay.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-03-31 11:17 ` Hugh Dickins
@ 2009-03-31 11:30   ` Minchan Kim
  2009-04-01  8:38   ` KOSAKI Motohiro
  1 sibling, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2009-03-31 11:30 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, lkml

Thanks for quick reply.

On Tue, Mar 31, 2009 at 8:17 PM, Hugh Dickins <hugh@veritas.com> wrote:
> On Tue, 31 Mar 2009, Minchan Kim wrote:
>>
>> I don't know why we should call add_to_swap_cache with GFP_ATOMIC ?
>> Is there a special something for avoiding blocking?
>
> add_to_swap_cache itself does not need to be called with GFP_ATOMIC.
>
> There are three places from which it is called:
>
> read_swap_cache_async (typically used when faulting) masks the
> gfp_mask coming in (typically GFP_HIGHUSER_MOVABLE for the pages
> themselves) to call add_to_swap_cache typically with GFP_KERNEL.
>
> shmem_writepage does call it with GFP_ATOMIC: that's because it's
> holding the shmem_inode's spin_lock while it switches the page between
> file cache and swap cache - IIRC holding page lock isn't quite enough
> for that, because of other cases; but I've not thought that through
> in a long time, we could re-examine if it troubles you.

Yes. My point was that. but I am not sure what are other cases. :(
Now, It don't hurt me but not sure in future.

> The questionable one is add_to_swap (when vmscanning), which calls
> it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC
> plus __GFP_NOMEMALLOC|__GFP_NOWARN.  That one I have wondered
> about from time to time: GFP_NOIO would be the obvious choice,
> that's what swap_writepage will use to allocate bio soon after.
>
> I've been tempted to change it, but afraid to touch that house
> of cards, and afraid of long testing and justification required.
> Would it be safe to drop that __GFP_HIGH?  What's the effect of the
> __GFP_NOMEMALLOC (we've layer on layer of tweak this one way because
> we're in the reclaim path so let it eat more, then tweak it the other
> way because we don't want it to eat up _too_ much).  I just let it stay.

Sigh.
What a complex thing to change one line.
Thanks for kind explanation.

> Hugh
>



-- 
Kinds regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-03-31 11:17 ` Hugh Dickins
  2009-03-31 11:30   ` Minchan Kim
@ 2009-04-01  8:38   ` KOSAKI Motohiro
  2009-04-01  9:13     ` KOSAKI Motohiro
  2009-04-01 11:46     ` Hugh Dickins
  1 sibling, 2 replies; 7+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01  8:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Minchan Kim, linux-mm, lkml, Andrew Morton,
	Christoph Lameter, Nick Piggin

(cc to related person)

> The questionable one is add_to_swap (when vmscanning), which calls
> it with __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, i.e. GFP_ATOMIC
> plus __GFP_NOMEMALLOC|__GFP_NOWARN.  That one I have wondered
> about from time to time: GFP_NOIO would be the obvious choice,
> that's what swap_writepage will use to allocate bio soon after.
> 
> I've been tempted to change it, but afraid to touch that house
> of cards, and afraid of long testing and justification required.
> Would it be safe to drop that __GFP_HIGH?  What's the effect of the
> __GFP_NOMEMALLOC (we've layer on layer of tweak this one way because
> we're in the reclaim path so let it eat more, then tweak it the other
> way because we don't want it to eat up _too_ much).  I just let it stay.

firstly, following some patch indicate add_to_swap() parameter history.

ac47b003d03c2a4f28aef1d505b66d24ad191c4f(Hugh, Jan 6 2009) reverted 
1480a540c98525640174a7eadd712378fcd6fd63(Cristoph, Jan 8 2006).

bd53b714d32a29bdf33009f812e295667e92b930(Nick, May 1 2005) added
__GFP_NOMEMALLOC. __GFP_NOMEMALLOC mean "please don't eat emergency memory".

c4b3efde0744e038d16b33d18110d39e762ef80c(akpm, Jan 6 2003) explained
why no using emergency memory is better.
it said 

    In the case of adding pages to swapcache we're still using GFP_ATOMIC, so
    these addition attempts can still fail.  That's OK, because the error is
    handled and, unlike file pages, it will not cause user applicaton failures.


IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason 
was old add_to_page_cache() didn't have gfp_mask parameter and we didn't
 have the reason of changing add_to_swap() behavior.
I think it don't have deeply reason and changing GFP_NOIO don't cause regression.



---------------------------------------------

commit ac47b003d03c2a4f28aef1d505b66d24ad191c4f
Author: Hugh Dickins <hugh@veritas.com>
Date:   Tue Jan 6 14:39:39 2009 -0800

    mm: remove gfp_mask from add_to_swap

    Remove gfp_mask argument from add_to_swap(): it's misleading because its
    only caller, shrink_page_list(), is not atomic at that point; and in due
    course (implementing discard) we'll sometimes want to allocate some memory
    with GFP_NOIO (as is used in swap_writepage) when allocating swap.

    No change to the gfp_mask passed down to add_to_swap_cache(): still use
    __GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before):
    though it's not obvious if that's the best combination to ask for here.

-int add_to_swap(struct page * page, gfp_t gfp_mask)
+int add_to_swap(struct page *page)
 {
        swp_entry_t entry;
        int err;
@@ -153,7 +153,7 @@ int add_to_swap(struct page * page, gfp_t gfp_mask)
                 * Add it to the swap cache and mark it dirty
                 */
                err = add_to_swap_cache(page, entry,
-                               gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN);
+                               __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);

---------------------------------------------

commit 1480a540c98525640174a7eadd712378fcd6fd63
Author: Christoph Lameter <clameter@sgi.com>
Date:   Sun Jan 8 01:00:53 2006 -0800

    [PATCH] SwapMig: add_to_swap() avoid atomic allocations

    Add gfp_mask to add_to_swap

    add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
    swapping.  During migration we may have use add_to_swap extensively which may
    lead to out of memory errors.

    This patch makes add_to_swap take a parameter that specifies the gfp mask.
    The page migration code can then make add_to_swap use GFP_KERNEL.

-int add_to_swap(struct page * page)
+int add_to_swap(struct page * page, gfp_t gfp_mask)
 {
        swp_entry_t entry;
        int err;
@@ -166,7 +166,7 @@ int add_to_swap(struct page * page)
                 * Add it to the swap cache and mark it dirty
                 */
                err = __add_to_swap_cache(page, entry,
-                               GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN);
+                               gfp_mask|__GFP_NOMEMALLOC|__GFP_NOWARN);

---------------------------------------------

commit bd53b714d32a29bdf33009f812e295667e92b930
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sun May 1 08:58:37 2005 -0700

    [PATCH] mm: use __GFP_NOMEMALLOC

    Use the new __GFP_NOMEMALLOC to simplify the previous handling of
    PF_MEMALLOC.

    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

@@ -154,29 +153,19 @@ int add_to_swap(struct page * page)
                if (!entry.val)
                        return 0;

-               /* Radix-tree node allocations are performing
-                * GFP_ATOMIC allocations under PF_MEMALLOC.
-                * They can completely exhaust the page allocator.
-                *
-                * So PF_MEMALLOC is dropped here.  This causes the slab
-                * allocations to fail earlier, so radix-tree nodes will
-                * then be allocated from the mempool reserves.
+               /*
+                * Radix-tree node allocations from PF_MEMALLOC contexts could
+                * completely exhaust the page allocator. __GFP_NOMEMALLOC
+                * stops emergency reserves from being allocated.
                 *
-                * We're still using __GFP_HIGH for radix-tree node
-                * allocations, so some of the emergency pools are available,
-                * just not all of them.
+                * TODO: this could cause a theoretical memory reclaim
+                * deadlock in the swap out path.
                 */
-
-               pf_flags = current->flags;
-               current->flags &= ~PF_MEMALLOC;
-
                /*
                 * Add it to the swap cache and mark it dirty
                 */
-               err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN);
-
-               if (pf_flags & PF_MEMALLOC)
-                       current->flags |= PF_MEMALLOC;
+               err = __add_to_swap_cache(page, entry,
+                               GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_NOWARN);

---------------------------------------------
commit b84a35be0285229b0a8a5e2e04d79360c5b75562
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Sun May 1 08:58:36 2005 -0700

    [PATCH] mempool: NOMEMALLOC and NORETRY

    Mempools have 2 problems.

    The first is that mempool_alloc can possibly get stuck in __alloc_pages
    when they should opt to fail, and take an element from their reserved pool.

    The second is that it will happily eat emergency PF_MEMALLOC reserves
    instead of going to their reserved pools.

    Fix the first by passing __GFP_NORETRY in the allocation calls in
    mempool_alloc.  Fix the second by introducing a __GFP_MEMPOOL flag which
    directs the page allocator not to allocate from the reserve pool.

    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

---------------------------------------------

commit c22eeaf4682941543d1426621ca1e3c6d3833fe2
Author: akpm <akpm>
Date:   Fri Sep 3 17:20:58 2004 +0000

    [PATCH] add_to_swap(): suppress oom message

    Page allocation failures are expected when add_to_swap() tries to allocate
    radix-tree nodes without PF_MEMALLOC.  Kill the __alloc_pages() warnings.

    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

    BKrev: 4138a7faauXIloxI3vwZ04xvfa1paQ

@@ -171,7 +171,7 @@ int add_to_swap(struct page * page)
                /*
                 * Add it to the swap cache and mark it dirty
                 */
-               err = __add_to_swap_cache(page, entry, GFP_ATOMIC);
+               err = __add_to_swap_cache(page, entry, GFP_ATOMIC|__GFP_NOWARN);

                if (pf_flags & PF_MEMALLOC)
                        current->flags |= PF_MEMALLOC;


---------------------------------------------

commit c4b3efde0744e038d16b33d18110d39e762ef80c
Author: akpm <akpm>
Date:   Mon Jan 6 03:51:29 2003 +0000

    [PATCH] handle radix_tree_node allocation failures

    This patch uses the radix_tree_preload() API in add_to_page_cache().

    A new gfp_mask argument is added to add_to_page_cache(), which is then passed
    on to radix_tree_preload().   It's pretty simple.

    In the case of adding pages to swapcache we're still using GFP_ATOMIC, so
    these addition attempts can still fail.  That's OK, because the error is
    handled and, unlike file pages, it will not cause user applicaton failures.
    This codepath (radix-tree node exhaustion on swapout) was well tested in the
    days when the swapper_space radix tree was fragmented all over the place due
    to unfortunate swp_entry bit layout.

    BKrev: 3e18fd41cbfAQY7P7NcjQYCevjYG2g

@@ -149,7 +149,8 @@ int add_to_swap(struct page * page)
                /*
                 * Add it to the swap cache and mark it dirty
                 */
-               err = add_to_page_cache(page, &swapper_space, entry.val);
+               err = add_to_page_cache(page, &swapper_space,
+                                       entry.val, GFP_ATOMIC);

---------------------------------------------

commit e9cb95284376950d34943acd2ddd33a473ba9ba3
Author: torvalds <torvalds>
Date:   Tue Dec 3 22:59:18 2002 +0000

    Merge master.kernel.org:/home/hch/BK/xfs/linux-2.5
    into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux


@@ -149,9 +150,15 @@ int add_to_swap(struct page * page)
                /*
                 * Add it to the swap cache and mark it dirty
                 */
-               switch (add_to_page_cache(page, &swapper_space, entry.val)) {
+               err = add_to_page_cache(page, &swapper_space, entry.val);
+
+               if (!(pf_flags & PF_NOWARN))
+                       current->flags &= ~PF_NOWARN;
+               if (pf_flags & PF_MEMALLOC)
+                       current->flags |= PF_MEMALLOC;
+
+               switch (err) {



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-04-01  8:38   ` KOSAKI Motohiro
@ 2009-04-01  9:13     ` KOSAKI Motohiro
  2009-04-01 11:46     ` Hugh Dickins
  1 sibling, 0 replies; 7+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01  9:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Minchan Kim, linux-mm, lkml, Andrew Morton,
	Christoph Lameter, Nick Piggin

> IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason 
> was old add_to_page_cache() didn't have gfp_mask parameter and we didn't
>  have the reason of changing add_to_swap() behavior.
> I think it don't have deeply reason and changing GFP_NOIO don't cause regression.

"accidentally" is wrong word obiously. I mean "non strong intention".


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-04-01  8:38   ` KOSAKI Motohiro
  2009-04-01  9:13     ` KOSAKI Motohiro
@ 2009-04-01 11:46     ` Hugh Dickins
  2009-04-01 13:30       ` KOSAKI Motohiro
  1 sibling, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2009-04-01 11:46 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Minchan Kim, linux-mm, lkml, Andrew Morton, Christoph Lameter,
	Nick Piggin

On Wed, 1 Apr 2009, KOSAKI Motohiro wrote:
> 
> IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason 
> was old add_to_page_cache() didn't have gfp_mask parameter and we didn't
>  have the reason of changing add_to_swap() behavior.
> I think it don't have deeply reason and changing GFP_NOIO
> don't cause regression.

You may well be right: we'll see if you send in a patch to change it.
But I won't be sending in that patch myself, that's all :)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: add_to_swap_cache with GFP_ATOMIC ?
  2009-04-01 11:46     ` Hugh Dickins
@ 2009-04-01 13:30       ` KOSAKI Motohiro
  0 siblings, 0 replies; 7+ messages in thread
From: KOSAKI Motohiro @ 2009-04-01 13:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Minchan Kim, linux-mm, lkml, Andrew Morton, Christoph Lameter,
	Nick Piggin

>> IOW, GFP_ATOMIC on add_to_swap() was introduced accidentally. the reason
>> was old add_to_page_cache() didn't have gfp_mask parameter and we didn't
>>  have the reason of changing add_to_swap() behavior.
>> I think it don't have deeply reason and changing GFP_NOIO
>> don't cause regression.
>
> You may well be right: we'll see if you send in a patch to change it.
> But I won't be sending in that patch myself, that's all :)

OK, I'll queue it on my local patch queue. thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-04-01 13:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-31 10:38 add_to_swap_cache with GFP_ATOMIC ? Minchan Kim
2009-03-31 11:17 ` Hugh Dickins
2009-03-31 11:30   ` Minchan Kim
2009-04-01  8:38   ` KOSAKI Motohiro
2009-04-01  9:13     ` KOSAKI Motohiro
2009-04-01 11:46     ` Hugh Dickins
2009-04-01 13:30       ` KOSAKI Motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox