linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, hannes@cmpxchg.org
Subject: Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
Date: Thu, 6 Aug 2015 20:50:27 +0900
Message-ID: <201508062050.CAF21340.FJSOQOHVOLMtFF@I-love.SAKURA.ne.jp>
In-Reply-To: <20150805140230.GF11176@dhcp22.suse.cz>

Michal Hocko wrote:
> On Wed 05-08-15 21:28:39, Tetsuo Handa wrote:
> > Reduced to only linux-mm.
> > 
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > 
> > > GFP_NOFS allocations are not allowed to invoke the OOM killer since
> > > their reclaim abilities are severely diminished.  However, without the
> > > OOM killer available there is no hope of progress once the reclaimable
> > > pages have been exhausted.
> > 
> > Excuse me, but I still cannot understand. Why are !__GFP_FS allocations
> > considered as "their reclaim abilities are severely diminished"?
> > 
> > It seems to me that not only GFP_NOFS allocation requests but also
> > almost all types of memory allocation requests do not include
> > __GFP_NO_KSWAPD flag.
> 
> __GFP_NO_KSWAPD is not to be used outside of very specific cases.
> 
> > Therefore, while a thread which called __alloc_pages_slowpath(GFP_NOFS)
> > cannot reclaim FS memory, I assume that kswapd kernel threads which are
> > woken up by the thread via wakeup_kswapd() via wake_all_kswapds() can
> > reclaim FS memory by calling balance_pgdat(). Is this assumption correct?
> 
> yes.
> 
OK. Then, it sounds to me that

  GFP_NOFS allocations' reclaim abilities are severely diminished when
  they reach __alloc_pages_may_oom() for the first time during an
  allocation. But as time goes by, kswapd, which has full reclaim
  abilities, will reclaim memory that GFP_NOFS allocations cannot reclaim.
  Thus, if they wait long enough, GFP_NOFS allocations' reclaim abilities
  become nearly equal to those of GFP_KERNEL allocations. Therefore,
  GFP_NOFS allocations should be allowed to invoke the OOM killer once
  they have waited long enough.

and the problem is that we have no trigger that tells them "You have
waited long enough but memory is still tight. Therefore, you may
invoke the OOM killer."

> > If the assumption is correct, when kswapd kernel threads returned from
> > balance_pgdat() or got stuck inside reclaiming functions (e.g. blocked at
> > mutex_lock() inside slab's shrinker functions), I think that the thread
> > which called __alloc_pages_slowpath(GFP_NOFS) has reclaimed FS memory
> > as if the thread called __alloc_pages_slowpath(GFP_KERNEL), and therefore
> > the thread qualifies calling out_of_memory() as with __GFP_FS allocations.
> 
> You are missing an important point. We are talking about OOM situation
> here. Which means that the background reclaim is not able to make
> sufficient progress and neither is the direct reclaim.

My worry here is about the nearly-OOM situation.

Generally, __GFP_WAIT allocations are more likely to succeed than
!__GFP_WAIT allocations, because they can reclaim and wait. To compensate,
GFP_ATOMIC allocations include __GFP_HIGH so that they can pass
__zone_watermark_ok() even when !__GFP_HIGH allocations fail.

GFP_NOFS allocations include __GFP_WAIT but do not include __GFP_HIGH,
so GFP_NOFS allocations can fail __zone_watermark_ok() where GFP_ATOMIC
allocations pass. Thus, the fact that GFP_NOFS allocations retry forever
unless TIF_MEMDIE is set is the only toehold that makes them likely to
succeed eventually (the deadlock problem aside).
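A simplified userspace model of this watermark asymmetry, not kernel code. As far as I can tell, __zone_watermark_ok() halves the min watermark for ALLOC_HIGH (derived from __GFP_HIGH) and lowers it by a further quarter for ALLOC_HARDER (set for callers without __GFP_WAIT, among other conditions). The M_GFP_* names and bit values are illustrative, and lowmem_reserve, order and the realtime-task case are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this model; not the values in gfp.h. */
#define M_GFP_WAIT 0x1u
#define M_GFP_HIGH 0x2u
#define M_GFP_IO   0x4u
#define M_GFP_FS   0x8u

#define M_GFP_ATOMIC (M_GFP_HIGH)
#define M_GFP_NOFS   (M_GFP_WAIT | M_GFP_IO)
#define M_GFP_KERNEL (M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

/* Returns true if an allocation with this mask would pass the min watermark. */
static bool model_watermark_ok(long free_pages, long min, unsigned int gfp)
{
	assert(free_pages >= 0 && min >= 0);
	if (gfp & M_GFP_HIGH)
		min -= min / 2; /* ALLOC_HIGH: may dip into half of the reserve */
	if (!(gfp & M_GFP_WAIT))
		min -= min / 4; /* ALLOC_HARDER: callers that cannot sleep */
	return free_pages > min;
}
```

With a min watermark of 1000 pages, the GFP_ATOMIC request in this model passes with 400 free pages (effective min 375), while GFP_NOFS, like GFP_KERNEL, needs more than 1000 free pages.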

This patch changes !__GFP_FS allocations not to retry unless __GFP_NOFAIL is
set. I worry that this will make !__GFP_FS allocations less reliable than
GFP_ATOMIC allocations, because the former are effectively "close to
!__GFP_WAIT" and !__GFP_HIGH, whereas the latter are genuinely !__GFP_WAIT
and __GFP_HIGH.

Therefore, I worry that under a nearly-OOM condition, where waiting a few
seconds for kswapd kernel threads to reclaim FS memory would be enough for
the !__GFP_FS allocations to succeed, GFP_NOFS allocations will start failing
prematurely. The toehold (the reliability provided by __GFP_WAIT) is almost gone.

Therefore, I'm tempted to add __GFP_NOFAIL to GFP_NOFS/GFP_NOIO allocations.
If __GFP_NOFAIL is added, they will start calling out_of_memory() even under
a nearly-OOM condition where waiting a few seconds for kswapd kernel threads
to reclaim memory would be enough for the GFP_NOFS/GFP_NOIO allocations to
succeed. The downside is that out_of_memory() would be called needlessly and
more frequently than now, and I worry that the many __GFP_NOFAIL allocations
would make OOM deadlocks or depletion of memory reserves more likely than now.
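A simplified model of the three behaviours discussed above for a !__GFP_FS allocation whose direct reclaim made no progress: the current kernel, this patch, and this patch with __GFP_NOFAIL added. It reflects my reading of __alloc_pages_slowpath() and __alloc_pages_may_oom() and ignores compaction, lowmem zones and node restrictions; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL outcome names for this model. */
enum nofs_outcome {
	NOFS_RETRY,    /* loop again without invoking the OOM killer */
	NOFS_FAIL,     /* return NULL to the caller */
	NOFS_CALL_OOM, /* call out_of_memory() */
};

/*
 * Simplified model for a !__GFP_FS allocation whose direct reclaim made no
 * progress. after_patch selects the behaviour proposed in this RFC.
 */
static enum nofs_outcome nofs_next_step(bool nofail, bool tif_memdie,
					bool after_patch)
{
	if (nofail)
		return NOFS_CALL_OOM; /* __GFP_NOFAIL skips the !__GFP_FS bailout */
	if (tif_memdie)
		return NOFS_FAIL;     /* OOM victims stop looping in both cases */
	return after_patch ? NOFS_FAIL : NOFS_RETRY;
}
```

In this model the only path that keeps waiting for kswapd without killing anything is the pre-patch NOFS_RETRY; both alternatives either fail or call the OOM killer.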

Alternatively, I'm tempted to replace GFP_NOFS/GFP_NOIO allocations with
GFP_ATOMIC allocations ( http://marc.info/?l=linux-xfs&m=142520873721204&w=2 ).

>                                                        While the
> GFP_IOFS requests are allowed to make a (V)FS activity which _might_
> help GFP_NOFS is not by definition. And that is why this reclaim context
> is less capable. Well to be more precise we do not perform IO (other
> than the swapout) from the direct reclaim context because of the stack
> restrictions so even GFP_IOFS is not _that_ strong but shrinkers are
> still free to do metadata specific actions.
>  
> > > Don't risk hanging these allocations.  Leave it to the allocation site
> > > to implement the fallback policy for failing allocations.
> > 
> > Are there memory pages which kswapd kernel threads cannot reclaim
> > but __alloc_pages_slowpath(GFP_KERNEL) allocations can reclaim
> > when __alloc_pages_slowpath(GFP_NOFS) allocations are hanging?
> 
> See above and have a look at the particular shrinkers code (e.g.
> super_cache_scan).

super_cache_scan() checks for __GFP_FS upon entry. If kswapd kernel threads
can call super_cache_scan() with a GFP_KERNEL context, kswapd kernel threads
can reclaim that memory. Thus, my answer to this question is "no", because
I assume that kswapd kernel threads can call super_cache_scan() with a
GFP_KERNEL context.
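A simplified userspace model of this point, not kernel code. The __GFP_FS test mirrors, to my understanding, the deadlock-avoidance check at the top of super_cache_scan() in fs/super.c, and M_KSWAPD_GFP stands for the GFP_KERNEL mask that, as far as I know, balance_pgdat() puts in its scan_control. The M_* names and bit values are illustrative, and returning nr_to_scan is only a placeholder for real reclaim.

```c
#include <stdbool.h>

/* Illustrative flag bits for this model; not the values in gfp.h. */
#define M_GFP_WAIT 0x1u
#define M_GFP_IO   0x2u
#define M_GFP_FS   0x4u
#define M_GFP_NOFS   (M_GFP_WAIT | M_GFP_IO)
#define M_GFP_KERNEL (M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

/* Stands in for SHRINK_STOP, which the kernel defines as (~0UL). */
#define M_SHRINK_STOP (~0UL)

/* Assumed: kswapd's balance_pgdat() reclaims with a GFP_KERNEL mask. */
#define M_KSWAPD_GFP M_GFP_KERNEL

static unsigned long model_super_cache_scan(unsigned int gfp_mask,
					    unsigned long nr_to_scan)
{
	/* Caller may hold FS locks: do not recurse into the filesystem. */
	if (!(gfp_mask & M_GFP_FS))
		return M_SHRINK_STOP;
	/* Placeholder: pretend every scanned dentry/inode was freed. */
	return nr_to_scan;
}

/* True if a reclaimer using this mask gets past the __GFP_FS check. */
static bool can_reclaim_fs_caches(unsigned int gfp_mask)
{
	return model_super_cache_scan(gfp_mask, 1) != M_SHRINK_STOP;
}
```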


Thread overview: 37+ messages
2015-08-05  9:51 [RFC 0/8] Allow GFP_NOFS allocation to fail mhocko
2015-08-05  9:51 ` [RFC 1/8] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
2015-08-05  9:51 ` [RFC 2/8] mm: Allow GFP_IOFS for page_cache_read page cache allocation mhocko
2015-08-05  9:51 ` [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM mhocko
2015-08-05 12:28   ` Tetsuo Handa
2015-08-05 14:02     ` Michal Hocko
2015-08-06 11:50       ` Tetsuo Handa [this message]
2015-08-12  9:11         ` Michal Hocko
2015-08-16 14:04           ` Tetsuo Handa
2015-08-05  9:51 ` [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure mhocko
2015-08-05 11:42   ` Jan Kara
2015-08-05 16:49   ` Greg Thelen
2015-08-12  9:14     ` Michal Hocko
2015-08-15 13:54       ` Theodore Ts'o
2015-08-18 10:36         ` Michal Hocko
2015-08-24 12:06         ` Michal Hocko
2015-08-18 10:38   ` [RFC -v2 " Michal Hocko
2015-08-05  9:51 ` [RFC 5/8] ext4: Do not fail journal due to block allocator mhocko
2015-08-05 11:43   ` Jan Kara
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-18 10:55     ` Michal Hocko
2015-08-05  9:51 ` [RFC 6/8] ext3: Do not abort journal prematurely mhocko
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-05  9:51 ` [RFC 7/8] btrfs: Prevent from early transaction abort mhocko
2015-08-05 16:31   ` David Sterba
2015-08-18 10:40   ` [RFC -v2 " Michal Hocko
2015-08-18 11:01     ` Michal Hocko
2015-08-18 17:11     ` Chris Mason
2015-08-18 17:29       ` Michal Hocko
2015-08-19 12:26         ` Michal Hocko
2015-08-05  9:51 ` [RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio mhocko
2015-08-05 16:32   ` David Sterba
2015-08-18 10:41   ` [RFC -v2 " Michal Hocko
2015-08-05 19:58 ` [RFC 0/8] Allow GFP_NOFS allocation to fail Andreas Dilger
2015-08-06 14:34 ` Michal Hocko
2015-09-07 16:51 ` Tetsuo Handa
2015-09-15 13:16   ` Tetsuo Handa
