From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
David Rientjes <rientjes@google.com>,
Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 07/12] mm, page_alloc: Distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
Date: Fri, 18 Sep 2015 15:25:41 +0900 [thread overview]
Message-ID: <20150918062541.GA7769@js1304-P5Q-DELUXE> (raw)
In-Reply-To: <20150909122203.GY12432@techsingularity.net>
On Wed, Sep 09, 2015 at 01:22:03PM +0100, Mel Gorman wrote:
> On Tue, Sep 08, 2015 at 03:49:58PM +0900, Joonsoo Kim wrote:
> > 2015-08-24 21:09 GMT+09:00 Mel Gorman <mgorman@techsingularity.net>:
> > > __GFP_WAIT has been used to identify atomic context in callers that hold
> > > spinlocks or are in interrupts. They are expected to be high priority and
> > > have access to one of two watermarks lower than "min", which can be referred
> > > to as the "atomic reserve". __GFP_HIGH users get access to the first lower
> > > watermark and can be called the "high priority reserve".
> > >
> > > Over time, callers had a requirement to not block when fallback options
> > > were available. Some have abused __GFP_WAIT leading to a situation where
> > > an optimistic allocation with a fallback option can access atomic reserves.
> > >
> > > This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
> > > cannot sleep and have no alternative. High priority users continue to use
> > > __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are
> > > willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies callers
> > > that want to wake kswapd for background reclaim. __GFP_WAIT is redefined
> > > as a caller that is willing to enter direct reclaim and wake kswapd for
> > > background reclaim.
> >
> > Hello, Mel.
> >
> > I think it is better to do one thing per patch.
>
> This was a case where the incremental change felt unnecessary. The purpose
> of the patch is to "distinguish between being unable to sleep, unwilling
> to sleep and avoiding waking kswapd". Splitting that up is possible but
> I'm not convinced it helps.
>
> > To distinguish real atomic callers, we just need to introduce __GFP_ATOMIC,
> > make GFP_ATOMIC equal to __GFP_ATOMIC | GFP_HARDER and change related
> > things. The __GFP_WAIT changes aren't needed at all for this purpose. That
> > would reduce the patch size and provide better bisectability.
> >
> > And, I don't think that introducing __GFP_KSWAPD_RECLAIM is a good thing.
> > Basically, kswapd reclaim should be enforced.
>
> Several years ago, I would have agreed. Now there are callers that want
> to control kswapd and I think it made more sense to clearly state whether
> RECLAIM and KSWAPD are allowed instead of having RECLAIM and NO_KSWAPD
> flags -- i.e. flags that consistently allow or consistently deny.
>
> > A new flag makes things more difficult for users who manually manipulate
> > gfp flags. Without this change, your second hazard would disappear,
> > although it is almost harmless.
> >
> > And, I doubt that this big one-shot change is preferable. AFAIK, even when
> > changes are a one-to-one mapping with no functional difference, each one is
> > normally made as a separate patch and sent to the relevant maintainer. I
> > guess that is harder to do for this patch, but it could be done, couldn't it?
> >
>
> Splitting this into one patch per maintainer would be a review and bisection
> nightmare. If I saw someone else doing that I would wonder if they were
> just trying to increase their patch count for no reason.
>
> > Some nitpicks are below.
> >
> > > <SNIP>
> > >
> > > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > > index 03e75fef15b8..86809bd2026d 100644
> > > --- a/arch/arm/xen/mm.c
> > > +++ b/arch/arm/xen/mm.c
> > > @@ -25,7 +25,7 @@
> > > unsigned long xen_get_swiotlb_free_pages(unsigned int order)
> > > {
> > > struct memblock_region *reg;
> > > - gfp_t flags = __GFP_NOWARN;
> > > + gfp_t flags = __GFP_NOWARN|___GFP_KSWAPD_RECLAIM;
> >
> > Please use __XXX rather than ___XXX.
> >
>
> Fixed.
>
> > > <SNIP>
> > >
> > > @@ -457,13 +457,13 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > > * We solve this, and guarantee forward progress, with a rescuer
> > > * workqueue per bio_set. If we go to allocate and there are
> > > * bios on current->bio_list, we first try the allocation
> > > - * without __GFP_WAIT; if that fails, we punt those bios we
> > > - * would be blocking to the rescuer workqueue before we retry
> > > - * with the original gfp_flags.
> > > + * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
> > > + * bios we would be blocking to the rescuer workqueue before
> > > + * we retry with the original gfp_flags.
> > > */
> > >
> > > if (current->bio_list && !bio_list_empty(current->bio_list))
> > > - gfp_mask &= ~__GFP_WAIT;
> > > + gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> >
> > How about introducing a helper function to mask out __GFP_DIRECT_RECLAIM?
> > It could be used in many places.
> >
>
> In this case, the pattern for removing a single flag is easier to recognise
> than a helper whose implementation must be examined.
>
> > > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > > if (!p && gfp_mask != saved_gfp) {
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 627ed0c593fb..e3605acaaffc 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1156,8 +1156,8 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
> > > * @bio: bio to allocate request for (can be %NULL)
> > > * @gfp_mask: allocation mask
> > > *
> > > - * Get a free request from @q. If %__GFP_WAIT is set in @gfp_mask, this
> > > - * function keeps retrying under memory pressure and fails iff @q is dead.
> > > + * Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
> > > + * this function keeps retrying under memory pressure and fails iff @q is dead.
> > > *
> > > * Must be called with @q->queue_lock held and,
> > > * Returns ERR_PTR on failure, with @q->queue_lock held.
> > > @@ -1177,7 +1177,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> > > if (!IS_ERR(rq))
> > > return rq;
> > >
> > > - if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
> > > + if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
> > > blk_put_rl(rl);
> > > return rq;
> > > }
> > > @@ -1255,11 +1255,11 @@ EXPORT_SYMBOL(blk_get_request);
> > > * BUG.
> > > *
> > > * WARNING: When allocating/cloning a bio-chain, careful consideration should be
> > > - * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
> > > - * anything but the first bio in the chain. Otherwise you risk waiting for IO
> > > - * completion of a bio that hasn't been submitted yet, thus resulting in a
> > > - * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
> > > - * of bio_alloc(), as that avoids the mempool deadlock.
> > > + * given to how you allocate bios. In particular, you cannot use
> > > + * __GFP_DIRECT_RECLAIM for anything but the first bio in the chain. Otherwise
> > > + * you risk waiting for IO completion of a bio that hasn't been submitted yet,
> > > + * thus resulting in a deadlock. Alternatively bios should be allocated using
> > > + * bio_kmalloc() instead of bio_alloc(), as that avoids the mempool deadlock.
> > > * If possible a big IO should be split into smaller parts when allocation
> > > * fails. Partial allocation should not be an error, or you risk a live-lock.
> > > */
> > > diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> > > index 1a27f45ec776..381cb50a673c 100644
> > > --- a/block/blk-ioc.c
> > > +++ b/block/blk-ioc.c
> > > @@ -289,7 +289,7 @@ struct io_context *get_task_io_context(struct task_struct *task,
> > > {
> > > struct io_context *ioc;
> > >
> > > - might_sleep_if(gfp_flags & __GFP_WAIT);
> > > + might_sleep_if(gfpflags_allow_blocking(gfp_flags));
> > >
> > > do {
> > > task_lock(task);
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 9b6e28830b82..a8b46659ce4e 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -264,7 +264,7 @@ static int bt_get(struct blk_mq_alloc_data *data,
> > > if (tag != -1)
> > > return tag;
> > >
> > > - if (!(data->gfp & __GFP_WAIT))
> > > + if (!gfpflags_allow_blocking(data->gfp))
> > > return -1;
> > >
> > > bs = bt_wait_ptr(bt, hctx);
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 7d842db59699..7d80379d7a38 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -85,7 +85,7 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
> > > if (percpu_ref_tryget_live(&q->mq_usage_counter))
> > > return 0;
> > >
> > > - if (!(gfp & __GFP_WAIT))
> > > + if (!gfpflags_allow_blocking(gfp))
> > > return -EBUSY;
> > >
> > > ret = wait_event_interruptible(q->mq_freeze_wq,
> > > @@ -261,11 +261,11 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
> > >
> > > ctx = blk_mq_get_ctx(q);
> > > hctx = q->mq_ops->map_queue(q, ctx->cpu);
> > > - blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_WAIT,
> > > + blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_DIRECT_RECLAIM,
> > > reserved, ctx, hctx);
> > >
> > > rq = __blk_mq_alloc_request(&alloc_data, rw);
> > > - if (!rq && (gfp & __GFP_WAIT)) {
> > > + if (!rq && (gfp & __GFP_DIRECT_RECLAIM)) {
> > > __blk_mq_run_hw_queue(hctx);
> > > blk_mq_put_ctx(ctx);
> >
> > Is there any reason not to use gfpflags_allow_blocking() here?
> > There are some places that don't use this helper, and the reason
> > isn't specified.
> >
>
> Strictly speaking the helper could be used. However, in cases where the
> same function manipulates or examines the flag in any way, I did not use
> the helper. In all those cases, I thought the final result was easier
> to follow.
> > >
> > > /*
> > > + * A caller that is willing to wait may enter direct reclaim and will
> > > + * wake kswapd to reclaim pages in the background until the high
> > > + * watermark is met. A caller may wish to clear __GFP_DIRECT_RECLAIM to
> > > + * avoid unnecessary delays when a fallback option is available but
> > > + * still allow kswapd to reclaim in the background. The kswapd flag
> > > + * can be cleared when the reclaiming of pages would cause unnecessary
> > > + * disruption.
> > > + */
> > > +#define __GFP_WAIT (__GFP_DIRECT_RECLAIM|__GFP_KSWAPD_RECLAIM)
> >
> > Convention is that combination of gfp flags don't use __XXX.
> >
>
> I don't understand. GFP_MOVABLE_MASK, GFP_USER and a bunch of other
> combinations use __XXX.
Hello, Mel.

Sorry for the late response.

Yes, a GFP_XXX can consist of multiple __GFP_XXX flags, but so far no
__GFP_XXX has consisted of multiple __GFP_YYY flags. Your __GFP_WAIT
seems to be the first one.
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org