From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
David Rientjes <rientjes@google.com>,
Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 07/12] mm, page_alloc: Distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
Date: Fri, 18 Sep 2015 15:25:41 +0900 [thread overview]
Message-ID: <20150918062541.GA7769@js1304-P5Q-DELUXE> (raw)
In-Reply-To: <20150909122203.GY12432@techsingularity.net>
On Wed, Sep 09, 2015 at 01:22:03PM +0100, Mel Gorman wrote:
> On Tue, Sep 08, 2015 at 03:49:58PM +0900, Joonsoo Kim wrote:
> > 2015-08-24 21:09 GMT+09:00 Mel Gorman <mgorman@techsingularity.net>:
> > > __GFP_WAIT has been used to identify atomic context in callers that hold
> > > spinlocks or are in interrupts. They are expected to be high priority and
> > > have access to one of two watermarks lower than "min", which can be referred
> > > to as the "atomic reserve". __GFP_HIGH users get access to the first lower
> > > watermark and can be called the "high priority reserve".
> > >
> > > Over time, callers had a requirement to not block when fallback options
> > > were available. Some have abused __GFP_WAIT leading to a situation where
> > > an optimistic allocation with a fallback option can access atomic reserves.
> > >
> > > This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
> > > cannot sleep and have no alternative. High priority users continue to use
> > > __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are
> > > willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies callers
> > > that want to wake kswapd for background reclaim. __GFP_WAIT is redefined
> > > as a caller that is willing to enter direct reclaim and wake kswapd for
> > > background reclaim.
> >
> > Hello, Mel.
> >
> > I think it is better to do one thing per patch.
>
> This was a case where the incremental change felt unnecessary. The purpose
> of the patch is to "distinguish between being unable to sleep, unwilling
> to sleep and avoiding waking kswapd". Splitting that up is possible but
> I'm not convinced it helps.
>
> > To distinguish real atomic callers, we just need to introduce __GFP_ATOMIC,
> > make GFP_ATOMIC equal to __GFP_ATOMIC | GFP_HARDER and change related
> > things. The __GFP_WAIT changes aren't needed at all for this purpose. That
> > would reduce the patch size and provide better bisectability.
> >
> > And, I don't think that introducing __GFP_KSWAPD_RECLAIM is a good thing.
> > Basically, kswapd reclaim should be enforced.
>
> Several years ago, I would have agreed. Now there are callers that want
> to control kswapd and I think it made more sense to clearly state whether
> RECLAIM and KSWAPD are allowed instead of having RECLAIM and NO_KSWAPD
> flags -- i.e. flags that consistently allow or consistently deny.
>
> > A new flag makes things more difficult for users who manually manipulate
> > gfp flags. Without this change, your second hazard would disappear,
> > although it is almost harmless.
> >
> > And, I doubt that this big one-shot change is preferable. AFAIK, even when
> > changes are a one-to-one mapping with no functional difference, each one is
> > normally made as a separate patch and sent to the relevant maintainer. I
> > guess that is harder to do for this patch, but it could be done, couldn't it?
> >
>
> Splitting this into one patch per maintainer would be a review and bisection
> nightmare. If I saw someone else doing that I would wonder if they were
> just trying to increase their patch count for no reason.
>
> > Some nitpicks are below.
> >
> > > <SNIP>
> > >
> > > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > > index 03e75fef15b8..86809bd2026d 100644
> > > --- a/arch/arm/xen/mm.c
> > > +++ b/arch/arm/xen/mm.c
> > > @@ -25,7 +25,7 @@
> > > unsigned long xen_get_swiotlb_free_pages(unsigned int order)
> > > {
> > > struct memblock_region *reg;
> > > - gfp_t flags = __GFP_NOWARN;
> > > + gfp_t flags = __GFP_NOWARN|___GFP_KSWAPD_RECLAIM;
> >
> > Please use __XXX rather than ___XXX.
> >
>
> Fixed.
>
> > > <SNIP>
> > >
> > > @@ -457,13 +457,13 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > > * We solve this, and guarantee forward progress, with a rescuer
> > > * workqueue per bio_set. If we go to allocate and there are
> > > * bios on current->bio_list, we first try the allocation
> > > - * without __GFP_WAIT; if that fails, we punt those bios we
> > > - * would be blocking to the rescuer workqueue before we retry
> > > - * with the original gfp_flags.
> > > + * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
> > > + * bios we would be blocking to the rescuer workqueue before
> > > + * we retry with the original gfp_flags.
> > > */
> > >
> > > if (current->bio_list && !bio_list_empty(current->bio_list))
> > > - gfp_mask &= ~__GFP_WAIT;
> > > + gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> >
> > How about introducing a helper function to mask out __GFP_DIRECT_RECLAIM?
> > It could be used in many places.
> >
>
> In this case, the pattern for removing a single flag is easier to recognise
> than a helper whose implementation must be examined.
>
> > > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > > if (!p && gfp_mask != saved_gfp) {
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 627ed0c593fb..e3605acaaffc 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1156,8 +1156,8 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
> > > * @bio: bio to allocate request for (can be %NULL)
> > > * @gfp_mask: allocation mask
> > > *
> > > - * Get a free request from @q. If %__GFP_WAIT is set in @gfp_mask, this
> > > - * function keeps retrying under memory pressure and fails iff @q is dead.
> > > + * Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
> > > + * this function keeps retrying under memory pressure and fails iff @q is dead.
> > > *
> > > * Must be called with @q->queue_lock held and,
> > > * Returns ERR_PTR on failure, with @q->queue_lock held.
> > > @@ -1177,7 +1177,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> > > if (!IS_ERR(rq))
> > > return rq;
> > >
> > > - if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
> > > + if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
> > > blk_put_rl(rl);
> > > return rq;
> > > }
> > > @@ -1255,11 +1255,11 @@ EXPORT_SYMBOL(blk_get_request);
> > > * BUG.
> > > *
> > > * WARNING: When allocating/cloning a bio-chain, careful consideration should be
> > > - * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
> > > - * anything but the first bio in the chain. Otherwise you risk waiting for IO
> > > - * completion of a bio that hasn't been submitted yet, thus resulting in a
> > > - * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
> > > - * of bio_alloc(), as that avoids the mempool deadlock.
> > > + * given to how you allocate bios. In particular, you cannot use
> > > + * __GFP_DIRECT_RECLAIM for anything but the first bio in the chain. Otherwise
> > > + * you risk waiting for IO completion of a bio that hasn't been submitted yet,
> > > + * thus resulting in a deadlock. Alternatively bios should be allocated using
> > > + * bio_kmalloc() instead of bio_alloc(), as that avoids the mempool deadlock.
> > > * If possible a big IO should be split into smaller parts when allocation
> > > * fails. Partial allocation should not be an error, or you risk a live-lock.
> > > */
> > > diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> > > index 1a27f45ec776..381cb50a673c 100644
> > > --- a/block/blk-ioc.c
> > > +++ b/block/blk-ioc.c
> > > @@ -289,7 +289,7 @@ struct io_context *get_task_io_context(struct task_struct *task,
> > > {
> > > struct io_context *ioc;
> > >
> > > - might_sleep_if(gfp_flags & __GFP_WAIT);
> > > + might_sleep_if(gfpflags_allow_blocking(gfp_flags));
> > >
> > > do {
> > > task_lock(task);
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 9b6e28830b82..a8b46659ce4e 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -264,7 +264,7 @@ static int bt_get(struct blk_mq_alloc_data *data,
> > > if (tag != -1)
> > > return tag;
> > >
> > > - if (!(data->gfp & __GFP_WAIT))
> > > + if (!gfpflags_allow_blocking(data->gfp))
> > > return -1;
> > >
> > > bs = bt_wait_ptr(bt, hctx);
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 7d842db59699..7d80379d7a38 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -85,7 +85,7 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
> > > if (percpu_ref_tryget_live(&q->mq_usage_counter))
> > > return 0;
> > >
> > > - if (!(gfp & __GFP_WAIT))
> > > + if (!gfpflags_allow_blocking(gfp))
> > > return -EBUSY;
> > >
> > > ret = wait_event_interruptible(q->mq_freeze_wq,
> > > @@ -261,11 +261,11 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
> > >
> > > ctx = blk_mq_get_ctx(q);
> > > hctx = q->mq_ops->map_queue(q, ctx->cpu);
> > > - blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_WAIT,
> > > + blk_mq_set_alloc_data(&alloc_data, q, gfp & ~__GFP_DIRECT_RECLAIM,
> > > reserved, ctx, hctx);
> > >
> > > rq = __blk_mq_alloc_request(&alloc_data, rw);
> > > - if (!rq && (gfp & __GFP_WAIT)) {
> > > + if (!rq && (gfp & __GFP_DIRECT_RECLAIM)) {
> > > __blk_mq_run_hw_queue(hctx);
> > > blk_mq_put_ctx(ctx);
> >
> > Is there any reason not to use gfpflags_allow_blocking() here?
> > There are some places that don't use this helper, and the reason
> > isn't specified.
> >
>
> Strictly speaking the helper could be used. However, in cases where the
> same function manipulates or examines the flag in any way, I did not use
> the helper. In all those cases, I thought the final result was easier
> to follow.
> > >
> > > /*
> > > + * A caller that is willing to wait may enter direct reclaim and will
> > > + * wake kswapd to reclaim pages in the background until the high
> > > + * watermark is met. A caller may wish to clear __GFP_DIRECT_RECLAIM to
> > > + * avoid unnecessary delays when a fallback option is available but
> > > + * still allow kswapd to reclaim in the background. The kswapd flag
> > > + * can be cleared when the reclaiming of pages would cause unnecessary
> > > + * disruption.
> > > + */
> > > +#define __GFP_WAIT (__GFP_DIRECT_RECLAIM|__GFP_KSWAPD_RECLAIM)
> >
> > Convention is that combination of gfp flags don't use __XXX.
> >
>
> I don't understand. GFP_MOVABLE_MASK, GFP_USER and a bunch of other
> combinations use __XXX.
Hello, Mel.

Sorry for the late response.

Yes, a GFP_XXX can consist of multiple __GFP_XXX flags, but so far no
__GFP_XXX has consisted of multiple __GFP_YYY flags. Your __GFP_WAIT
seems to be the first one.
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org