Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails

From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails
Date: Thu, 1 Mar 2018 13:23:34 +0100
Message-ID: <672ebefc-483d-2932-37b5-4ffe58156f0f@suse.cz>
In-Reply-To: <alpine.DEB.2.20.1803010340100.88270@chino.kir.corp.google.com>

On 03/01/2018 12:42 PM, David Rientjes wrote:
> It's possible for buddy pages to become stranded on pcps that, if drained,
> could be merged with other buddy pages on the zone's free area to form
> large order pages, including up to MAX_ORDER.
> 
> Consider a verbose example using the tools/vm/page-types tool at the
> beginning of a ZONE_NORMAL, where 'B' indicates a buddy page and 'S'
> indicates a slab page, which the migration scanner is attempting to
> defragment (and doing it well, absent coalescing up to cc.order):

How can the migration scanner defragment a slab page?

> 109954  1       _______S________________________________________________________
> 109955  2       __________B_____________________________________________________
> 109957  1       ________________________________________________________________
> 109958  1       __________B_____________________________________________________
> 109959  7       ________________________________________________________________
> 109960  1       __________B_____________________________________________________
> 109961  9       ________________________________________________________________
> 10996a  1       __________B_____________________________________________________
> 10996b  3       ________________________________________________________________
> 10996e  1       __________B_____________________________________________________
> 10996f  1       ________________________________________________________________
> 109970  1       __________B_____________________________________________________
> 109971  f       ________________________________________________________________
> ...
> 109f88  1       __________B_____________________________________________________
> 109f89  3       ________________________________________________________________
> 109f8c  1       __________B_____________________________________________________
> 109f8d  2       ________________________________________________________________
> 109f8f  2       __________B_____________________________________________________
> 109f91  f       ________________________________________________________________
> 109fa0  1       __________B_____________________________________________________
> 109fa1  7       ________________________________________________________________
> 109fa8  1       __________B_____________________________________________________
> 109fa9  1       ________________________________________________________________
> 109faa  1       __________B_____________________________________________________
> 109fab  1       _______S________________________________________________________
> 
> These buddy pages, spanning 1,621 pages, could be coalesced and allow for
> three transparent hugepages to be dynamically allocated.  Totaling all
> hugepage length spans that could be coalesced, this could yield over 400
> hugepages on the zone's free area when at the time this /proc/kpageflags

I don't understand the numbers here. With order-9 hugepages it's 512
pages per hugepage. If the buddy pages span 1621 pages, how can they
yield 400 hugepages?

> was collected, there was _no_ order-9 or order-10 pages available for
> allocation even after triggering compaction through procfs.
> 
> When kcompactd fails to defragment memory such that a cc.order page can
> be allocated, drain all pcps for the zone back to the buddy allocator so
> this stranding cannot occur.  Compaction for that order will subsequently
> be deferred, which acts as a ratelimit on this drain.

I don't mind the change given the ratelimit, but what difference was
observed in practice?

BTW, I wonder if we could be smarter and quicker about the drains. We could
make a struct page that sits on a pcplist easily recognizable as such, and
store the cpu number in it. The migration scanner could then maintain a
cpumask, recognize when the only pages missing for coalescing a cc->order
block are on pcplists, and do a targeted drain of just those cpus.
Of course, that only makes sense to implement if it makes a noticeable
difference that offsets the additional overhead.

> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1987,6 +1987,14 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
>  		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
> +			/*
> +			 * Buddy pages may become stranded on pcps that could
> +			 * otherwise coalesce on the zone's free area for
> +			 * order >= cc.order.  This is ratelimited by the
> +			 * upcoming deferral.
> +			 */
> +			drain_all_pages(zone);
> +
>  			/*
>  			 * We use sync migration mode here, so we defer like
>  			 * sync direct compaction does.
> 

