From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, compaction: drain pcps for zone when kcompactd fails
Date: Thu, 1 Mar 2018 13:23:34 +0100
Message-ID: <672ebefc-483d-2932-37b5-4ffe58156f0f@suse.cz>
In-Reply-To: <alpine.DEB.2.20.1803010340100.88270@chino.kir.corp.google.com>

On 03/01/2018 12:42 PM, David Rientjes wrote:
> It's possible for buddy pages to become stranded on pcps that, if drained,
> could be merged with other buddy pages on the zone's free area to form
> large order pages, including up to MAX_ORDER.
>
> Consider a verbose example using the tools/vm/page-types tool at the
> beginning of a ZONE_NORMAL, where 'B' indicates a buddy page and 'S'
> indicates a slab page, which the migration scanner is attempting to
> defragment (and doing it well, absent coalescing up to cc.order):

How can the migration scanner defragment a slab page?

> 109954 1 _______S________________________________________________________
> 109955 2 __________B_____________________________________________________
> 109957 1 ________________________________________________________________
> 109958 1 __________B_____________________________________________________
> 109959 7 ________________________________________________________________
> 109960 1 __________B_____________________________________________________
> 109961 9 ________________________________________________________________
> 10996a 1 __________B_____________________________________________________
> 10996b 3 ________________________________________________________________
> 10996e 1 __________B_____________________________________________________
> 10996f 1 ________________________________________________________________
> 109970 1 __________B_____________________________________________________
> 109971 f ________________________________________________________________
> ...
> 109f88 1 __________B_____________________________________________________
> 109f89 3 ________________________________________________________________
> 109f8c 1 __________B_____________________________________________________
> 109f8d 2 ________________________________________________________________
> 109f8f 2 __________B_____________________________________________________
> 109f91 f ________________________________________________________________
> 109fa0 1 __________B_____________________________________________________
> 109fa1 7 ________________________________________________________________
> 109fa8 1 __________B_____________________________________________________
> 109fa9 1 ________________________________________________________________
> 109faa 1 __________B_____________________________________________________
> 109fab 1 _______S________________________________________________________
>
> These buddy pages, spanning 1,621 pages, could be coalesced and allow for
> three transparent hugepages to be dynamically allocated. Totaling all
> hugepage length spans that could be coalesced, this could yield over 400
> hugepages on the zone's free area when at the time this /proc/kpageflags

I don't understand the numbers here. With order-9 hugepages it's 512
pages per hugepage. If the buddy pages span 1621 pages, how can they
yield 400 hugepages?

> was collected, there was _no_ order-9 or order-10 pages available for
> allocation even after triggering compaction through procfs.
>
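
To make the arithmetic behind the question about the numbers concrete,
here is a small userspace sketch. It assumes order-9 hugepages (512 base
pages each, as on x86-64 with 4 KiB pages) and ignores alignment, so it
gives an upper bound only. The helper names are hypothetical and not
kernel API.

```c
#include <assert.h>

/* Assumption: order-9 hugepages, i.e. 512 base pages per hugepage. */
#define HPAGE_ORDER	9UL
#define HPAGE_NR_PAGES	(1UL << HPAGE_ORDER)

/*
 * Upper bound on the number of hugepages that nr_pages contiguous free
 * base pages can form. Alignment is ignored, so the real number may be
 * lower.
 */
static unsigned long max_hugepages(unsigned long nr_pages)
{
	return nr_pages / HPAGE_NR_PAGES;
}

/* Number of free base pages needed to back nr_hugepages hugepages. */
static unsigned long pages_needed(unsigned long nr_hugepages)
{
	/* Guard against overflow in the multiplication below. */
	assert(nr_hugepages <= (~0UL >> HPAGE_ORDER));
	return nr_hugepages * HPAGE_NR_PAGES;
}
```

With these helpers, max_hugepages(1621) is 3, while pages_needed(400) is
204800, so a span of 1621 pages on its own cannot account for 400
hugepages.
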
> When kcompactd fails to defragment memory such that a cc.order page can
> be allocated, drain all pcps for the zone back to the buddy allocator so
> this stranding cannot occur. Compaction for that order will subsequently
> be deferred, which acts as a ratelimit on this drain.

I don't mind the change given the ratelimit, but what difference was
observed in practice?
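
For reference, a simplified userspace model of how the deferral would
ratelimit the drain. It is only a sketch: it assumes every attempted
compaction fails, ignores the compact_order_failed check, and uses
hypothetical names (zone_model, model_*), with the maximum shift of 6
taken from COMPACT_MAX_DEFER_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEFER_SHIFT	6	/* mirrors COMPACT_MAX_DEFER_SHIFT */

/* Hypothetical stand-in for the per-zone deferral counters. */
struct zone_model {
	unsigned int considered;	/* attempts seen since last failure */
	unsigned int defer_shift;	/* backoff window is 1 << defer_shift */
};

/* After a failed compaction: restart the count and widen the window. */
static void model_defer(struct zone_model *z)
{
	assert(z);
	z->considered = 0;
	if (z->defer_shift < MAX_DEFER_SHIFT)
		z->defer_shift++;
}

/* Returns true while the current attempt should be skipped. */
static bool model_deferred(struct zone_model *z)
{
	unsigned int limit = 1U << z->defer_shift;

	assert(z);
	if (++z->considered >= limit) {
		z->considered = limit;
		return false;
	}
	return true;
}

/*
 * Count how many drains happen over a number of kcompactd wakeups,
 * assuming each compaction that actually runs fails and therefore
 * drains the pcps before deferring.
 */
static unsigned int model_count_drains(unsigned int wakeups)
{
	struct zone_model z = { 0, 0 };
	unsigned int drains = 0;

	for (unsigned int i = 0; i < wakeups; i++) {
		if (model_deferred(&z))
			continue;
		drains++;		/* compaction ran, failed, drained */
		model_defer(&z);
	}
	return drains;
}
```

In this model the gaps between drains double until the window reaches
64 attempts, so under persistent failure the drain settles at one per
64 wakeups.
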

BTW, I wonder if we could be smarter and quicker about the drains. Make
a struct page that sits on a pcplist easily recognizable as such, and
store the CPU number in it. The migration scanner could then maintain a
cpumask, recognize when the only pages missing for coalescing a
cc->order block are on pcplists, and do a targeted drain of just those
CPUs.

But that only makes sense to implement if it makes a noticeable
difference that offsets the additional overhead, of course.
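
A rough userspace sketch of the targeted-drain idea, just to illustrate
the check the scanner would do. Everything here is hypothetical: the
page_model type, the PAGE_* states and the function name are made up,
and a plain 64-bit mask stands in for a real cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page state; not the kernel's struct page. */
enum page_state { PAGE_IN_USE, PAGE_FREE_BUDDY, PAGE_ON_PCP };

struct page_model {
	enum page_state state;
	unsigned int cpu;	/* valid only when state == PAGE_ON_PCP */
};

/*
 * Scan one block of (1 << order) pages. Returns true if the block would
 * be entirely free once the CPUs set in *cpumask are drained. *cpumask
 * is left at 0 if any page is in use (no drain helps) or if no page is
 * on a pcplist (no drain is needed).
 */
static bool block_needs_targeted_drain(const struct page_model *block,
				       unsigned int order, uint64_t *cpumask)
{
	size_t nr = (size_t)1 << order;
	uint64_t mask = 0;

	assert(block && cpumask && order < 20);
	*cpumask = 0;

	for (size_t i = 0; i < nr; i++) {
		switch (block[i].state) {
		case PAGE_IN_USE:
			return false;	/* draining cannot free this block */
		case PAGE_ON_PCP:
			assert(block[i].cpu < 64);
			mask |= (uint64_t)1 << block[i].cpu;
			break;
		case PAGE_FREE_BUDDY:
			break;
		}
	}

	*cpumask = mask;
	return mask != 0;
}
```

The extra work per block is one pass over its pages, which is the
overhead that would have to be justified by measurable gains.
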
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> mm/compaction.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1987,6 +1987,14 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  		if (status == COMPACT_SUCCESS) {
>  			compaction_defer_reset(zone, cc.order, false);
>  		} else if (status == COMPACT_PARTIAL_SKIPPED || status == COMPACT_COMPLETE) {
> +			/*
> +			 * Buddy pages may become stranded on pcps that could
> +			 * otherwise coalesce on the zone's free area for
> +			 * order >= cc.order.  This is ratelimited by the
> +			 * upcoming deferral.
> +			 */
> +			drain_all_pages(zone);
> +
>  			/*
>  			 * We use sync migration mode here, so we defer like
>  			 * sync direct compaction does.
>