From: Nitin Gupta <nigupta@nvidia.com>
To: "rientjes@google.com" <rientjes@google.com>
Cc: "keescook@chromium.org" <keescook@chromium.org>,
"willy@infradead.org" <willy@infradead.org>,
"vbabka@suse.cz" <vbabka@suse.cz>,
"aryabinin@virtuozzo.com" <aryabinin@virtuozzo.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"cai@lca.pw" <cai@lca.pw>,
"arunks@codeaurora.org" <arunks@codeaurora.org>,
"janne.huttunen@nokia.com" <janne.huttunen@nokia.com>,
"jannh@google.com" <jannh@google.com>,
"yuzhao@google.com" <yuzhao@google.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"guro@fb.com" <guro@fb.com>,
"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"khlebnikov@yandex-team.ru" <khlebnikov@yandex-team.ru>
Subject: Re: [RFC] mm: Proactive compaction
Date: Mon, 16 Sep 2019 20:50:43 +0000
Message-ID: <4b8b0cd5d7a246e9db1e1dd9b3bae7860d7ca2c0.camel@nvidia.com>
In-Reply-To: <alpine.DEB.2.21.1909161312050.118156@chino.kir.corp.google.com>
On Mon, 2019-09-16 at 13:16 -0700, David Rientjes wrote:
> On Fri, 16 Aug 2019, Nitin Gupta wrote:
>
> > For some applications we need to allocate almost all memory as
> > hugepages. However, on a running system, higher-order allocations can
> > fail if the memory is fragmented. The Linux kernel currently does
> > on-demand compaction as we request more hugepages, but this style of
> > compaction incurs very high latency. Experiments with one-time full
> > memory compaction (followed by hugepage allocations) show that the kernel
> > is able to restore a highly fragmented memory state to a fairly
> > compacted memory state within <1 sec for a 32G system. Such data
> > suggests that a more proactive compaction can help us allocate a large
> > fraction of memory as hugepages while keeping allocation latencies low.
> >
> > For a more proactive compaction, the approach taken here is to define
> > per page-order external fragmentation thresholds and let kcompactd
> > threads act on these thresholds.
> >
> > The low and high thresholds are defined per page-order and exposed
> > through sysfs:
> >
> > /sys/kernel/mm/compaction/order-[1..MAX_ORDER]/extfrag_{low,high}
> >
> > A per-node kcompactd thread is woken up every few seconds to check if
> > any zone on its node has extfrag above the extfrag_high threshold for
> > any order, in which case the thread starts compaction in the background
> > till all zones are below the extfrag_low level for all orders. By default
> > both these thresholds are set to 100 for all orders, which essentially
> > disables kcompactd.
> >
> > To avoid wasting CPU cycles when compaction cannot help, such as when
> > memory is full, we check both extfrag > extfrag_high and
> > compaction_suitable(zone). This allows the kcompactd thread to stay
> > inactive even if extfrag exceeds the extfrag_high threshold.
> >
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
> >
> > Testing done (on x86):
> > - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> > respectively.
> > - Use a test program to fragment memory: the program allocates all memory
> > and then for each 2M aligned section, frees 3/4 of base pages using
> > munmap.
> > - kcompactd0 detects fragmentation for order-9 > extfrag_high and starts
> > compaction till extfrag < extfrag_low for order-9.
> >
> > The patch has plenty of rough edges but posting it early to see if I'm
> > going in the right direction and to get some early feedback.
> >
>
> Is there an update to this proposal or non-RFC patch that has been posted
> for proactive compaction?
>
> We've had good success with periodically compacting memory on a regular
> cadence on systems with hugepages enabled. The cadence itself is defined
> by the admin but it causes khugepaged[*] to periodically wake up and invoke
> compaction in an attempt to keep zones as defragmented as possible
> (perhaps more "proactive" than what is proposed here in an attempt to keep
> all memory as unfragmented as possible regardless of extfrag thresholds).
> It also avoids corner-cases where kcompactd could become more expensive
> than what is anticipated because it is unsuccessful at compacting memory
> yet the extfrag threshold is still exceeded.
>
> [*] Khugepaged instead of kcompactd only because this is only enabled
> for systems where transparent hugepages are enabled, probably better
> off in kcompactd to avoid duplicating work between two kthreads if
> there is already a need for background compaction.
>
Discussion on this RFC patch revolved around the issue of exposing too
many tunables (per-node, per-order, [low-high] extfrag thresholds). It
was sort-of concluded that no admin will get these tunables right for
a variety of workloads.
To eliminate the need for tunables, I proposed another patch:
https://patchwork.kernel.org/patch/11140067/
which does not add any tunables but extends and exports an existing
function (compact_zone_order). In summary, this new patch adds a
callback function which allows any driver to implement ad-hoc
compaction policies. There is also a sample driver which makes use
of this interface to keep hugepage external fragmentation within a
specified range (exposed through debugfs):
https://gitlab.com/nigupta/linux/snippets/1894161
-Nitin