linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nitin Gupta <nigupta@nvidia.com>
To: "rientjes@google.com" <rientjes@google.com>
Cc: "keescook@chromium.org" <keescook@chromium.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"aryabinin@virtuozzo.com" <aryabinin@virtuozzo.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cai@lca.pw" <cai@lca.pw>,
	"arunks@codeaurora.org" <arunks@codeaurora.org>,
	"janne.huttunen@nokia.com" <janne.huttunen@nokia.com>,
	"jannh@google.com" <jannh@google.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"guro@fb.com" <guro@fb.com>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"khlebnikov@yandex-team.ru" <khlebnikov@yandex-team.ru>
Subject: Re: [RFC] mm: Proactive compaction
Date: Mon, 16 Sep 2019 20:50:43 +0000	[thread overview]
Message-ID: <4b8b0cd5d7a246e9db1e1dd9b3bae7860d7ca2c0.camel@nvidia.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1909161312050.118156@chino.kir.corp.google.com>

On Mon, 2019-09-16 at 13:16 -0700, David Rientjes wrote:
> On Fri, 16 Aug 2019, Nitin Gupta wrote:
> 
> > For some applications we need to allocate almost all memory as
> > hugepages. However, on a running system, higher order allocations can
> > fail if the memory is fragmented. Linux kernel currently does
> > on-demand compaction as we request more hugepages but this style of
> > compaction incurs very high latency. Experiments with one-time full
> > memory compaction (followed by hugepage allocations) shows that kernel
> > is able to restore a highly fragmented memory state to a fairly
> > compacted memory state within <1 sec for a 32G system. Such data
> > suggests that a more proactive compaction can help us allocate a large
> > fraction of memory as hugepages keeping allocation latencies low.
> > 
> > For a more proactive compaction, the approach taken here is to define
> > per page-order external fragmentation thresholds and let kcompactd
> > threads act on these thresholds.
> > 
> > The low and high thresholds are defined per page-order and exposed
> > through sysfs:
> > 
> >   /sys/kernel/mm/compaction/order-[1..MAX_ORDER]/extfrag_{low,high}
> > 
> > Per-node kcompactd thread is woken up every few seconds to check if
> > any zone on its node has extfrag above the extfrag_high threshold for
> > any order, in which case the thread starts compaction in the backgrond
> > till all zones are below extfrag_low level for all orders. By default
> > both these thresolds are set to 100 for all orders which essentially
> > disables kcompactd.
> > 
> > To avoid wasting CPU cycles when compaction cannot help, such as when
> > memory is full, we check both, extfrag > extfrag_high and
> > compaction_suitable(zone). This allows kcomapctd thread to stays inactive
> > even if extfrag thresholds are not met.
> > 
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
> > 
> > Testing done (on x86):
> >  - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> >  respectively.
> >  - Use a test program to fragment memory: the program allocates all memory
> >  and then for each 2M aligned section, frees 3/4 of base pages using
> >  munmap.
> >  - kcompactd0 detects fragmentation for order-9 > extfrag_high and starts
> >  compaction till extfrag < extfrag_low for order-9.
> > 
> > The patch has plenty of rough edges but posting it early to see if I'm
> > going in the right direction and to get some early feedback.
> > 
> 
> Is there an update to this proposal or non-RFC patch that has been posted 
> for proactive compaction?
> 
> We've had good success with periodically compacting memory on a regular 
> cadence on systems with hugepages enabled.  The cadence itself is defined 
> by the admin but it causes khugepaged[*] to periodically wakeup and invoke 
> compaction in an attempt to keep zones as defragmented as possible 
> (perhaps more "proactive" than what is proposed here in an attempt to keep 
> all memory as unfragmented as possible regardless of extfrag thresholds).  
> It also avoids corner-cases where kcompactd could become more expensive 
> than what is anticipated because it is unsuccessful at compacting memory 
> yet the extfrag threshold is still exceeded.
> 
>  [*] Khugepaged instead of kcompactd only because this is only enabled
>      for systems where transparent hugepages are enabled, probably better
>      off in kcompactd to avoid duplicating work between two kthreads if
>      there is already a need for background compaction.
> 


Discussion on this RFC patch revolved around the issue of exposing too
many tunables (per-node, per-order, [low-high] extfrag thresholds). It
was sort-of concluded that no admin will get these tunables right for
a variety of workloads.

To eliminate the need for tunables, I proposed another patch:

https://patchwork.kernel.org/patch/11140067/

which does not add any tunables but extends and exports an existing
function (compact_zone_order). In summary, this new patch adds a
callback function which allows any driver to implement ad-hoc
compaction policies. There is also a sample driver which makes use
of this interface to keep hugepage external fragmentation within
specified range (exposed through debugfs):

https://gitlab.com/nigupta/linux/snippets/1894161

-Nitin


  reply	other threads:[~2019-09-16 20:50 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16 21:43 Nitin Gupta
2019-08-20  8:46 ` Vlastimil Babka
2019-08-20 21:35   ` Nitin Gupta
2019-08-24  7:24   ` Khalid Aziz
2019-09-19 23:37   ` Nitin Gupta
2019-09-24 13:39     ` Vlastimil Babka
2019-09-24 14:11       ` Khalid Aziz
2019-08-20 22:20 ` Matthew Wilcox
2019-08-21 23:23   ` Nitin Gupta
2019-08-22  8:51 ` Mel Gorman
2019-08-22 21:57   ` Nitin Gupta
2019-08-26 11:47     ` Mel Gorman
2019-08-27 20:36       ` Nitin Gupta
2019-09-19 23:22   ` Nitin Gupta
2019-09-16 20:16 ` David Rientjes
2019-09-16 20:50   ` Nitin Gupta [this message]
2019-09-17 19:46   ` John Hubbard
2019-09-17 20:26     ` David Rientjes
2019-11-22 22:31   ` Nitin Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b8b0cd5d7a246e9db1e1dd9b3bae7860d7ca2c0.camel@nvidia.com \
    --to=nigupta@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks@codeaurora.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=janne.huttunen@nokia.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox