linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nitin Gupta <nigupta@nvidia.com>
To: David Rientjes <rientjes@google.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	Yu Zhao <yuzhao@google.com>, Matthew Wilcox <willy@infradead.org>,
	Qian Cai <cai@lca.pw>, Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Roman Gushchin <guro@fb.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kees Cook <keescook@chromium.org>, Jann Horn <jannh@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Arun KS <arunks@codeaurora.org>,
	Janne Huttunen <janne.huttunen@nokia.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: RE: [RFC] mm: Proactive compaction
Date: Fri, 22 Nov 2019 22:31:12 +0000	[thread overview]
Message-ID: <BYAPR12MB3015C0746B50A129FCDD9E47D8490@BYAPR12MB3015.namprd12.prod.outlook.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1909161312050.118156@chino.kir.corp.google.com>



> -----Original Message-----
> From: owner-linux-mm@kvack.org <owner-linux-mm@kvack.org> On Behalf
> Of David Rientjes
> Sent: Monday, September 16, 2019 1:17 PM
> To: Nitin Gupta <nigupta@nvidia.com>
> Cc: akpm@linux-foundation.org; vbabka@suse.cz;
> mgorman@techsingularity.net; mhocko@suse.com;
> dan.j.williams@intel.com; Yu Zhao <yuzhao@google.com>; Matthew Wilcox
> <willy@infradead.org>; Qian Cai <cai@lca.pw>; Andrey Ryabinin
> <aryabinin@virtuozzo.com>; Roman Gushchin <guro@fb.com>; Greg Kroah-
> Hartman <gregkh@linuxfoundation.org>; Kees Cook
> <keescook@chromium.org>; Jann Horn <jannh@google.com>; Johannes
> Weiner <hannes@cmpxchg.org>; Arun KS <arunks@codeaurora.org>; Janne
> Huttunen <janne.huttunen@nokia.com>; Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru>; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org
> Subject: Re: [RFC] mm: Proactive compaction
> 
> On Fri, 16 Aug 2019, Nitin Gupta wrote:
> 
> > For some applications we need to allocate almost all memory as
> > hugepages. However, on a running system, higher order allocations can
> > fail if the memory is fragmented. Linux kernel currently does
> > on-demand compaction as we request more hugepages but this style of
> > compaction incurs very high latency. Experiments with one-time full
> > memory compaction (followed by hugepage allocations) shows that kernel
> > is able to restore a highly fragmented memory state to a fairly
> > compacted memory state within <1 sec for a 32G system. Such data
> > suggests that a more proactive compaction can help us allocate a large
> > fraction of memory as hugepages keeping allocation latencies low.
> >
> > For a more proactive compaction, the approach taken here is to define
> > per page-order external fragmentation thresholds and let kcompactd
> > threads act on these thresholds.
> >
> > The low and high thresholds are defined per page-order and exposed
> > through sysfs:
> >
> >   /sys/kernel/mm/compaction/order-[1..MAX_ORDER]/extfrag_{low,high}
> >
> > Per-node kcompactd thread is woken up every few seconds to check if
> > any zone on its node has extfrag above the extfrag_high threshold for
> > any order, in which case the thread starts compaction in the backgrond
> > till all zones are below extfrag_low level for all orders. By default
> > both these thresolds are set to 100 for all orders which essentially
> > disables kcompactd.
> >
> > To avoid wasting CPU cycles when compaction cannot help, such as when
> > memory is full, we check both, extfrag > extfrag_high and
> > compaction_suitable(zone). This allows kcomapctd thread to stays
> > inactive even if extfrag thresholds are not met.
> >
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-
> mm/20161230131412.GI13301@dhcp22.suse.cz
> > /
> >
> > Testing done (on x86):
> >  - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> > respectively.
> >  - Use a test program to fragment memory: the program allocates all
> > memory  and then for each 2M aligned section, frees 3/4 of base pages
> > using  munmap.
> >  - kcompactd0 detects fragmentation for order-9 > extfrag_high and
> > starts  compaction till extfrag < extfrag_low for order-9.
> >
> > The patch has plenty of rough edges but posting it early to see if I'm
> > going in the right direction and to get some early feedback.
> >
> 
> Is there an update to this proposal or non-RFC patch that has been posted
> for proactive compaction?
> 

I recently posted a non-RFC patch for proactive compaction:

https://lkml.org/lkml/2019/11/15/1099

Please let me know if you try it out or if you have any feedback.

Thanks,
Nitin



> We've had good success with periodically compacting memory on a regular
> cadence on systems with hugepages enabled.  The cadence itself is defined
> by the admin but it causes khugepaged[*] to periodically wakeup and invoke
> compaction in an attempt to keep zones as defragmented as possible
> (perhaps more "proactive" than what is proposed here in an attempt to keep
> all memory as unfragmented as possible regardless of extfrag thresholds).
> It also avoids corner-cases where kcompactd could become more expensive
> than what is anticipated because it is unsuccessful at compacting memory yet
> the extfrag threshold is still exceeded.
> 
>  [*] Khugepaged instead of kcompactd only because this is only enabled
>      for systems where transparent hugepages are enabled, probably better
>      off in kcompactd to avoid duplicating work between two kthreads if
>      there is already a need for background compaction.



      parent reply	other threads:[~2019-11-22 22:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16 21:43 Nitin Gupta
2019-08-20  8:46 ` Vlastimil Babka
2019-08-20 21:35   ` Nitin Gupta
2019-08-24  7:24   ` Khalid Aziz
2019-09-19 23:37   ` Nitin Gupta
2019-09-24 13:39     ` Vlastimil Babka
2019-09-24 14:11       ` Khalid Aziz
2019-08-20 22:20 ` Matthew Wilcox
2019-08-21 23:23   ` Nitin Gupta
2019-08-22  8:51 ` Mel Gorman
2019-08-22 21:57   ` Nitin Gupta
2019-08-26 11:47     ` Mel Gorman
2019-08-27 20:36       ` Nitin Gupta
2019-09-19 23:22   ` Nitin Gupta
2019-09-16 20:16 ` David Rientjes
2019-09-16 20:50   ` Nitin Gupta
2019-09-17 19:46   ` John Hubbard
2019-09-17 20:26     ` David Rientjes
2019-11-22 22:31   ` Nitin Gupta [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR12MB3015C0746B50A129FCDD9E47D8490@BYAPR12MB3015.namprd12.prod.outlook.com \
    --to=nigupta@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks@codeaurora.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=janne.huttunen@nokia.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox