From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,  Yu Zhao <yuzhao@google.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org, Yosry Ahmed <yosryahmed@google.com>,
	 Wei Xu <weixugc@google.com>, Shakeel Butt <shakeelb@google.com>,
	 Greg Thelen <gthelen@google.com>
Subject: [RFC] Mechanism to induce memory reclaim
Date: Sun, 6 Mar 2022 15:11:23 -0800 (PST)
Message-ID: <5df21376-7dd1-bf81-8414-32a73cea45dd@google.com>

Hi everybody,

We'd like to discuss formalizing a mechanism to induce memory reclaim by
the kernel.

The current multigenerational LRU (MGLRU) proposal introduces a debugfs
mechanism[1] for this.  The "TMO: Transparent Memory Offloading in
Datacenters" paper also discusses a per-memcg mechanism[2].  While the
former is intended for debugging MGLRU, both can be used quite
effectively for proactive reclaim.

Google's datacenters use a similar per-memcg mechanism for the same
purpose.  Formalizing the mechanism would therefore let our userspace rely
on a stable, consistent, upstream-supported interface.

This could be an incremental addition to MGLRU's lru_gen debugfs mechanism
but, since the concept has no direct dependency on that work, we believe it
is useful independently of the reclaim mechanism in use (both with and
without CONFIG_LRU_GEN).

Idea: introduce a per-node sysfs mechanism for inducing memory reclaim
that can be used for global (non-memcg constrained) reclaim and that is
usable even if memcg is not enabled in the kernel or not mounted.  This
could optionally take a memcg id to induce reclaim for a memcg hierarchy.

IOW, this would be a /sys/devices/system/node/nodeN/reclaim mechanism for
each NUMA node N on the system.  (It would be similar to the existing
per-node sysfs "compact" mechanism used to trigger compaction from
userspace.)
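To make the per-node layout concrete, here is a rough userspace sketch
(illustrative only) that enumerates the online nodes and builds the path
of the proposed per-node file for each.  /sys/devices/system/node/online
is an existing file in the kernel's list format; the "reclaim" file under
each nodeN directory is the hypothetical interface proposed here and does
not exist today.  The parser assumes only the plain ranges-and-singletons
form (e.g. "0-1,3") that the kernel prints for that file.

```python
from pathlib import Path

# Existing sysfs directory holding one nodeN subdirectory per NUMA node.
NODE_ROOT = Path("/sys/devices/system/node")


def parse_node_list(text: str) -> list[int]:
    """Parse a kernel list-format string such as "0-1,3" into node ids."""
    nodes: list[int] = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            start, end = chunk.split("-")
            nodes.extend(range(int(start), int(end) + 1))
        else:
            nodes.append(int(chunk))
    return nodes


def reclaim_paths(online: str) -> list[Path]:
    """Build paths to the *proposed* per-node reclaim files (hypothetical)."""
    return [NODE_ROOT / f"node{n}" / "reclaim" for n in parse_node_list(online)]
```

On a live system the input would come from
reclaim_paths((NODE_ROOT / "online").read_text()).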

Userspace would write the following to this file:
 - nr_to_reclaim pages
 - swappiness factor
 - memcg_id of the hierarchy to reclaim from, if any[*]
 - flags to specify context, if any[**]
 
 [*] if global reclaim or memcg is not enabled/mounted, this is 0 since
     this is the return value of mem_cgroup_id()
 [**] this is offered for extensibility to specify the context in which
      reclaim is being done (clean file pages only, demotion for memory
      tiering vs eviction, etc), otherwise 0
 
An alternative may be to introduce a /sys/kernel/mm/reclaim mechanism that
also takes a nodemask to reclaim from.  The kernel would reclaim memory
over the set of nodes passed to it.
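If the single-file variant took a nodemask, one natural encoding would be
the kernel's existing list format (as used by
/sys/devices/system/node/online and cpuset.mems), e.g. "0-1,3".  A minimal
sketch of producing that string from a set of node ids follows;
format_nodemask is a hypothetical helper, and whether the interface would
actually accept list format is an open assumption.

```python
def format_nodemask(nodes) -> str:
    """Compress node ids into kernel list format, e.g. {0, 1, 3} -> "0-1,3"."""
    ids = sorted(set(nodes))
    if not ids:
        raise ValueError("nodemask must not be empty")
    if ids[0] < 0:
        raise ValueError("node ids must be non-negative")
    parts = []
    start = prev = ids[0]
    for n in ids[1:]:
        if n == prev + 1:
            # Still inside a contiguous run; extend it.
            prev = n
            continue
        # Run ended: emit it as a singleton or a range.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```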

Some questions to get discussion going:

 - Overall feedback or suggestions for the proposal in general?
 
 - This proposal specifies the amount to reclaim in pages; this could be
   a number of bytes instead.  I have no strong opinion; does anybody
   else?

 - Should this be a per-node mechanism under sysfs like the existing
   "compact" mechanism or should it be implemented as a single file that
   can optionally specify a nodemask to reclaim from?
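On the pages-vs-bytes question, the conversion itself is simple, as the
sketch below shows (bytes_to_pages is a hypothetical helper that rounds
up).  One consideration is that a page count is not portable across
kernels: the same count covers 16x more memory with 64 KiB pages than
with 4 KiB pages, so a byte-based interface would leave this conversion
to the kernel.

```python
import os
from typing import Optional


def bytes_to_pages(nr_bytes: int, page_size: Optional[int] = None) -> int:
    """Convert a byte count to pages, rounding up to whole pages."""
    if page_size is None:
        # Base page size of the running system.
        page_size = os.sysconf("SC_PAGE_SIZE")
    if nr_bytes < 0 or page_size <= 0:
        raise ValueError("nr_bytes must be >= 0 and page_size > 0")
    return -(-nr_bytes // page_size)  # ceiling division
```

For example, 1 MiB is 256 pages with 4 KiB pages but only 16 pages with
64 KiB pages.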

Thanks!

[1] https://lore.kernel.org/linux-mm/20220208081902.3550911-12-yuzhao@google.com
[2] https://dl.acm.org/doi/10.1145/3503222.3507731 (Section 3.3)


Thread overview: 24+ messages
2022-03-06 23:11 David Rientjes [this message]
2022-03-07  0:49 ` Yu Zhao
2022-03-07 14:41 ` Michal Hocko
2022-03-07 18:31   ` Shakeel Butt
2022-03-07 20:26     ` Johannes Weiner
2022-03-08 12:53       ` Michal Hocko
2022-03-08 14:44         ` Dan Schatzberg
2022-03-08 16:05           ` Michal Hocko
2022-03-08 17:21             ` Wei Xu
2022-03-08 17:23             ` Johannes Weiner
2022-03-08 12:52     ` Michal Hocko
2022-03-09 22:03       ` David Rientjes
2022-03-10 16:58         ` Johannes Weiner
2022-03-10 17:25           ` Shakeel Butt
2022-03-10 17:33           ` Wei Xu
2022-03-10 17:42             ` Johannes Weiner
2022-03-07 20:50 ` Johannes Weiner
2022-03-07 22:53   ` Wei Xu
2022-03-08 12:53     ` Michal Hocko
2022-03-08 14:49   ` Dan Schatzberg
2022-03-08 19:27     ` Johannes Weiner
2022-03-08 22:37       ` Dan Schatzberg
2022-03-09 22:30   ` David Rientjes
2022-03-10 16:10     ` Johannes Weiner
