linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Price <gourry.memverge@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	ying.huang@intel.com, akpm@linux-foundation.org,
	mhocko@kernel.org, tj@kernel.org, lizefan.x@bytedance.com,
	hannes@cmpxchg.org, corbet@lwn.net, roman.gushchin@linux.dev,
	shakeelb@google.com, muchun.song@linux.dev,
	Gregory Price <gregory.price@memverge.com>
Subject: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Wed,  8 Nov 2023 19:25:14 -0500	[thread overview]
Message-ID: <20231109002517.106829-1-gregory.price@memverge.com> (raw)

This patchset implements weighted interleave and adds a new cgroup
sysfs entry: cgroup/memory.interleave_weights (excluded from root).

The il_weight of a node is used by mempolicy to implement weighted
interleave when `numactl --interleave=...` is invoked.  By default
il_weight for a node is always 1, which preserves the default round
robin interleave behavior.

Interleave weights denote the number of pages that should be
allocated from the node when interleaving occurs and have a range
of 1-255.  The weight of a node can never be 0, and instead the
preferred way to prevent allocation is to remove the node from the
cpuset or mempolicy altogether.

For example, if a node's interleave weight is set to 5, 5 pages
will be allocated from that node before the next node is scheduled
for allocations.

# Set node weight for node 0 to 5
echo 0:5 > /sys/fs/cgroup/user.slice/memory.interleave_weights

# Set node weight for node 1 to 3
echo 1:3 > /sys/fs/cgroup/user.slice/memory.interleave_weights

# View the currently set weights
cat /sys/fs/cgroup/user.slice/memory.interleave_weights
0:5,1:3

Weights will only be displayed for possible nodes.

With this it becomes possible to set an interleaving strategy
that fits the available bandwidth for the devices available on
the system. An example system:

Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
Node 1 - CXL Memory. 64GB/s BW, on Node 0 root complex

In this setup, the effective weights for a node set of [0,1]
may be may be [86, 14] (86% of memory on Node 0, 14% on node 1)
or some smaller fraction thereof to encourge quicker rounds
for better overall distribution.

This spreads memory out across devices which all have different
latency and bandwidth attributes in a way that can maximize the
available resources.

~Gregory

=============
Version Notes:

= v4 notes

Moved interleave weights to cgroups from nodes.

Omitted them from the root cgroup for initial testing/comment, but
it seems like it may be a reasonable idea to place them there too.

== Weighted interleave

mm/mempolicy: modify interleave mempolicy to use node weights

The mempolicy MPOL_INTERLEAVE utilizes the node weights defined in
the cgroup memory.interleave_weights interfaces to implement weighted
interleave.  By default, since all nodes default to a weight of 1,
the original interleave behavior is retained.

============
RFC History

Node based weights
By: Gregory Price
https://lore.kernel.org/linux-mm/20231031003810.4532-1-gregory.price@memverge.com/

Memory-tier based weights
By: Ravi Shankar
https://lore.kernel.org/all/20230927095002.10245-1-ravis.opensrc@micron.com/

Mempolicy multi-node weighting w/ set_mempolicy2:
By: Gregory Price
https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@memverge.com/

Hasan Al Maruf: N:M weighting in mempolicy
https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/

Huang, Ying's presentation in lpc22, 16th slide in
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

===================

Gregory Price (3):
  mm/memcontrol: implement memcg.interleave_weights
  mm/mempolicy: implement weighted interleave
  Documentation: sysfs entries for cgroup.memory.interleave_weights

 Documentation/admin-guide/cgroup-v2.rst       |  45 +++++
 .../admin-guide/mm/numa_memory_policy.rst     |  11 ++
 include/linux/memcontrol.h                    |  31 ++++
 include/linux/mempolicy.h                     |   3 +
 mm/memcontrol.c                               | 172 ++++++++++++++++++
 mm/mempolicy.c                                | 153 +++++++++++++---
 6 files changed, 387 insertions(+), 28 deletions(-)

-- 
2.39.1



             reply	other threads:[~2023-11-09  0:25 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-09  0:25 Gregory Price [this message]
2023-11-09  0:25 ` [RFC PATCH v4 1/3] mm/memcontrol: implement memcg.interleave_weights Gregory Price
2023-11-09  0:25 ` [RFC PATCH v4 2/3] mm/mempolicy: implement weighted interleave Gregory Price
2023-11-10 15:26   ` Ravi Jonnalagadda
2023-11-09  0:25 ` [RFC PATCH v4 3/3] Documentation: sysfs entries for cgroup.memory.interleave_weights Gregory Price
2023-11-09 10:02 ` [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control Michal Hocko
2023-11-09 15:10   ` Gregory Price
2023-11-09 16:34   ` Gregory Price
2023-11-10  9:05     ` Michal Hocko
2023-11-10 21:24       ` Gregory Price
     [not found] ` <klhcqksrg7uvdrf6hoi5tegifycjltz2kx2d62hapmw3ulr7oa@woibsnrpgox4>
2023-11-09 22:48   ` John Groves
2023-11-10 22:05     ` tj
2023-11-10 22:29       ` Gregory Price
2023-11-11  3:05         ` tj
2023-11-11  3:42           ` Gregory Price
2023-11-11 11:16             ` tj
2023-11-11 23:54               ` Dan Williams
2023-11-13  2:22                 ` Gregory Price
2023-11-14  9:43             ` Michal Hocko
2023-11-14 15:50               ` Gregory Price
2023-11-14 17:01                 ` Michal Hocko
2023-11-14 17:49                   ` Gregory Price
2023-11-15  5:56                     ` Huang, Ying
2023-12-04  3:33                       ` Gregory Price
2023-12-04  8:19                         ` Huang, Ying
2023-12-04 13:50                           ` Gregory Price
2023-12-05  9:01                             ` Huang, Ying
2023-12-05 14:47                               ` Gregory Price
2023-12-06  0:50                                 ` Huang, Ying
2023-12-06  2:01                                   ` Gregory Price
2023-11-10  6:16 ` Huang, Ying
2023-11-10 19:54   ` Gregory Price
2023-11-13  1:31     ` Huang, Ying
2023-11-13  2:28       ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231109002517.106829-1-gregory.price@memverge.com \
    --to=gourry.memverge@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=gregory.price@memverge.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox