From: Dan Williams <dan.j.williams@intel.com>
To: "tj@kernel.org" <tj@kernel.org>,
Gregory Price <gregory.price@memverge.com>
Cc: John Groves <john@jagalactic.com>,
Gregory Price <gourry.memverge@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mhocko@kernel.org" <mhocko@kernel.org>,
"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"corbet@lwn.net" <corbet@lwn.net>,
"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
"shakeelb@google.com" <shakeelb@google.com>,
"muchun.song@linux.dev" <muchun.song@linux.dev>,
"jgroves@micron.com" <jgroves@micron.com>
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Sat, 11 Nov 2023 15:54:55 -0800
Message-ID: <6550144fb048d_46f0294be@dwillia2-mobl3.amr.corp.intel.com.notmuch>
In-Reply-To: <ZU9ijZHZZjRgUctq@mtj.duckdns.org>

tj@kernel.org wrote:
> Hello,
>
> On Fri, Nov 10, 2023 at 10:42:39PM -0500, Gregory Price wrote:
> > On Fri, Nov 10, 2023 at 05:05:50PM -1000, tj@kernel.org wrote:
> ...
> > I've been considering this as well, but there's more context being
> > lost here. It's not just about being able to toggle the policy of a
> > single task, or of related tasks, but about supporting a more global
> > data interleaving strategy that makes use of bandwidth more
> > effectively as memory expansion and bandwidth expansion occur on the
> > PCIe/CXL bus.
> >
> > If the memory landscape of a system changes, for example due to a
> > hotplug event, you actually want to change the behavior of *every* task
> > that is using interleaving. The fundamental bandwidth distribution of
> > the entire system changed, so the behavior of every task using that
> > memory should change with it.
> >
> > We've explored adding weights to: mempolicy, memory tiers, nodes, memcg,
> > and now additionally cpusets. In the last email, I'd asked whether it
> > might actually be worth adding a new mpol component to cgroups to
> > aggregate these concerns, rather than jamming them into either existing
> > component. I would love your thoughts on that.
>
> As for CXL and the changing memory landscape, I think some caution is
> necessary, as with any expected "future" technology change. The recent
> example with non-volatile memory isn't too far from CXL either. Note that
> this is not to say that we shouldn't change anything until the hardware is
> wildly popular, but more that we need to be cognizant of the speculative
> nature and the possibility of overbuilding for it.
>
> I don't have a golden answer but here are general suggestions: Build
> something which is small and/or useful even outside the context of the
> expected hardware landscape changes. Enable the core feature which is
> absolutely required in a minimal manner. Avoid being maximalist in feature
> and convenience coverage.

If I had to state the golden rule of kernel enabling, this paragraph
comes close to being it.

> Here, even if CXL actually becomes popular, how many are going to use memory
> hotplug and need to dynamically rebalance memory in actively running
> workloads? What's the scenario? Is there going to be an army of data center
> technicians going around plugging and unplugging CXL devices depending on
> system memory usage?

While I have personal skepticism that all of the infrastructure in the
CXL specification is going to become popular, one mechanism that seems
poised to cross that threshold is "dynamic capacity". So it is not the
case that techs will be running around hot-adjusting physical memory.
Rather, a host will have a cable hop to a shared memory pool in the rack
whose capacity can be dynamically provisioned across hosts.

However, even then, the bounds of what is dynamic are going to be
constrained to a fixed address space with likely predictable performance
characteristics for that address range. That potentially makes a
system-wide memory interleave policy viable. It might be the place to
start, and it mirrors, at a coarser granularity, what hardware
interleaving can do.
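
To make that shape concrete, here is a minimal C sketch of the kind of
weighted round-robin node selection such a policy implies. It is not the
code from these patches; node_weights[], struct wi_state, and
weighted_interleave_next() are purely illustrative names and the weights
are made up. The point is only that a single, globally managed weight
table lets one update (say, after a dynamic-capacity event) rebalance
every consumer that interleaves through it:

/*
 * Illustrative sketch only (not from the posted patches): choose the next
 * node for an interleaved allocation from a global weight table.
 * Rewriting node_weights[] rebalances every context that consults it.
 */
#define MAX_NODES	8

/* Example weights: two DRAM nodes weighted 3, two CXL nodes weighted 1. */
static unsigned int node_weights[MAX_NODES] = { 3, 3, 1, 1 };

struct wi_state {
	unsigned int cur_node;	/* node currently being filled */
	unsigned int cur_used;	/* allocations taken from cur_node this round */
};

/* Weighted round-robin; assumes at least one node has a non-zero weight. */
static unsigned int weighted_interleave_next(struct wi_state *st,
					     unsigned int nr_nodes)
{
	unsigned int tries;

	/* Advance past nodes whose share of the round is exhausted. */
	for (tries = 0; tries < nr_nodes; tries++) {
		if (st->cur_used < node_weights[st->cur_node])
			break;
		st->cur_node = (st->cur_node + 1) % nr_nodes;
		st->cur_used = 0;
	}
	st->cur_used++;
	return st->cur_node;
}

Whether that table hangs off mempolicy, memory tiers, nodes, memcg, or
cpusets is exactly the open question in this thread; the sketch is
agnostic to that, and only illustrates the coarse-grained analogue of
reprogramming a hardware interleave set.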
[..]