linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: Michal Hocko <mhocko@suse.com>
Cc: "tj@kernel.org" <tj@kernel.org>,
	John Groves <john@jagalactic.com>,
	Gregory Price <gourry.memverge@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"shakeelb@google.com" <shakeelb@google.com>,
	"muchun.song@linux.dev" <muchun.song@linux.dev>,
	"jgroves@micron.com" <jgroves@micron.com>
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Tue, 14 Nov 2023 12:49:36 -0500	[thread overview]
Message-ID: <ZVOzMEtDYB4l8qFy@memverge.com> (raw)
In-Reply-To: <ZVOn2T_Qg_NTKlB2@tiehlicka>

On Tue, Nov 14, 2023 at 06:01:13PM +0100, Michal Hocko wrote:
> On Tue 14-11-23 10:50:51, Gregory Price wrote:
> > On Tue, Nov 14, 2023 at 10:43:13AM +0100, Michal Hocko wrote:
> [...]
> > > That being said, I still believe that a cgroup based interface is a much
> > > better choice over a global one. Cpusets seem to be a good fit as the
> > > controller does control memory placement wrt NUMA interfaces.
> > 
> > I think cpusets is a non-starter due to the global spinlock required when
> > reading informaiton from it:
> > 
> > https://elixir.bootlin.com/linux/latest/source/kernel/cgroup/cpuset.c#L391
> 
> Right, our current cpuset implementation indeed requires callback lock
> from the page allocator. But that is an implementation detail. I do not
> remember bug reports about the lock being a bottle neck though. If
> anything cpusets lock optimizations would be win also for users who do
> not want to use weighted interleave interface.

Definitely agree, but that's a rather large increase of scope :[

We could consider a push-model similar to how cpuset nodemasks are
pushed down to mempolicies, rather than a pull-model of having
mempolicy read directly from cpusets, at least until cpusets lock
optimization is undertaken.

This pattern looks like a wart to me, which is why I avoided it, but the
locking implications on the pull-model make me sad.

Would like to point out that Tejun pushed back on implementing weights
in cgroups (regardless of subcomponent), so I think we need to come
to a consensus on where this data should live in a "more global"
context (cpusets, memcg, nodes, etc) before I go mucking around
further.

So far we have:
* mempolicy: updating weights is a very complicated undertaking,
             and no (good) way to do this from outside the task.
	     would be better to have a coarser grained control.

             New syscall is likely needed to add/set weights in the
	     per-task mempolicy, or bite the bullet on set_mempolicy2
	     and make the syscall extensible for the future.

* memtiers: tier=node when devices are already interleaved or when all
            devices are different, so why add yet another layer of
	    complexity if other constructs already exist.  Additionally,
	    you lose task-placement relative weighting (or it becomes
	    very complex to implement.

* cgroups: "this doesn't involve dynamic resource accounting /
            enforcement at all" and "these aren't resource
	    allocations, it's unclear what the hierarchical
	    relationship mean".

* node: too global, explore smaller scope first then expand.

For now I think there is consensus that mempolicy should have weights
per-task regardless of how the more-global mechanism is defined, so i'll
go ahead and put up another RFC for some options on that in the next
week or so.

The limitations on the first pass will be that only the task is capable
of re-weighting should cpusets.mems or the nodemask change.

~Gregory


  reply	other threads:[~2023-11-14 17:49 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-09  0:25 Gregory Price
2023-11-09  0:25 ` [RFC PATCH v4 1/3] mm/memcontrol: implement memcg.interleave_weights Gregory Price
2023-11-09  0:25 ` [RFC PATCH v4 2/3] mm/mempolicy: implement weighted interleave Gregory Price
2023-11-10 15:26   ` Ravi Jonnalagadda
2023-11-09  0:25 ` [RFC PATCH v4 3/3] Documentation: sysfs entries for cgroup.memory.interleave_weights Gregory Price
2023-11-09 10:02 ` [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control Michal Hocko
2023-11-09 15:10   ` Gregory Price
2023-11-09 16:34   ` Gregory Price
2023-11-10  9:05     ` Michal Hocko
2023-11-10 21:24       ` Gregory Price
     [not found] ` <klhcqksrg7uvdrf6hoi5tegifycjltz2kx2d62hapmw3ulr7oa@woibsnrpgox4>
2023-11-09 22:48   ` John Groves
2023-11-10 22:05     ` tj
2023-11-10 22:29       ` Gregory Price
2023-11-11  3:05         ` tj
2023-11-11  3:42           ` Gregory Price
2023-11-11 11:16             ` tj
2023-11-11 23:54               ` Dan Williams
2023-11-13  2:22                 ` Gregory Price
2023-11-14  9:43             ` Michal Hocko
2023-11-14 15:50               ` Gregory Price
2023-11-14 17:01                 ` Michal Hocko
2023-11-14 17:49                   ` Gregory Price [this message]
2023-11-15  5:56                     ` Huang, Ying
2023-12-04  3:33                       ` Gregory Price
2023-12-04  8:19                         ` Huang, Ying
2023-12-04 13:50                           ` Gregory Price
2023-12-05  9:01                             ` Huang, Ying
2023-12-05 14:47                               ` Gregory Price
2023-12-06  0:50                                 ` Huang, Ying
2023-12-06  2:01                                   ` Gregory Price
2023-11-10  6:16 ` Huang, Ying
2023-11-10 19:54   ` Gregory Price
2023-11-13  1:31     ` Huang, Ying
2023-11-13  2:28       ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZVOzMEtDYB4l8qFy@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=gourry.memverge@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=jgroves@micron.com \
    --cc=john@jagalactic.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox