From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
Michal Hocko <mhocko@suse.com>
Cc: "tj@kernel.org" <tj@kernel.org>,
John Groves <john@jagalactic.com>,
Gregory Price <gourry.memverge@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"corbet@lwn.net" <corbet@lwn.net>,
"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
"shakeelb@google.com" <shakeelb@google.com>,
"muchun.song@linux.dev" <muchun.song@linux.dev>,
"jgroves@micron.com" <jgroves@micron.com>
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Wed, 15 Nov 2023 13:56:53 +0800
Message-ID: <87o7fveeze.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <ZVOzMEtDYB4l8qFy@memverge.com> (Gregory Price's message of "Tue, 14 Nov 2023 12:49:36 -0500")
Gregory Price <gregory.price@memverge.com> writes:
> On Tue, Nov 14, 2023 at 06:01:13PM +0100, Michal Hocko wrote:
>> On Tue 14-11-23 10:50:51, Gregory Price wrote:
>> > On Tue, Nov 14, 2023 at 10:43:13AM +0100, Michal Hocko wrote:
>> [...]
>> > > That being said, I still believe that a cgroup-based interface is a much
>> > > better choice than a global one. Cpusets seem to be a good fit, as the
>> > > controller does control memory placement wrt NUMA interfaces.
>> >
>> > I think cpusets is a non-starter due to the global spinlock required when
>> > reading information from it:
>> >
>> > https://elixir.bootlin.com/linux/latest/source/kernel/cgroup/cpuset.c#L391
>>
>> Right, our current cpuset implementation indeed requires the callback lock
>> from the page allocator. But that is an implementation detail. I do not
>> remember bug reports about the lock being a bottleneck though. If
>> anything, cpuset lock optimizations would be a win also for users who do
>> not want to use the weighted interleave interface.
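FWIW, the read side in question looks roughly like this (a simplified
sketch of kernel/cgroup/cpuset.c, not the exact code):

    /* One global lock serializes every mems_allowed read. */
    static DEFINE_SPINLOCK(callback_lock);

    bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
    {
            struct cpuset *cs;
            unsigned long flags;
            bool allowed;

            spin_lock_irqsave(&callback_lock, flags);  /* global */
            rcu_read_lock();
            cs = nearest_hardwall_ancestor(task_cs(current));
            allowed = node_isset(node, cs->mems_allowed);
            rcu_read_unlock();
            spin_unlock_irqrestore(&callback_lock, flags);
            return allowed;
    }

So every policy-constrained allocation from every task funnels through
the same spinlock, which is the contention concern above.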
>
> Definitely agree, but that's a rather large increase in scope :[
>
> We could consider a push-model similar to how cpuset nodemasks are
> pushed down to mempolicies, rather than a pull-model of having
> mempolicy read directly from cpusets, at least until the cpuset lock
> optimization is undertaken.
>
> This pattern looks like a wart to me, which is why I avoided it, but the
> locking implications of the pull-model make me sad.
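The existing nodemask push path is already quite small; simplified, it
looks something like the sketch below (not the exact code), and a
hypothetical per-node weights push could follow the same shape:

    /*
     * Sketch of the existing push model in kernel/cgroup/cpuset.c:
     * when cpuset.mems changes, cpuset walks the affected tasks and
     * pushes the new nodemask down into each task's mempolicy.
     */
    static void cpuset_change_task_nodemask(struct task_struct *tsk,
                                            nodemask_t *newmems)
    {
            task_lock(tsk);
            /* ... merge/update tsk->mems_allowed ... */
            mpol_rebind_task(tsk, newmems); /* push into mempolicy */
            task_unlock(tsk);
    }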
>
> I'd like to point out that Tejun pushed back on implementing weights
> in cgroups (regardless of subcomponent), so I think we need to come
> to a consensus on where this data should live in a "more global"
> context (cpusets, memcg, nodes, etc.) before I go mucking around
> further.
>
> So far we have:
> * mempolicy: updating weights is a very complicated undertaking,
>              and there is no (good) way to do this from outside the
>              task. It would be better to have coarser-grained control.
>
>              A new syscall is likely needed to add/set weights in the
>              per-task mempolicy, or we bite the bullet on set_mempolicy2
>              and make the syscall extensible for the future.
>
> * memtiers: tier=node when devices are already interleaved or when all
>             devices are different, so why add yet another layer of
>             complexity if other constructs already exist? Additionally,
>             you lose task-placement-relative weighting (or it becomes
>             very complex to implement).
Because we usually have multiple nodes in one mem-tier, I still think a
mem-tier-based interface is simpler than a node-based one. But it seems
more complex to introduce mem-tiers into mempolicy, especially if we
have per-task weights. So, I am fine with going with a node-based
interface.
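To make the node-based behavior concrete: on the allocation side,
weighted interleave can be a simple weighted round-robin over the
nodemask. A minimal userspace sketch (illustrative only, with made-up
3:1 weights; not the kernel implementation):

    #include <stdio.h>

    int main(void)
    {
            int nodes[]   = { 0, 1 }; /* nodemask: DRAM=0, CXL=1 */
            int weights[] = { 3, 1 }; /* per-node weights */
            int cur = 0, left = weights[0];

            for (int page = 0; page < 8; page++) {
                    printf("page %d -> node %d\n", page, nodes[cur]);
                    if (--left == 0) { /* weight used up: next node */
                            cur = (cur + 1) % 2;
                            left = weights[cur];
                    }
            }
            return 0;
    }

With weights 3:1 this places pages 0-2 on node 0, page 3 on node 1,
and so on, approximating the bandwidth ratio between the nodes.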
> * cgroups: "this doesn't involve dynamic resource accounting /
> enforcement at all" and "these aren't resource
> allocations, it's unclear what the hierarchical
> relationship mean".
>
> * node: too global; explore smaller scope first, then expand.
Why is it too global? I understand that it doesn't cover all possible
use cases (although I don't know whether those use cases are practical
or not). But it can provide reasonable default per-node weights based
on available node performance information (such as HMAT, CDAT, etc.).
And quite a few workloads could just use it. I think this is a useful
feature.
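For example, if HMAT reports ~240 GB/s for a DRAM node and ~60 GB/s
for a CXL node (numbers assumed purely for illustration), reducing the
bandwidth ratio by the GCD yields default weights of 4:1:

    #include <stdio.h>

    static unsigned int gcd(unsigned int a, unsigned int b)
    {
            while (b) {
                    unsigned int t = a % b;
                    a = b;
                    b = t;
            }
            return a;
    }

    int main(void)
    {
            /* Assumed bandwidths (GB/s), as HMAT/CDAT might report. */
            unsigned int bw[] = { 240, 60 }; /* node 0 DRAM, node 1 CXL */
            unsigned int g = gcd(bw[0], bw[1]);

            for (int n = 0; n < 2; n++)
                    printf("node %d: default weight %u\n", n, bw[n] / g);
            return 0; /* prints weights 4 and 1 */
    }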
> For now I think there is consensus that mempolicy should have weights
> per-task regardless of how the more-global mechanism is defined, so I'll
> go ahead and put up another RFC for some options on that in the next
> week or so.
>
> The limitation of the first pass will be that only the task itself is
> capable of re-weighting should cpuset.mems or the nodemask change.
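For the syscall option, one extensible shape would be a sized argument
struct in the style of clone3()/openat2(), so fields such as per-node
weights can be appended later without a new syscall number. Everything
below is hypothetical; names and layout are invented for illustration:

    /* Hypothetical, illustrative ABI sketch -- not an agreed design. */
    struct mpol_args {
            __u16 mode;          /* e.g. MPOL_INTERLEAVE */
            __u16 mode_flags;
            __u64 pol_nodes;     /* userspace pointer to nodemask */
            __u64 pol_maxnodes;
            __u64 il_weights;    /* pointer to per-node weights (new) */
    };

    long set_mempolicy2(struct mpol_args *args, size_t size,
                        unsigned long flags);

Passing the struct size lets the kernel accept both older and newer
callers as the struct grows.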
--
Best Regards,
Huang, Ying