linux-mm.kvack.org archive mirror
From: Gregory Price <gregory.price@memverge.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, ying.huang@intel.com,
	akpm@linux-foundation.org, tj@kernel.org,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Fri, 10 Nov 2023 16:24:34 -0500	[thread overview]
Message-ID: <ZU6fkqsk6xiDzsnM@memverge.com> (raw)
In-Reply-To: <ZU3ydS1Puv2OHgiE@tiehlicka>

On Fri, Nov 10, 2023 at 10:05:57AM +0100, Michal Hocko wrote:
> On Thu 09-11-23 11:34:01, Gregory Price wrote:
> [...]
> > Anyway, summarizing:  After a bit of reading, this does seem to map
> > better to the "accounting consumption" subsystem than the "constrain"
> > subsystem. However, if you think it's better suited for cpuset, I'm
> > happy to push in that direction.
> 
> Maybe others see it differently but I stick with my previous position.
> Memcg is not a great fit for reasons already mentioned - most notably
> that the controller doesn't control the allocation but accounting what
> has been already allocated. Cpusets on the other hand constrains the
> allocations and that is exactly what you want to achieve. 
> -- 
> Michal Hocko
> SUSE Labs

Digging in a bit, placing it in cpusets has locking requirements that
concern me.  Maybe I'm being a bit over-cautious, so if none of this
matters, then I'll go ahead and swap the code over to cpusets.
Otherwise, just more food for thought on cpusets vs memcg.

The comment in kernel/cgroup/cpuset.c states that read-only access to
cpuset structures requires taking the (global) callback_lock:

https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L391
* There are two global locks guarding cpuset structures - cpuset_mutex and
* callback_lock. We also require taking task_lock() when dereferencing a
* task's cpuset pointer. See "The task_lock() exception", at the end of this
* comment.

Examples:

cpuset_node_allowed:
https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L4780
    spin_lock_irqsave(&callback_lock, flags);
    rcu_read_lock();
    cs = nearest_hardwall_ancestor(task_cs(current)); <-- walks parents
    allowed = node_isset(node, cs->mems_allowed);
    rcu_read_unlock();
    spin_unlock_irqrestore(&callback_lock, flags);

cpuset_mems_allowed:
https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L4679
    spin_lock_irqsave(&callback_lock, flags);
    rcu_read_lock();
    guarantee_online_mems(task_cs(tsk), &mask); <-- walks parents
    rcu_read_unlock();
    spin_unlock_irqrestore(&callback_lock, flags);

It seems apparent that any form of parent walk in cpusets will require
acquiring &callback_lock.  This does not appear to be true of memcg.

Implementing a similar inheritance structure as described in this patch
set would therefore cause the acquisition of the callback lock during
node selection. So if we want this in cpuset, we're going to eat that
lock acquisition, despite not really needing it.


I was not intending to do checks against cpuset.mems_allowed when
acquiring weights, as this is already enforced between cpusets and
mempolicy on hotplug and mask changes, as well as in the allocators via
read_mems_allowed_begin/retry().
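For reference, the retry pattern those allocator hooks rely on is a
seqcount read loop.  A simplified userspace sketch (names here are
illustrative; the kernel version additionally handles barriers and
preemption via the seqcount API):

```c
#include <stdatomic.h>

/* Simplified sketch of the seqcount pattern behind
 * read_mems_allowed_begin()/read_mems_allowed_retry(): a reader samples
 * a sequence counter before reading mems_allowed and redoes the read if
 * a writer bumped the counter in the meantime.
 */
static atomic_uint mems_allowed_seq;

static unsigned read_begin(void)
{
	unsigned seq;

	/* an odd value means a writer is mid-update; wait it out */
	do {
		seq = atomic_load(&mems_allowed_seq);
	} while (seq & 1);
	return seq;
}

static int read_retry(unsigned seq)
{
	/* nonzero: mems_allowed changed under us, redo the read */
	return atomic_load(&mems_allowed_seq) != seq;
}

static void write_update(void)
{
	atomic_fetch_add(&mems_allowed_seq, 1); /* enter: seq becomes odd */
	/* ... modify mems_allowed ... */
	atomic_fetch_add(&mems_allowed_seq, 1); /* leave: seq even again */
}
```

The point being: a stale mems_allowed read is already detected and
retried in the allocator, so the weight path doesn't need its own
mems_allowed locking.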

This is why I said this was *not* a constraining feature. Additionally,
if the node selected by mpol is exhausted, the allocator will simply
acquire memory from another (allowed) node, disregarding the weights
entirely (which is the correct / expected behavior). Another example of
"this is more of a suggestion" rather than a constraint.
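To make the "suggestion, not constraint" point concrete, here is a
minimal userspace sketch of weighted interleave node selection.  All
names (il_state, next_weighted_node) are hypothetical, not the patch's
actual API; the sketch only shows the rotation logic, with the
exhausted-node fallback left to the allocator as described above:

```c
/* Hypothetical sketch of weighted interleave: each node carries a
 * weight, and allocations stay on a node for `weight` consecutive
 * allocations before rotating to the next node with nonzero weight.
 */
struct il_state {
	int cur_node;	/* node currently being filled */
	int cur_count;	/* allocations already placed on cur_node */
};

static int next_weighted_node(struct il_state *st,
			      const unsigned char *weights, int nr_nodes)
{
	if (st->cur_count >= weights[st->cur_node]) {
		st->cur_count = 0;
		do {
			st->cur_node = (st->cur_node + 1) % nr_nodes;
		} while (weights[st->cur_node] == 0); /* skip zero weights */
	}
	st->cur_count++;
	return st->cur_node;
}
```

With weights {3, 1} over two nodes, selection proceeds 0,0,0,1,0,0,0,1,
... and if the chosen node can't satisfy the allocation, the allocator
falls back to another allowed node regardless of the weights.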

So I'm contending here that putting it in cpusets is overkill.  But if
it likewise doesn't fit in memcg, is it insane to suggest that maybe we
should consider adding cgroup.mpol, and migrating features from
mempolicy.c into cgroups (while keeping mpol the way it is)?

~Gregory



Thread overview: 34+ messages
2023-11-09  0:25 Gregory Price
2023-11-09  0:25 ` [RFC PATCH v4 1/3] mm/memcontrol: implement memcg.interleave_weights Gregory Price
2023-11-09  0:25 ` [RFC PATCH v4 2/3] mm/mempolicy: implement weighted interleave Gregory Price
2023-11-10 15:26   ` Ravi Jonnalagadda
2023-11-09  0:25 ` [RFC PATCH v4 3/3] Documentation: sysfs entries for cgroup.memory.interleave_weights Gregory Price
2023-11-09 10:02 ` [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control Michal Hocko
2023-11-09 15:10   ` Gregory Price
2023-11-09 16:34   ` Gregory Price
2023-11-10  9:05     ` Michal Hocko
2023-11-10 21:24       ` Gregory Price [this message]
     [not found] ` <klhcqksrg7uvdrf6hoi5tegifycjltz2kx2d62hapmw3ulr7oa@woibsnrpgox4>
2023-11-09 22:48   ` John Groves
2023-11-10 22:05     ` tj
2023-11-10 22:29       ` Gregory Price
2023-11-11  3:05         ` tj
2023-11-11  3:42           ` Gregory Price
2023-11-11 11:16             ` tj
2023-11-11 23:54               ` Dan Williams
2023-11-13  2:22                 ` Gregory Price
2023-11-14  9:43             ` Michal Hocko
2023-11-14 15:50               ` Gregory Price
2023-11-14 17:01                 ` Michal Hocko
2023-11-14 17:49                   ` Gregory Price
2023-11-15  5:56                     ` Huang, Ying
2023-12-04  3:33                       ` Gregory Price
2023-12-04  8:19                         ` Huang, Ying
2023-12-04 13:50                           ` Gregory Price
2023-12-05  9:01                             ` Huang, Ying
2023-12-05 14:47                               ` Gregory Price
2023-12-06  0:50                                 ` Huang, Ying
2023-12-06  2:01                                   ` Gregory Price
2023-11-10  6:16 ` Huang, Ying
2023-11-10 19:54   ` Gregory Price
2023-11-13  1:31     ` Huang, Ying
2023-11-13  2:28       ` Gregory Price
