From: Gregory Price <gregory.price@memverge.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Gregory Price <gourry.memverge@gmail.com>,
linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-mm@kvack.org, ying.huang@intel.com,
akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
weixugc@google.com, apopple@nvidia.com, tim.c.chen@intel.com,
dave.hansen@intel.com, shy828301@gmail.com,
gregkh@linuxfoundation.org, rafael@kernel.org
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Date: Tue, 31 Oct 2023 00:27:04 -0400 [thread overview]
Message-ID: <ZUCCGJgrqqk87aGN@memverge.com> (raw)
In-Reply-To: <jgh5b5bm73qe7m3qmnsjo3drazgfaix3ycqmom5u6tfp6hcerj@ij4vftrutvrt>
On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
> > This hopefully also explains why it's a global setting. The usecase is
> > different from conventional NUMA interleaving, which is used as a
> > locality measure: spread shared data evenly between compute
> > nodes. This one isn't about locality - the CXL tier doesn't have local
> > compute. Instead, the optimal spread is based on hardware parameters,
> > which is a global property rather than a per-workload one.
>
> Well, I am not convinced about that TBH. Sure it is probably a good fit
> for this specific CXL usecase but it just doesn't fit into many others I
> can think of - e.g. proportional use of those tiers based on the
> workload - you get what you pay for.
>
> Is there any specific reason for not having a new interleave interface
> which defines weights for the nodemask? Is this because the policy
> itself is very dynamic or is this more driven by simplicity of use?
>
I had originally implemented it this way while experimenting with new
mempolicies.
https://lore.kernel.org/linux-cxl/20231003002156.740595-5-gregory.price@memverge.com/
The downside of doing it in mempolicy is...
1) mempolicy is not sysfs friendly, and to make it sysfs friendly is a
non-trivial task. It is very "current-task" centric.
2) Barring a change to mempolicy to be sysfs friendly, the options for
implementing weights in the mempolicy are either a) new flag and
setting every weight individually in many syscalls, or b) a new
syscall (set_mempolicy2), which is what I demonstrated in the RFC.
3) mempolicy is also subject to cgroup nodemasks, and as a result you
end up with a rats nest of interactions between mempolicy nodemasks
changing as a result of cgroup migrations, nodes potentially coming
and going (hotplug under CXL), and others I'm probably forgetting.
Basically: If a node leaves the nodemask, should you retain the
weight, or should you reset it? If a new node comes into the node
mask... what weight should you set? I did not have answers to these
questions.
It was recommended to explore placing it in tiers instead, so I took a
crack at it here:
https://lore.kernel.org/linux-mm/20231009204259.875232-1-gregory.price@memverge.com/
This had similar issue with the idea of hotplug nodes: if you give a
tier a weight, and one or more of the nodes goes away/comes back... what
should you do with the weight? Split it up among the remaining nodes?
Rebalance? Etc.
The result of this discussion lead us to simply say "What if we place
the weights directly in the node". And that lead us to this RFC.
I am not against implementing it in mempolicy (as proof: my first RFC).
I am simply searching for the acceptable way to implement it.
One of the benefits of having it set as a global setting is that weights
can be automatically generated from HMAT/HMEM information (ACPI tables)
and programs already using MPOL_INTERLEAVE will have a direct benefit.
I have been considering whether MPOL_WEIGHTED_INTERLEAVE should be added
along side this patch so that MPOL_INTERLEAVE is left entirely alone.
Happy to discuss more,
~Gregory
next prev parent reply other threads:[~2023-10-31 16:56 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-31 0:38 Gregory Price
2023-10-31 0:38 ` [RFC PATCH v3 1/4] base/node.c: initialize the accessor list before registering Gregory Price
2023-10-31 0:38 ` [RFC PATCH v3 2/4] node: add accessors to sysfs when nodes are created Gregory Price
2023-10-31 0:38 ` [RFC PATCH v3 3/4] node: add interleave weights to node accessor Gregory Price
2023-10-31 0:38 ` [RFC PATCH v3 4/4] mm/mempolicy: modify interleave mempolicy to use node weights Gregory Price
2023-10-31 17:52 ` [EXT] " Srinivasulu Thanneeru
2023-10-31 18:23 ` Srinivasulu Thanneeru
2023-10-31 9:53 ` [RFC PATCH v3 0/4] Node Weights and Weighted Interleave Michal Hocko
2023-10-31 15:21 ` Johannes Weiner
2023-10-31 15:56 ` Michal Hocko
2023-10-31 4:27 ` Gregory Price [this message]
2023-11-01 13:45 ` Michal Hocko
2023-11-01 16:58 ` Gregory Price
2023-11-02 9:47 ` Michal Hocko
2023-11-02 3:18 ` Gregory Price
2023-11-03 7:45 ` Huang, Ying
2023-11-03 14:16 ` Jonathan Cameron
2023-11-06 3:20 ` Huang, Ying
2023-11-03 9:56 ` Michal Hocko
2023-11-02 18:21 ` Gregory Price
2023-11-03 16:59 ` Michal Hocko
2023-11-02 2:01 ` Huang, Ying
2023-10-31 16:22 ` Johannes Weiner
2023-10-31 4:29 ` Gregory Price
2023-11-01 2:34 ` Huang, Ying
2023-11-01 9:29 ` Ravi Jonnalagadda
2023-11-02 6:41 ` Huang, Ying
2023-11-02 9:35 ` Ravi Jonnalagadda
2023-11-02 14:13 ` Jonathan Cameron
2023-11-03 7:00 ` Huang, Ying
2023-11-01 13:56 ` Michal Hocko
2023-11-02 6:21 ` Huang, Ying
2023-11-02 9:30 ` Michal Hocko
2023-11-01 2:21 ` Huang, Ying
2023-11-01 14:01 ` Michal Hocko
2023-11-02 6:11 ` Huang, Ying
2023-11-02 9:28 ` Michal Hocko
2023-11-03 7:10 ` Huang, Ying
2023-11-03 9:39 ` Michal Hocko
2023-11-06 5:08 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZUCCGJgrqqk87aGN@memverge.com \
--to=gregory.price@memverge.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=apopple@nvidia.com \
--cc=dave.hansen@intel.com \
--cc=gourry.memverge@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=rafael@kernel.org \
--cc=shy828301@gmail.com \
--cc=tim.c.chen@intel.com \
--cc=weixugc@google.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox