From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Gregory Price <gourry.memverge@gmail.com>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<linux-mm@kvack.org>, <akpm@linux-foundation.org>,
<aneesh.kumar@linux.ibm.com>, <weixugc@google.com>,
<apopple@nvidia.com>, <tim.c.chen@intel.com>,
<dave.hansen@intel.com>, <shy828301@gmail.com>,
<gregkh@linuxfoundation.org>, <rafael@kernel.org>
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Date: Thu, 02 Nov 2023 10:01:20 +0800
Message-ID: <87cyws3ocv.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <ZUCCGJgrqqk87aGN@memverge.com> (Gregory Price's message of "Tue, 31 Oct 2023 00:27:04 -0400")
Gregory Price <gregory.price@memverge.com> writes:
> On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
>
>> > This hopefully also explains why it's a global setting. The usecase is
>> > different from conventional NUMA interleaving, which is used as a
>> > locality measure: spread shared data evenly between compute
>> > nodes. This one isn't about locality - the CXL tier doesn't have local
>> > compute. Instead, the optimal spread is based on hardware parameters,
>> > which is a global property rather than a per-workload one.
>>
>> Well, I am not convinced about that TBH. Sure it is probably a good fit
>> for this specific CXL usecase but it just doesn't fit into many others I
>> can think of - e.g. proportional use of those tiers based on the
>> workload - you get what you pay for.
>>
>> Is there any specific reason for not having a new interleave interface
>> which defines weights for the nodemask? Is this because the policy
>> itself is very dynamic or is this more driven by simplicity of use?
>>
>
> I had originally implemented it this way while experimenting with new
> mempolicies.
>
> https://lore.kernel.org/linux-cxl/20231003002156.740595-5-gregory.price@memverge.com/
>
> The downside of doing it in mempolicy is...
> 1) mempolicy is not sysfs friendly, and to make it sysfs friendly is a
> non-trivial task. It is very "current-task" centric.
>
> 2) Barring a change to mempolicy to be sysfs friendly, the options for
> implementing weights in the mempolicy are either a) new flag and
> setting every weight individually in many syscalls, or b) a new
> syscall (set_mempolicy2), which is what I demonstrated in the RFC.
>
> 3) mempolicy is also subject to cgroup nodemasks, and as a result you
> end up with a rat's nest of interactions between mempolicy nodemasks
> changing as a result of cgroup migrations, nodes potentially coming
> and going (hotplug under CXL), and others I'm probably forgetting.
>
> Basically: If a node leaves the nodemask, should you retain the
> weight, or should you reset it? If a new node comes into the node
> mask... what weight should you set? I did not have answers to these
> questions.
>
>
> It was recommended to explore placing it in tiers instead, so I took a
> crack at it here:
>
> https://lore.kernel.org/linux-mm/20231009204259.875232-1-gregory.price@memverge.com/
>
> This had a similar issue with hotplug nodes: if you give a
> tier a weight, and one or more of the nodes goes away/comes back... what
> should you do with the weight? Split it up among the remaining nodes?
> Rebalance? Etc.

The weight of a tier can be defined as the weight of each node in the
tier, rather than the total weight of all nodes in the tier. That is,
for a system as follows,

tier 0: node 0, node 1; weight=4
tier 1: node 2, node 3; weight=1

if you run a workload with `numactl --weighted-interleave -n 0,2,3`, the
proportion across nodes 0-3 will be "4:0:1:1".

While for `numactl --weighted-interleave -n 0,2`, it will be "4:0:1:0".
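
The per-node rule above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the `TIERS` table and
the `node_weights` helper are hypothetical names made up for this
example, not kernel or numactl interfaces. Each node in the nodemask
takes its tier's weight, and every other node gets 0.

```python
# Hypothetical tier table from the example above:
# tier id -> (member nodes, weight applied to EACH node of the tier)
TIERS = {
    0: ((0, 1), 4),
    1: ((2, 3), 1),
}


def node_weights(tiers, nodemask):
    """Return a per-node weight list; nodes outside the nodemask get 0."""
    nr_nodes = 1 + max(n for nodes, _ in tiers.values() for n in nodes)
    weights = [0] * nr_nodes
    for nodes, weight in tiers.values():
        for node in nodes:
            if node in nodemask:
                # The tier weight is per node, so it is not split or
                # rebalanced when other nodes of the tier are absent.
                weights[node] = weight
    return weights


print(":".join(map(str, node_weights(TIERS, {0, 2, 3}))))  # 4:0:1:1
print(":".join(map(str, node_weights(TIERS, {0, 2}))))     # 4:0:1:0
```

Because the weight belongs to each node rather than to the tier as a
whole, a node leaving or joining the nodemask does not change the
weight of any other node.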
--
Best Regards,
Huang, Ying
> The result of this discussion led us to simply say "What if we place
> the weights directly in the node". And that led us to this RFC.
>
>
> I am not against implementing it in mempolicy (as proof: my first RFC).
> I am simply searching for the acceptable way to implement it.
>
> One of the benefits of having it set as a global setting is that weights
> can be automatically generated from HMAT/HMEM information (ACPI tables)
> and programs already using MPOL_INTERLEAVE will have a direct benefit.
>
> I have been considering whether MPOL_WEIGHTED_INTERLEAVE should be added
> alongside this patch so that MPOL_INTERLEAVE is left entirely alone.
>
> Happy to discuss more,
> ~Gregory