From: Michal Hocko <mhocko@suse.com>
To: Gregory Price <gourry.memverge@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
linux-mm@kvack.org, ying.huang@intel.com,
akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
weixugc@google.com, apopple@nvidia.com, hannes@cmpxchg.org,
tim.c.chen@intel.com, dave.hansen@intel.com,
shy828301@gmail.com, gregkh@linuxfoundation.org,
rafael@kernel.org, Gregory Price <gregory.price@memverge.com>
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Date: Tue, 31 Oct 2023 10:53:41 +0100
Message-ID: <rm43wgtlvwowjolzcf6gj4un4qac4myngxqnd2jwt5yqxree62@t66scnrruttc>
In-Reply-To: <20231031003810.4532-1-gregory.price@memverge.com>
On Mon 30-10-23 20:38:06, Gregory Price wrote:
> This patchset implements weighted interleave and adds a new sysfs
> entry: /sys/devices/system/node/nodeN/accessM/il_weight.
>
> The il_weight of a node is used by mempolicy to implement weighted
> interleave when `numactl --interleave=...` is invoked. By default
> il_weight for a node is always 1, which preserves the default round
> robin interleave behavior.
>
> Interleave weights may be set from 0-100, and denote the number of
> pages that should be allocated from the node when interleaving
> occurs.
>
> For example, if a node's interleave weight is set to 5, 5 pages
> will be allocated from that node before the next node is scheduled
> for allocations.
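If I am reading the description right, the per-node counting boils
down to something like the following (a quick self-contained userspace
model of the semantics as described, not the actual patch code; the
node count and weights are made up for illustration):

#include <stdio.h>

#define NR_NODES 4

/* il_weight as described above: default 1, node 0 bumped to 5 */
static const unsigned int il_weight[NR_NODES] = { 5, 1, 1, 1 };

static unsigned int cur_node, cur_count;

/* Hand out cur_node il_weight[cur_node] times, then advance. */
static unsigned int next_interleave_node(void)
{
	unsigned int node = cur_node;

	if (++cur_count >= il_weight[cur_node]) {
		cur_node = (cur_node + 1) % NR_NODES;
		cur_count = 0;
	}
	return node;
}

int main(void)
{
	/* First 10 "allocations" land on: 0 0 0 0 0 1 2 3 0 0 */
	for (int i = 0; i < 10; i++)
		printf("%u ", next_interleave_node());
	printf("\n");
	return 0;
}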
I find this semantic rather weird TBH. First of all, why do you think it
makes sense to have those weights global for all users? What if
different applications have different views on how to spread their
interleaved memory?
I do get that you might have different tiers with largely different
runtime characteristics, but why would you want to interleave them into a
single mapping and end up with hard-to-predict runtime behavior?
[...]
> In this way it becomes possible to set an interleaving strategy
> that fits the available bandwidth for the devices available on
> the system. An example system:
>
> Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 1 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 2 - CXL Memory, 64GB/s BW, on Node 0 root complex
> Node 3 - CXL Memory, 64GB/s BW, on Node 1 root complex
>
> In this setup, the effective weights for nodes 0-3 for a task
> running on Node 0 may be [60, 20, 10, 10].
>
> This spreads memory out across devices which all have different
> latency and bandwidth attributes in a way that can maximize the
> available resources.
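As an aside, those example numbers look roughly bandwidth proportional;
back of the envelope, using the figures quoted above (purely
illustrative arithmetic, not something the patchset computes):

#include <stdio.h>

int main(void)
{
	/* Effective bandwidth from node 0: 400 local, 200 cross-socket,
	 * 64 per CXL device (GB/s), scaled so the weights sum to ~100. */
	const unsigned int bw[] = { 400, 200, 64, 64 };
	unsigned int total = 0, i;

	for (i = 0; i < 4; i++)
		total += bw[i];
	for (i = 0; i < 4; i++)
		printf("node %u: weight ~%u\n", i, 100 * bw[i] / total);
	/* Prints ~54, 27, 8, 8 -- the same ballpark as [60, 20, 10, 10]. */
	return 0;
}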
OK, so why is this any better than not using any memory policy and
relying on demotion to push cold memory down the tier hierarchy?
What is the actual real-life use case, and what kind of benefits can you
present?
--
Michal Hocko
SUSE Labs