From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Gregory Price <gourry.memverge@gmail.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<akpm@linux-foundation.org>, <sthanneeru@micron.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
Date: Wed, 18 Oct 2023 16:31:11 +0800 [thread overview]
Message-ID: <87lec0wcvk.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com> (Ying Huang's message of "Wed, 18 Oct 2023 16:29:02 +0800")
Forgot to Cc more people.
"Huang, Ying" <ying.huang@intel.com> writes:
> Gregory Price <gregory.price@memverge.com> writes:
>
>> On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
>>> Gregory Price <gourry.memverge@gmail.com> writes:
>>>
>>> > == Mutex to Semaphore change:
>>> >
>>> > Since it is expected that many threads will be accessing this data
>>> > during allocations, a mutex is not appropriate.
>>>
>>> IIUC, this is a change for performance. If so, please show some
>>> performance data.
>>>
>>
>> This change will be dropped in v3 in favor of the existing
>> RCU mechanism in memory-tiers.c as pointed out by Matthew.
>>
>>> > == Source-node relative weighting:
>>> >
>>> > 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
>>> > echo source_node:weight > /path/to/interleave_weight
>>>
>>> If source_node is considered, why not consider target_node too? On a
>>> system with only 1 tier (DRAM), do you want weighted interleaving among
>>> NUMA nodes? If so, why tie weighted interleaving with memory tiers?
>>> Why not just introduce weighted interleaving for NUMA nodes?
>>>
>>
>> The short answer: Practicality and ease-of-use.
>>
>> The long answer: We have been discussing how to make this more flexible.
>>
>> Personally, I agree with you. If Task A is on Socket 0, the weight on
>> Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
>> However, right now, DRAM nodes are lumped together into the same tier,
>> resulting in them having the same weight.
>>
>> If you scrollback through the list, you'll find an RFC I posted for
>> set_mempolicy2 which implements weighted interleave in mm/mempolicy.
>> However, mm/mempolicy is extremely `current-centric` at the moment,
>> which makes changing weights at runtime (in response to a hotplug
>> event, for example) very difficult.
>>
>> I still think there is room to extend set_mempolicy to allow
>> task-defined weights to take preference over tier defined weights.
>>
>> We have discussed adding the following features to memory-tiers:
>>
>> 1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
>> to lumping all nodes of a similar quality into the same tier
>>
>> 2) enabling movement of nodes between tiers (for the purpose of
>> reconfiguring due to hotplug and other situations)
>>
>> For users that require fine-grained control over each individual node,
>> this would allow weights to be applied per-node, because each node
>> is its own tier. For the majority of use cases, it would allow clumping of
>> nodes into tiers based on physical topology and performance class, and
>> then allow the general weighting to apply. This seems like the most
>> obvious use case for the majority of users, and also the easiest to
>> set up in the short term.
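
To make the per-node case concrete, the allocation side could boil down
to something like the sketch below (illustrative only, with made-up
helper and field names; weights are assumed to be >= 1):

	/*
	 * Hand out 'weight' consecutive pages from a node before moving
	 * on to the next node in the interleave nodemask.
	 */
	static int next_weighted_node(struct task_struct *p, nodemask_t *mask)
	{
		int nid = p->il_prev;		/* hypothetical per-task state */

		if (p->il_count == 0) {
			nid = next_node_in(nid, *mask);
			p->il_prev = nid;
			p->il_count = get_il_weight(nid);	/* per-node (== per-tier) weight */
		}
		p->il_count--;
		return nid;
	}

Whether get_il_weight() reads a per-node weight or a per-tier weight is
exactly the open question; the allocation path itself looks much the same
either way.
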
>>
>> That said, there are probably 3 or 4 different ways/places to implement
>> this feature. The question is what is the clear and obvious way?
>> I don't have a definitive answer for that, hence the RFC.
>>
>> There are at least 5 proposals that I know of at the moment:
>>
>> 1) mempolicy
>> 2) memory-tiers
>> 3) memory-block interleaving? (weighting among blocks inside a node)
>> Maybe relevant if Dynamic Capacity devices arrive, but it seems
>> like the wrong place to do this.
>> 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
>> 5) "just do it in hardware"
>
> It may be easier to start with the use case. What are the practical use
> cases in your mind that cannot be satisfied with a simple per-memory-tier
> weight? Can you compare the memory layout with different proposals?
>
>>> > # Set tier4 weight from node 0 to 85
>>> > echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier4 weight from node 1 to 65
>>> > echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier22 weight from node 0 to 15
>>> > echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>>> > # Set tier22 weight from node 1 to 10
>>> > echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
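
If I read the proposed semantics correctly, with the weights above a task
allocating from node 0 would see roughly 85 of every 100 interleaved pages
placed in tier4 (DDR) and 15 in tier22 (CXL), while a task on node 1 would
see a 65:10 (i.e. 13:2) split between the two tiers.
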
>
> --
> Best Regards,
> Huang, Ying