From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Gregory Price <gourry.memverge@gmail.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
<akpm@linux-foundation.org>, <sthanneeru@micron.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
Date: Wed, 18 Oct 2023 16:31:11 +0800 [thread overview]
Message-ID: <87lec0wcvk.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com> (Ying Huang's message of "Wed, 18 Oct 2023 16:29:02 +0800")
Forgot to Cc more people.
"Huang, Ying" <ying.huang@intel.com> writes:
> Gregory Price <gregory.price@memverge.com> writes:
>
>> On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
>>> Gregory Price <gourry.memverge@gmail.com> writes:
>>>
>>> > == Mutex to Semaphore change:
>>> >
>>> > Since it is expected that many threads will be accessing this data
>>> > during allocations, a mutex is not appropriate.
>>>
>>> IIUC, this is a change for performance. If so, please show some
>>> performance data.
>>>
>>
>> This change will be dropped in v3 in favor of the existing
>> RCU mechanism in memory-tiers.c as pointed out by Matthew.
>>
>>> > == Source-node relative weighting:
>>> >
>>> > 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
>>> > echo source_node:weight > /path/to/interleave_weight
>>>
>>> If source_node is considered, why not consider target_node too? On a
>>> system with only 1 tier (DRAM), do you want weighted interleaving among
>>> NUMA nodes? If so, why tie weighted interleaving with memory tiers?
>>> Why not just introduce weighted interleaving for NUMA nodes?
>>>
>>
>> The short answer: Practicality and ease-of-use.
>>
>> The long answer: We have been discussing how to make this more flexible.
>>
>> Personally, I agree with you. If Task A is on Socket 0, the weight on
>> Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
>> However, right now, DRAM nodes are lumped together into the same tier,
>> resulting in them having the same weight.
>>
>> If you scrollback through the list, you'll find an RFC I posted for
>> set_mempolicy2 which implements weighted interleave in mm/mempolicy.
>> However, mm/mempolicy is extremely `current-centric` at the moment,
>> which makes changing weights at runtime (in response to a hotplug
>> event, for example) very difficult.
>>
>> I still think there is room to extend set_mempolicy to allow
>> task-defined weights to take preference over tier defined weights.
>>
>> We have discussed adding the following features to memory-tiers:
>>
>> 1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
>> to lumping all nodes of a similar quality into the same tier
>>
>> 2) enabling movement of nodes between tiers (for the purpose of
>> reconfiguring due to hotplug and other situations)
>>
>> For users that require fine-grained control over each individual node,
>> this would allow weights to be applied per-node, because each node
>> is its own tier. For the majority of use cases, it would allow clumping of
>> nodes into tiers based on physical topology and performance class, and
>> then allow the general weighting to apply. This seems like the most
>> obvious use case for the majority of users, and also the easiest to
>> set up in the short term.
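
To make the per-node case concrete, the allocation side could boil down
to something like the sketch below (illustrative only, with made-up
helper and field names; weights are assumed to be >= 1):

	/*
	 * Hand out 'weight' consecutive pages from a node before moving
	 * on to the next node in the interleave nodemask.
	 */
	static int next_weighted_node(struct task_struct *p, nodemask_t *mask)
	{
		int nid = p->il_prev;		/* hypothetical per-task state */

		if (p->il_count == 0) {
			nid = next_node_in(nid, *mask);
			p->il_prev = nid;
			p->il_count = get_il_weight(nid);	/* per-node (== per-tier) weight */
		}
		p->il_count--;
		return nid;
	}

Whether get_il_weight() reads a per-node weight or a per-tier weight is
exactly the open question; the allocation path itself looks much the same
either way.
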
>>
>> That said, there are probably 3 or 4 different ways/places to implement
>> this feature. The question is what is the clear and obvious way?
>> I don't have a definitive answer for that, hence the RFC.
>>
>> There are at least 5 proposals that I know of at the moment:
>>
>> 1) mempolicy
>> 2) memory-tiers
>> 3) memory-block interleaving? (weighting among blocks inside a node)
>> Maybe relevant if Dynamic Capacity devices arrive, but it seems
>> like the wrong place to do this.
>> 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
>> 5) "just do it in hardware"
>
> It may be easier to start with the use case. What are the practical use
> cases in your mind that cannot be satisfied with a simple per-memory-tier
> weight? Can you compare the memory layout with different proposals?
>
>>> > # Set tier4 weight from node 0 to 85
>>> > echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier4 weight from node 1 to 65
>>> > echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier22 weight from node 0 to 15
>>> > echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>>> > # Set tier22 weight from node 1 to 10
>>> > echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
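
If I read the proposed semantics correctly, with the weights above a task
allocating from node 0 would see roughly 85 of every 100 interleaved pages
placed in tier4 (DDR) and 15 in tier22 (CXL), while a task on node 1 would
see a 65:10 (i.e. 13:2) split between the two tiers.
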
>
> --
> Best Regards,
> Huang, Ying