From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Srinivasulu Thanneeru <sthanneeru@micron.com>,
Srinivasulu Opensrc <sthanneeru.opensrc@micron.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"aneesh.kumar@linux.ibm.com" <aneesh.kumar@linux.ibm.com>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"tj@kernel.org" <tj@kernel.org>,
"john@jagalactic.com" <john@jagalactic.com>,
Eishan Mirakhur <emirakhur@micron.com>,
"Vinicius Tavares Petrucci" <vtavarespetr@micron.com>,
Ravis OpenSrc <Ravis.OpenSrc@micron.com>,
"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Wei Xu <weixugc@google.com>,
Hao Xiang <hao.xiang@bytedance.com>,
"Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>
Subject: Re: [EXT] Re: [RFC PATCH v2 0/2] Node migration between memory tiers
Date: Tue, 09 Jan 2024 11:41:11 +0800 [thread overview]
Message-ID: <87o7dv897s.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <ZZwrIoP9+ey7rp3C@memverge.com> (Gregory Price's message of "Mon, 8 Jan 2024 12:04:34 -0500")
Gregory Price <gregory.price@memverge.com> writes:
> On Thu, Jan 04, 2024 at 02:05:01PM +0800, Huang, Ying wrote:
>> >
>> > From https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf
>> > abstract_distance_offset: override by users to deal with firmware issue.
>> >
>> > Say the firmware can configure a CXL node into the wrong tier;
>> > similarly, it may also configure all CXL nodes into a single memtype,
>> > so all of these nodes can fall into a single wrong tier.
>> > In this case, wouldn't a per-node adistance_offset be good to have?
>>
>> I think that it's better to fix the firmware error if possible. And
>> these are only theoretical issues, not practical ones. Do you have any
>> practical issues?
>>
>> I understand that users may want to move nodes between memory tiers for
>> different policy choices. For that, memory_type based adistance_offset
>> should be good.
>>
>
> There's actually an affirmative case to change memory tiering to allow
> either movement of nodes between tiers, or at least base placement on
> HMAT information. Preferably, membership would be changeable to allow
> hotplug/DCD to be managed (there's no guarantee that the memory passed
> through will always be what HMAT says on initial boot).
IIUC, per Jonathan Cameron below, the performance of the memory
shouldn't change even for DCD devices.
https://lore.kernel.org/linux-mm/20231103141636.000007e4@Huawei.com/
It's possible for the performance of a NUMA node to change if we
hot-remove a memory device and then hot-add a different one. It's
hoped that the CDAT changes too in that case.
So, all in all, HMAT + CDAT can help us put each memory device in the
appropriate memory tier. We have HMAT support upstream now, and we
will be working on CDAT support.
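To make the direction concrete, here is a rough sketch (in Python,
purely illustrative; the baseline constant and the exact formula are
simplified stand-ins, not the kernel's implementation) of how
HMAT/CDAT-style latency and bandwidth numbers could be folded into an
abstract distance relative to a DRAM baseline, so that slower or
narrower memory lands in a lower tier:

```python
# Illustrative sketch: derive an abstract distance ("adistance") for a
# memory node from HMAT-style latency/bandwidth numbers, scaled against
# a DRAM baseline.  Constant and formula are simplifications for
# illustration only.

MEMTIER_ADISTANCE_DRAM = 576  # hypothetical DRAM baseline adistance

def adistance(latency_ns, bandwidth_kbps,
              dram_latency_ns=10, dram_bandwidth_kbps=10485760):
    """Larger adistance means slower memory, hence a lower tier."""
    # Scale up with latency relative to DRAM, down with bandwidth
    # relative to DRAM (integer math, as kernel code would use).
    return (MEMTIER_ADISTANCE_DRAM
            * latency_ns // dram_latency_ns
            * dram_bandwidth_kbps // bandwidth_kbps)

# Using the two nodes from the QEMU hmat-lb example quoted below:
dram = adistance(10, 10485760)  # node 0: baseline
cxl = adistance(20, 5242880)    # node 1: 2x latency, half bandwidth
```

With these numbers node 1 ends up with four times the abstract distance
of node 0, which is the kind of separation that would place it in a
distinct, slower tier.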
--
Best Regards,
Huang, Ying
> https://lore.kernel.org/linux-cxl/CAAYibXjZ0HSCqMrzXGv62cMLncS_81R3e1uNV5Fu4CPm0zAtYw@mail.gmail.com/
>
> This group wants to enable passing CXL memory through to KVM/QEMU
> (i.e. host CXL expander memory passed through to the guest), and
> allow the guest to apply memory tiering.
>
> There are multiple issues with this, presently:
>
> 1. The QEMU CXL virtual device is not and probably never will be
> performant enough to be a commodity class virtualization. The
> reason is that the virtual CXL device is built off the I/O
> virtualization stack, which treats memory accesses as I/O accesses.
>
> KVM also seems incompatible with the design of the CXL memory device
> in general, but this problem may or may not be a blocker.
>
> As a result, access to a virtual CXL memory device brings QEMU
> crawling to a halt - and this is unlikely to change.
>
> There is presently no good way forward to create a performant virtual
> CXL device in QEMU. This means the memory tiering component in the
> kernel is functionally useless for virtual CXL memory, because...
>
> 2. When passing memory through as an explicit NUMA node, but not as
> part of a CXL memory device, the nodes are lumped together in the
> DRAM tier.
>
> None of this has to do with firmware.
>
> Memory-type is an awful way of denoting membership of a tier, but we
> have HMAT information that can be passed through via QEMU:
>
> -object memory-backend-ram,size=4G,id=ram-node0 \
> -object memory-backend-ram,size=4G,id=ram-node1 \
> -numa node,nodeid=0,cpus=0-4,memdev=ram-node0 \
> -numa node,initiator=0,nodeid=1,memdev=ram-node1 \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
> -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760 \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20 \
> -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880
>
> Not only would it be nice if we could change tier membership based on
> this data, it's realistically the only way to allow guests to accomplish
> memory tiering w/ KVM/QEMU and CXL memory passed through to the guest.
>
> ~Gregory
Thread overview: 28+ messages
2023-12-13 17:53 sthanneeru.opensrc
2023-12-13 17:53 ` [PATCH 1/2] base/node: Add sysfs for memtier_override sthanneeru.opensrc
2023-12-13 17:53 ` [PATCH 2/2] memory tier: Support node migration between tiers sthanneeru.opensrc
2023-12-15 5:02 ` [RFC PATCH v2 0/2] Node migration between memory tiers Huang, Ying
2023-12-15 17:42 ` Gregory Price
2023-12-18 5:55 ` Huang, Ying
2024-01-03 5:26 ` [EXT] " Srinivasulu Thanneeru
2024-01-03 6:07 ` Huang, Ying
2024-01-03 7:56 ` Srinivasulu Thanneeru
2024-01-03 8:29 ` Huang, Ying
2024-01-03 8:47 ` Srinivasulu Thanneeru
2024-01-04 6:05 ` Huang, Ying
2024-01-08 17:04 ` Gregory Price
2024-01-09 3:41 ` Huang, Ying [this message]
2024-01-09 15:50 ` Jonathan Cameron
2024-01-09 17:59 ` Gregory Price
2024-01-10 0:28 ` [External] " Hao Xiang
2024-01-10 14:18 ` Jonathan Cameron
2024-01-10 19:29 ` Hao Xiang
2024-01-12 7:00 ` Huang, Ying
2024-01-12 8:14 ` Hao Xiang
2024-01-15 1:24 ` Huang, Ying
2024-01-10 5:47 ` Huang, Ying
2024-01-10 14:11 ` Jonathan Cameron
2024-01-10 6:06 ` Huang, Ying
2024-01-09 17:34 ` Gregory Price
2023-12-18 8:56 ` Srinivasulu Thanneeru
2023-12-19 3:57 ` Huang, Ying