From: Kyungsan Kim <ks0204.kim@samsung.com>
To: david@redhat.com
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
dan.j.williams@intel.com, seungjun.ha@samsung.com,
wj28.lee@samsung.com
Subject: RE: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Fri, 7 Apr 2023 18:30:07 +0900
Message-ID: <20230407093007.420852-1-ks0204.kim@samsung.com>
In-Reply-To: <6ebf38f1-b7c4-cb38-b72f-2e406d2a2fdc@redhat.com>
>On 05.04.23 21:42, Dan Williams wrote:
>> Matthew Wilcox wrote:
>>> On Tue, Apr 04, 2023 at 09:48:41PM -0700, Dan Williams wrote:
>>>> Kyungsan Kim wrote:
>>>>> We know the situation. When a CXL DRAM channel is placed under ZONE_NORMAL,
>>>>> an arbitrary kernel object allocation via kmalloc() or its siblings makes the entire CXL DRAM unremovable.
>>>>> Also, not all kernel objects can be allocated from ZONE_MOVABLE.
>>>>>
>>>>> ZONE_EXMEM does not fix a movability attribute (movable or unmovable); rather, it lets the calling context decide.
>>>>> In that respect it is the same as ZONE_NORMAL, except that ZONE_EXMEM works for extended memory devices.
>>>>> This does not mean ZONE_EXMEM supports both movability and kernel object allocation at the same time.
>>>>> When multiple CXL DRAM channels are connected, we think a memory consumer could dedicate a channel to either movable or unmovable use.
>>>>>
>>>>
>>>> I want to clarify that I expect the number of people doing physical CXL
>>>> hotplug of whole devices to be small compared to dynamic capacity
>>>> devices (DCD). DCD is a new feature of the CXL 3.0 specification where a
>>>> device maps 1 or more thinly provisioned memory regions whose
>>>> individual extents get populated and depopulated by a fabric manager.
>>>>
>>>> In that scenario there is a semantic where the fabric manager hands out
>>>> 100G to a host and asks for it back, it is within the protocol that the
>>>> host can say "I can give 97GB back now, come back and ask again if you
>>>> need that last 3GB".
>>>
>>> Presumably it can't give back arbitrary chunks of that 100GB? There's
>>> some granularity that's preferred; maybe on 1GB boundaries or something?
>>
>> The device picks a granularity that can be tiny per spec, but it makes
>> the hardware more expensive to track in small extents, so I expect
>> something reasonable like 1GB, but time will tell once actual devices
>> start showing up.
>
>It all sounds a lot like virtio-mem using real hardware [I know, there
>are important differences, but for the dynamic aspect there are very
>similar issues to solve]
>
>For virtio-mem, the current best way to support hotplugging of large
>memory to a VM to eventually be able to unplug a big fraction again is
>using a combination of ZONE_MOVABLE and ZONE_NORMAL -- "auto-movable"
>memory onlining policy. What's online to ZONE_MOVABLE can get (fairly)
>reliably unplugged again. What's onlined to ZONE_NORMAL is possibly lost
>forever.
>
>Like (incrementally) hotplugging 1 TiB to a 4 GiB VM. Being able to
>unplug 1 TiB reliably again is pretty much out of scope. But the more
>memory we can reliably get back the better. And the more memory we can
>get in the common case, the better. With a ZONE_NORMAL vs. ZONE_MOVABLE
>ratio of 1:3 one could unplug ~768 GiB again reliably. The remainder
>depends on fragmentation on the actual system and the unplug granularity.
>
>The original plan was to use ZONE_PREFER_MOVABLE as a safety buffer to
>reduce ZONE_NORMAL memory without increasing ZONE_MOVABLE memory (and
>possibly harming the system). The underlying idea was that in many
>setups that memory in ZONE_PREFER_MOVABLE would not get used for
>unmovable allocations and it could, therefore, get unplugged fairly
>reliably in these setups. For all other setups, unmovable allocations
>could leak into ZONE_PREFER_MOVABLE and reduce the amount of memory we
>could unplug again. But the system would try to keep unmovable
>allocations to ZONE_NORMAL, so in most cases with some
>ZONE_PREFER_MOVABLE memory we would perform better than with only
>ZONE_NORMAL.
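
As a side note on the 1:3 example quoted above, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration under stated assumptions: the function name movable_share_gib is hypothetical, and the sketch ignores the 4 GiB of boot memory and any fragmentation, so it shows how much memory would be onlined to ZONE_MOVABLE rather than what the "auto-movable" policy actually does.

```python
def movable_share_gib(hotplugged_gib, normal_parts, movable_parts):
    """GiB onlined to ZONE_MOVABLE for a given ZONE_NORMAL:ZONE_MOVABLE ratio.

    Assumption: the ratio is applied to the hotplugged memory only;
    boot memory and fragmentation are ignored.
    """
    total_parts = normal_parts + movable_parts
    return hotplugged_gib * movable_parts // total_parts


# 1 TiB (1024 GiB) hotplugged with a ZONE_NORMAL:ZONE_MOVABLE ratio of 1:3
hotplugged = 1024
movable = movable_share_gib(hotplugged, normal_parts=1, movable_parts=3)
normal = hotplugged - movable
print(f"ZONE_MOVABLE: {movable} GiB, ZONE_NORMAL: {normal} GiB")
```

Under these assumptions 768 GiB lands in ZONE_MOVABLE and 256 GiB in ZONE_NORMAL, which matches the "~768 GiB" figure in the quoted text.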

The memory hotplug mechanism can be separated into two stages: physical memory add/remove and logical memory online/offline [1].
We think ZONE_PREFER_MOVABLE could help logical memory online/offline. However, there would be a trade-off between physical add/remove and device utilization.
Consider ZONE_PREFER_MOVABLE allocations on switched CXL DRAM devices:
when pages are allocated evenly across the physical CXL DRAM devices, it would not help physical memory add/remove.
Conversely, when pages are allocated sequentially across the physical CXL DRAM devices, the opposite holds.
ZONE_EXMEM provides provisioning of CXL DRAM devices [2], and we think the ZONE_PREFER_MOVABLE idea can be applied to that.
For example, preferred-movable pages could be kept per CXL DRAM device within the zone.
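
To illustrate the even vs. sequential placement point above, here is a toy model. It is not kernel code and does not reflect any real allocator; the names place_pages and untouched_devices, the device count, and the page counts are all made up for the illustration. It only counts how many devices end up holding no pages at all, i.e. devices that could be physically removed without migrating anything.

```python
def place_pages(num_pages, num_devices, pages_per_device, policy):
    """Return the number of pages placed on each device (toy model)."""
    if num_pages > num_devices * pages_per_device:
        raise ValueError("not enough capacity for the requested pages")
    used = [0] * num_devices
    for i in range(num_pages):
        if policy == "even":
            dev = i % num_devices          # round-robin across devices
        elif policy == "sequential":
            dev = i // pages_per_device    # fill one device before the next
        else:
            raise ValueError(f"unknown policy: {policy}")
        used[dev] += 1
    return used


def untouched_devices(used):
    """Count devices that hold no pages at all."""
    return sum(1 for pages in used if pages == 0)


# Hypothetical setup: 4 devices, 100 pages each, 120 pages allocated.
even = place_pages(120, 4, 100, "even")
seq = place_pages(120, 4, 100, "sequential")
print("even:      ", even, "untouched devices:", untouched_devices(even))
print("sequential:", seq, "untouched devices:", untouched_devices(seq))
```

In this toy setup the even policy touches every device (30 pages each), while the sequential policy leaves two devices untouched. The even policy spreads load over all devices, which is the utilization side of the trade-off.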
[1] https://docs.kernel.org/admin-guide/mm/memory-hotplug.html#phases-of-memory-hotplug
[2] https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition
>
>--
>Thanks,
>
>David / dhildenb