From: Kyungsan Kim <ks0204.kim@samsung.com>
To: ying.huang@intel.com
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
dan.j.williams@intel.com
Subject: RE: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Wed, 22 Mar 2023 13:33:53 +0900
Message-ID: <20230322043353.143487-1-ks0204.kim@samsung.com>
In-Reply-To: <87y1oe74g5.fsf@yhuang6-desk2.ccr.corp.intel.com>
Hi Huang Ying,
I apologize for the late reply, which was due to my personal schedule.
Thank you for sharing your viewpoint and the information.
>Hi, Kyungsan,
>
>Kyungsan Kim <ks0204.kim@samsung.com> writes:
>
>> CXL is a promising technology that leads to fundamental changes in computing architecture.
>> To facilitate the adoption and spread of CXL memory, we are developing a memory tiering solution called SMDK[1][2].
>> Using SMDK and CXL RAM devices, our team has been working with industry and academic partners over the last year.
>> Also, thanks to many researchers' efforts, the CXL adoption stage is gradually moving forward from basic enablement to real-world composite use cases.
>> At this moment, based on the research and experience gained working on SMDK, we would like to suggest a session at LSF/MM/BPF this year
>> to propose possible Linux MM changes along with a brief overview of SMDK.
>>
>> Adam Manzanares kindly advised me that it is preferred to discuss implementation details of a given problem and reach consensus at LSF/MM/BPF.
>> Considering the adoption stage of CXL technology, however, let me suggest a design-level discussion on the two MM expansions of SMDK this year.
>> Once we have design consensus with the participants, we hope to continue with follow-up discussions on additional implementation details.
>>
>>
>> 1. A new zone, ZONE_EXMEM
>> We added ZONE_EXMEM to manage CXL RAM device(s), separate from the ZONE_NORMAL used for ordinary DRAM, for the three reasons below.
>>
>> 1) CXL RAM differs from conventional DRAM in many characteristics because a CXL device inherits and extends the PCIe specification.
>> ex) frequency range, pluggability, link speed/width negotiation, host/device flow control, power throttling, channel-interleaving methodology, error handling, etc.
>> It is likely that the primary use case of CXL RAM will be System RAM.
>> However, to deal with the hardware differences properly, different MM algorithms are needed.
>>
>> 2) Historically, zones have been added to reflect the evolution of CPU, IO, and memory devices.
>> ex) ZONE_DMA(32), ZONE_HIGHMEM, ZONE_DEVICE, and ZONE_MOVABLE.
>> Each zone applies different MM algorithms for page reclaim, compaction, migration, and fragmentation handling.
>> At first, we tried to reuse the existing zones, ZONE_DEVICE and ZONE_MOVABLE, for CXL RAM.
>> However, the purpose and implementation of those zones do not fit CXL RAM.
>>
>> 3) Industry is preparing CXL-capable systems that connect dozens of CXL devices in a server.
>> When each CXL device becomes a separate node, an administrator/programmer needs to be aware of and manually control all nodes using 3rd-party software such as numactl and libnuma.
>> ZONE_EXMEM allows CXL RAM devices to be assembled into a single ZONE_EXMEM zone, and provides an abstraction to userspace by managing the devices seamlessly.
>> Also, the zone can interleave the assembled devices in software to obtain aggregated bandwidth.
>> We would like to discuss whether this can coexist with HW interleaving, like SW/HW RAID0.
>> To aid understanding, please refer to the node partition part of the picture[3].
>
>In addition to CXL memory, we may have other kinds of memory in the
>system, for example, HBM (High Bandwidth Memory), memory in FPGA cards,
>memory in GPU cards, etc. I guess that we need to consider them
>together. Do we need to add one zone type for each kind of memory?
We also don't think a new zone is needed for every single memory device.
Our view is that ZONE_NORMAL alone is no longer enough to manage multiple volatile memory devices, given the growing number of device types.
We think ZONE_EXMEM can be used to represent extended volatile memories, including CXL DRAM, that have different HW characteristics.
>
>>
>> 2. User/Kernelspace Programmable Interface
>> In a memory tiering solution, it is typical to try to place hot data in near memory and cold data in far memory as accurately as possible.[4][5][6][7]
>> We noticed that the hotness/coldness of data is determined by the memory access pattern of the running application and/or kernel context.
>> Hence, a running context needs an identifier to distinguish near memory from far memory.
>> When CXL RAM is managed as a NUMA node, a node id can function, more or less, as a CXL identifier.
>> However, the node id is limited in that it is ephemeral information that varies dynamically with the online status of the CXL topology and system sockets.
>> For this reason, we provide programmable interfaces that let userspace and kernelspace contexts explicitly (de)allocate memory from DRAM and CXL RAM regardless of system changes.
>> Specifically, MAP_EXMEM and GFP_EXMEM flags were added to the mmap() syscall and the kmalloc() siblings, respectively.
>
>In addition to NUMA node, we have defined the following interfaces to
>expose information about different kinds of memory in the system.
>
>https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html#abi-sys-devices-virtual-memory-tiering
>
>Best Regards,
>Huang, Ying
The sysfs interface looks useful for prioritizing groups of fast/slow memory nodes using lists of node ids.
We would say it is complementary to the programmable interfaces we suggested.
                 User/Kernel context (MAP_EXMEM/GFP_EXMEM)
                                     |
               ---------------------------------------------
               |                                           |
[sysfs/memory_tier0 - DDR Node list]   [sysfs/memory_tier1 - CXL Node list]
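To make the lower half of the picture concrete, here is a minimal sketch of how a userspace tool might read the per-tier node lists from the memory tiering sysfs. It assumes the layout described in the ABI document linked above (a nodelist file under each memory_tierN directory); the helper names parse_nodelist and read_memory_tiers are made up for illustration, and the tier numbers on a real system need not be 0 and 1 as drawn.

```python
import os
import re

# Assumed location of the memory tiering sysfs, per the kernel ABI document.
DEFAULT_ROOT = "/sys/devices/virtual/memory_tiering"


def parse_nodelist(text):
    """Expand a kernel-style list such as "0-1,4" into [0, 1, 4]."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-")
            nodes.extend(range(int(first), int(last) + 1))
        else:
            nodes.append(int(part))
    return nodes


def read_memory_tiers(root=DEFAULT_ROOT):
    """Return {tier directory name: [node ids]} for each memory_tierN under root."""
    tiers = {}
    if not os.path.isdir(root):
        return tiers  # kernel without memory tiering support, or not Linux
    for name in sorted(os.listdir(root)):
        if not re.fullmatch(r"memory_tier\d+", name):
            continue
        try:
            with open(os.path.join(root, name, "nodelist")) as f:
                tiers[name] = parse_nodelist(f.read())
        except OSError:
            continue  # tier disappeared or is unreadable; skip it
    return tiers


if __name__ == "__main__":
    for tier, nodes in read_memory_tiers().items():
        print(tier, nodes)
```

A context that asks for CXL memory through MAP_EXMEM/GFP_EXMEM would not need to parse these lists itself; the sketch only shows where the node grouping that sits underneath such a request could be observed.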
>
>> Thanks to Adam Manzanares for reviewing this CFP thoroughly.
>>
>>
>> [1]SMDK: https://github.com/openMPDK/SMDK
>> [2]SMT: Software-defined Memory Tiering for Heterogeneous Computing systems with CXL Memory Expander, https://ieeexplore.ieee.org/document/10032695
>> [3]SMDK node partition: https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition
>> [4]TMO: Transparent Memory Offloading in Datacenters, https://dl.acm.org/doi/10.1145/3503222.3507731
>> [5]TPP: Transparent Page Placement for CXL-Enabled Tiered Memory, https://arxiv.org/abs/2206.02878
>> [6]Pond: CXL-Based Memory Pooling Systems for Cloud Platforms, https://dl.acm.org/doi/10.1145/3575693.3578835
>> [7]Hierarchical NUMA: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf