From: Kyungsan Kim
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org, a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com
Subject: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Tue, 21 Feb 2023 10:41:14 +0900
Message-Id: <20230221014114.64888-1-ks0204.kim@samsung.com>

CXL is a promising technology that is leading to fundamental changes in computing architecture. To facilitate the adoption and widespread use of CXL memory, we are developing a memory tiering solution called SMDK[1][2]. Using SMDK and a CXL RAM device, our team has been working with industry and academic partners over the last year.
Also, thanks to the efforts of many researchers, CXL adoption is gradually moving forward from basic enablement to real-world composite use cases. At this point, based on the research and experience gained from working on SMDK, we would like to suggest a session at LSF/MM/BPF this year to propose possible Linux MM changes, together with a brief overview of SMDK.

Adam Manzanares kindly advised me that LSF/MM/BPF prefers discussion of implementation details for a given problem, aimed at reaching consensus. Considering the adoption stage of CXL technology, however, let me suggest a design-level discussion of the two MM expansions of SMDK this year. Once we have design consensus with participants, we hope to continue with follow-up discussions that cover additional implementation details.

1. A new zone, ZONE_EXMEM

We added ZONE_EXMEM to manage CXL RAM device(s), separate from ZONE_NORMAL for usual DRAM, for the three reasons below.

1) CXL RAM differs from conventional DRAM in many characteristics because a CXL device inherits and extends the PCIe specification, e.g., frequency range, pluggability, link speed/width negotiation, host/device flow control, power throttling, channel-interleaving methodology, and error handling. The primary use case of CXL RAM is likely to be System RAM. However, to deal with the hardware differences properly, different MM algorithms are needed.

2) Historically, zones have been added to reflect the evolution of CPU, IO, and memory devices, e.g., ZONE_DMA(32), ZONE_HIGHMEM, ZONE_DEVICE, and ZONE_MOVABLE. Each zone applies different MM algorithms, such as page reclaim, compaction, migration, and fragmentation. At first, we tried to reuse the existing zones ZONE_DEVICE and ZONE_MOVABLE for CXL RAM. However, the purpose and implementation of those zones do not fit CXL RAM.

3) Industry is preparing CXL-capable systems that connect dozens of CXL devices in a server system.
When each CXL device becomes a separate node, an administrator/programmer needs to be aware of and manually control all the nodes using 3rd party software, such as numactl and libnuma. ZONE_EXMEM allows CXL RAM devices to be assembled into a single ZONE_EXMEM zone and provides an abstraction to userspace by managing the devices seamlessly. The zone is also able to interleave the assembled devices in software to provide aggregated bandwidth. We would like to discuss whether this can coexist with HW interleaving, like SW/HW raid0. To help understanding, please refer to the node partition part of the picture[3].

2. User/Kernelspace Programmable Interface

A memory tiering solution typically attempts to place hot data in near memory and cold data in far memory as accurately as possible.[4][5][6][7] We noticed that the hotness/coldness of data is determined by the memory access pattern of the running application and/or kernel context. Hence, a running context needs a near/far memory identifier to distinguish near memory from far memory. When CXL RAM is managed as a NUMA node, a node id can more or less function as a CXL identifier. However, the node id is limited in that it is ephemeral information that varies dynamically with the online status of the CXL topology and system sockets. For this reason, we provide programmable interfaces for userspace and kernelspace contexts to explicitly (de)allocate memory from DRAM and CXL RAM regardless of system changes. Specifically, the MAP_EXMEM and GFP_EXMEM flags were added to the mmap() syscall and the kmalloc() siblings, respectively.

Thanks to Adam Manzanares for reviewing this CFP thoroughly.
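
To illustrate the userspace side of section 2, here is a minimal sketch of how an application might request far (CXL) memory explicitly through mmap(). Only the flag name MAP_EXMEM comes from the proposal. Its numeric value is not given here, and the helper names tier_alloc()/tier_free() and the enum mem_tier are hypothetical. As an assumption for portability, the sketch defines MAP_EXMEM to 0 when the headers do not provide it, so on a kernel without the flag it degrades to an ordinary anonymous mapping and does not change placement.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * MAP_EXMEM is the flag named in the proposal; it is not part of mainline
 * headers. Assumption: when the flag is unavailable, treat it as 0 so the
 * call falls back to a plain anonymous mapping (no tier selection).
 */
#ifndef MAP_EXMEM
#define MAP_EXMEM 0
#endif

/* Hypothetical tier selector used only by this sketch. */
enum mem_tier { TIER_NEAR, TIER_FAR };

/*
 * Hypothetical helper: map len bytes of anonymous memory, asking for the
 * far tier by OR-ing MAP_EXMEM into the mmap() flags. Returns NULL on error.
 */
static void *tier_alloc(size_t len, enum mem_tier tier)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    if (tier == TIER_FAR)
        flags |= MAP_EXMEM;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a mapping obtained from tier_alloc(). */
static int tier_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

With this shape, a caller would use tier_alloc(len, TIER_FAR) for buffers it expects to be cold and tier_alloc(len, TIER_NEAR) for hot ones, without naming a NUMA node id that may change as devices come online or go offline.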

[1] SMDK: https://github.com/openMPDK/SMDK
[2] SMT: Software-defined Memory Tiering for Heterogeneous Computing systems with CXL Memory Expander, https://ieeexplore.ieee.org/document/10032695
[3] SMDK node partition: https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition
[4] TMO: Transparent Memory Offloading in Datacenters, https://dl.acm.org/doi/10.1145/3503222.3507731
[5] TPP: Transparent Page Placement for CXL-Enabled Tiered Memory, https://arxiv.org/abs/2206.02878
[6] Pond: CXL-Based Memory Pooling Systems for Cloud Platforms, https://dl.acm.org/doi/10.1145/3575693.3578835
[7] Hierarchical NUMA: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf