From: Kyungsan Kim
To: Jorgen.Hansen@wdc.com
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org, a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com, dan.j.williams@intel.com, seungjun.ha@samsung.com, wj28.lee@samsung.com
Subject: RE: RE: RE(3): FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Fri, 31 Mar 2023 20:31:47 +0900
Message-Id: <20230331113147.399972-1-ks0204.kim@samsung.com>

Hi Jorgen Hansen.

Thank you for joining this topic and sharing your thoughts.
I'm sorry for the late reply; our team had some major tasks this week.

>> On 24 Mar 2023, at 10.50, Kyungsan Kim wrote:
>>
>>> On 24.03.23 10:27, Kyungsan Kim wrote:
>>>>> On 24.03.23 10:09, Kyungsan Kim wrote:
>>>>>> Thank you David Hildenbrand for your interest in this topic.
>>>>>>
>>>>>>>>
>>>>>>>>> Kyungsan Kim wrote:
>>>>>>>>> [..]
>>>>>>>>>>> In addition to CXL memory, we may have other kinds of memory in the
>>>>>>>>>>> system, for example, HBM (High Bandwidth Memory), memory in FPGA card,
>>>>>>>>>>> memory in GPU card, etc. I guess that we need to consider them
>>>>>>>>>>> together. Do we need to add one zone type for each kind of memory?
>>>>>>>>>>
>>>>>>>>>> We also don't think a new zone is needed for every single memory
>>>>>>>>>> device. Our viewpoint is the sole ZONE_NORMAL becomes not enough to
>>>>>>>>>> manage multiple volatile memory devices due to the increased device
>>>>>>>>>> types. Including CXL DRAM, we think the ZONE_EXMEM can be used to
>>>>>>>>>> represent extended volatile memories that have different HW
>>>>>>>>>> characteristics.
>>>>>>>>>
>>>>>>>>> Some advice for the LSF/MM discussion, the rationale will need to be
>>>>>>>>> more than "we think the ZONE_EXMEM can be used to represent extended
>>>>>>>>> volatile memories that have different HW characteristics". It needs to
>>>>>>>>> be along the lines of "yes, to date Linux has been able to describe DDR
>>>>>>>>> with NUMA effects, PMEM with high write overhead, and HBM with improved
>>>>>>>>> bandwidth not necessarily latency, all without adding a new ZONE, but a
>>>>>>>>> new ZONE is absolutely required now to enable use case FOO, or address
>>>>>>>>> unfixable NUMA problem BAR." Without FOO and BAR to discuss the code
>>>>>>>>> maintainability concern of "fewer degrees of freedom in the ZONE
>>>>>>>>> dimension" starts to dominate.
>>>>>>>>
>>>>>>>> One problem we experienced occurred in the combination of hot-remove and kernelspace allocation use cases.
>>>>>>>> ZONE_NORMAL allows kernel context allocation, but it does not allow hot-remove because the kernel resides there all the time.
>>>>>>>> ZONE_MOVABLE allows hot-remove thanks to page migration, but it only allows userspace allocation.
>>>>>>>> Alternatively, we allocated a kernel context out of ZONE_MOVABLE by adding GFP_MOVABLE flag.
>>>>>>
>>>>>>> That sounds like a bad hack :) .
>>>>>> I agree with you.
>>>>>>
>>>>>>>> In that case, oopses and system hangs have occasionally occurred because ZONE_MOVABLE can be swapped.
>>>>>>>> We resolved the issue using ZONE_EXMEM by allowing a selective choice between the two use cases.
>>>>>>
>>>>>>> I once raised the idea of a ZONE_PREFER_MOVABLE [1], maybe that's
>>>>>>> similar to what you have in mind here. In general, adding new zones is
>>>>>>> frowned upon.
>>>>>>
>>>>>> Actually, we have already studied your idea and think it is similar to ours in 2 aspects.
>>>>>> 1. ZONE_PREFER_MOVABLE allows a kernelspace allocation using a new zone
>>>>>> 2. ZONE_PREFER_MOVABLE helps reduce fragmentation by splitting zones, and ordering allocation requests from the zones.
>>>>>>
>>>>>> We think ZONE_EXMEM also helps reduce fragmentation,
>>>>>> because it is a separate zone and handles a page allocation as movable by default.
>>>>>
>>>>> So how is it different that it would justify a different (more confusing
>>>>> IMHO) name? :) Of course, names don't matter that much, but I'd be
>>>>> interested in which other aspect that zone would be "special".
>>>>
>>>> FYI, at first I named it ZONE_CXLMEM, but we thought it would need to cover other extended memory types as well.
>>>> So I changed it to ZONE_EXMEM.
>>>> We also would like to point out a "special" zone aspect, which is different from ZONE_NORMAL for traditional DDR DRAM.
>>>> Of course, a symbol naming is important more or less to represent it very nicely, though.
>>>> Do you prefer ZONE_SPECIAL? :)
>>>
>>> I called it ZONE_PREFER_MOVABLE.
>>> If you studied that approach there must
>>> be a good reason to name it differently?
>>>
>>
>> The intention of ZONE_EXMEM is a separate logical management dimension originating from the HW differences of extended memory devices.
>> Although ZONE_EXMEM considers the movable and fragmentation aspects, they are not all that ZONE_EXMEM considers.
>> So it is named as it is.
>
>Given that CXL memory devices can potentially cover a wide range of technologies with quite different latency and bandwidth metrics, will one zone serve as the management vehicle that you seek? If a system contains both CXL attached DRAM and, let's say, a byte-addressable CXL SSD - both used as (different) byte addressable tiers in a tiered memory hierarchy, allocating memory from the ZONE_EXMEM doesn’t really tell you much about what you get. So the client would still need an orthogonal method to characterize the desired performance characteristics.

I agree that a heterogeneous system would be able to adopt multiple types of extended memory devices.
We think ZONE_EXMEM can apply different management algorithms for each extended memory type.
What we have in mind is ZONE_NORMAL : ZONE_EXMEM = 1 : N, where N is the number of HW device types.
ZONE_NORMAL is for conventional DDR DRAM on the DIMM form factor (F/F), while ZONE_EXMEM is for extended memories (CXL DRAM, CXL SSD, etc.) on other F/Fs such as EDSFF.

We think the movable attribute is a requirement for a CXL DRAM device.
However, there are other SW points we are concerned about - implicit allocation and unintended migration - that come with CXL HW differences.
So, I'm not sure if it is possible or good to cover these matters by a combination of the ZONE_MOVABLE and ZONE_PREFER_MOVABLE designs.

Let me point out again, we proposed ZONE_EXMEM for the special logical management of extended memory devices.
Specifically, for the performance metric, we think it would be handled not in the zone, but per node.
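
A minimal sketch of the zone trade-off discussed in this thread, written as a toy userspace C model rather than kernel code. The toy_* names and the per-region mode switch are assumptions made for illustration only; the thread does not say how ZONE_EXMEM actually selects between the two use cases, and ZONE_EXMEM itself is a proposal.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <stdbool.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE, TOY_ZONE_EXMEM };

/* Assumed per-region switch standing in for the "selective choice". */
enum toy_exmem_mode { TOY_EXMEM_MOVABLE_ONLY, TOY_EXMEM_KERNEL_OK };

struct toy_region {
    enum toy_zone zone;
    enum toy_exmem_mode mode; /* consulted only for TOY_ZONE_EXMEM */
};

/* May an unmovable, kernel-context allocation be placed in this region? */
static bool toy_allows_kernel_alloc(const struct toy_region *r)
{
    switch (r->zone) {
    case TOY_ZONE_NORMAL:
        return true;   /* kernel allocations are allowed */
    case TOY_ZONE_MOVABLE:
        return false;  /* movable (userspace) pages only */
    case TOY_ZONE_EXMEM:
        return r->mode == TOY_EXMEM_KERNEL_OK;
    }
    return false;
}

/* In this model a region is hot-removable only if it never holds
 * unmovable pages, i.e. everything in it can be migrated away. */
static bool toy_allows_hot_remove(const struct toy_region *r)
{
    return !toy_allows_kernel_alloc(r);
}
```

In this model a TOY_ZONE_EXMEM region set to TOY_EXMEM_MOVABLE_ONLY behaves like ZONE_MOVABLE (hot-removable, no kernel allocation), while one set to TOY_EXMEM_KERNEL_OK behaves like ZONE_NORMAL; the choice is made per region instead of being fixed by the zone type.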
>This method could be combined with a fabric independent zone such as ZONE_PREFER_MOVABLE to address the kernel allocation issue. At the same time, this new zone could also be useful in other cases, such as virtio-mem.

We agree with your thought. Along with the adoption of CXL memory pools and fabrics, virtualization SW layers would be added.
Considering not only a baremetal OS, but also memory inflation/deflation between a baremetal OS and a hypervisor, we think ZONE_EXMEM can be useful as the identifier for CXL memory.

>
>Thanks,
>Jorgen