Date: Fri, 7 Apr 2023 09:35:59 -0500
From: Dragan Stancevic
Subject: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
To: "Huang, Ying"
Cc: Mike Rapoport, Kyungsan Kim, dan.j.williams@intel.com,
 lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
 a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
 nil-migration@lists.linux.dev
In-Reply-To: <87a5zky0c8.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <641b7b2117d02_1b98bb294cb@dwillia2-xfh.jf.intel.com.notmuch>
 <20230323105105.145783-1-ks0204.kim@samsung.com>
 <362a9e19-fea5-e45a-3c22-3aa47e851aea@stancevic.com>
 <81baa7f2-6c95-5225-a675-71d1290032f0@stancevic.com>
 <87sfdgywha.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87a5zky0c8.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi Ying-

On 4/6/23 19:58, Huang, Ying wrote:
> Dragan Stancevic writes:
>
>> Hi Ying-
>>
>> On 4/4/23 01:47, Huang, Ying wrote:
>>> Dragan Stancevic writes:
>>>
>>>> Hi Mike,
>>>>
>>>> On 4/3/23 03:44, Mike Rapoport wrote:
>>>>> Hi Dragan,
>>>>> On Thu, Mar 30, 2023 at 05:03:24PM -0500, Dragan Stancevic wrote:
>>>>>> On 3/26/23 02:21, Mike Rapoport wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> [..]
>>>>>>>> One problem we experienced occurred in the combination of the
>>>>>>>> hot-remove and kernelspace allocation use cases.
>>>>>>>> ZONE_NORMAL allows kernel context allocation, but it does not allow
>>>>>>>> hot-remove because the kernel resides there all the time.
>>>>>>>> ZONE_MOVABLE allows hot-remove due to page migration, but it only
>>>>>>>> allows userspace allocation.
>>>>>>>> Alternatively, we allocated a kernel context out of ZONE_MOVABLE by
>>>>>>>> adding the GFP_MOVABLE flag.
>>>>>>>> In that case, oopses and system hangs occasionally occurred because
>>>>>>>> ZONE_MOVABLE can be swapped.
>>>>>>>> We resolved the issue using ZONE_EXMEM by allowing a selective choice
>>>>>>>> between the two use cases.
>>>>>>>> As you well know, among heterogeneous DRAM devices, CXL DRAM is the
>>>>>>>> first PCIe-based device, which allows hot-pluggability, different RAS,
>>>>>>>> and extended connectivity.
>>>>>>>> So we thought it could be a graceful approach to add a new zone and
>>>>>>>> manage the new features separately.
>>>>>>>
>>>>>>> This still does not describe what the use cases are that require having
>>>>>>> kernel allocations on CXL.mem.
>>>>>>>
>>>>>>> I believe it's important to start with an explanation of *why* it is
>>>>>>> important to have kernel allocations on removable devices.
>>>>>>
>>>>>> Hi Mike,
>>>>>>
>>>>>> not speaking for Kyungsan here, but I am starting to tackle hypervisor
>>>>>> clustering and VM migration over cxl.mem [1].
>>>>>>
>>>>>> And in my mind, at least one reason that I can think of for having kernel
>>>>>> allocations from cxl.mem devices is where you have multiple VH connections
>>>>>> sharing the memory [2]. Where for example you have a user space application
>>>>>> stored in cxl.mem, and then you want the metadata about this
>>>>>> process/application that the kernel keeps on one hypervisor to be "passed on"
>>>>>> to another hypervisor. So basically the same way processors in a single
>>>>>> hypervisor cooperate on memory, you extend that across processors that span
>>>>>> over physical hypervisors. If that makes sense...
>>>>>
>>>>> Let me reiterate to make sure I understand your example.
>>>>> If we focus on the VM use case, your suggestion is to store the VM's memory and
>>>>> associated KVM structures on a CXL.mem device shared by several nodes.
>>>>
>>>> Yes, correct. That is what I am exploring, two different approaches:
>>>>
>>>> Approach 1: Use CXL.mem for VM migration between hypervisors. In this
>>>> approach the VM and the metadata execute/reside on a traditional
>>>> NUMA node (cpu+dram) and only use CXL.mem to transition between
>>>> hypervisors. It's not kept there permanently. So basically on
>>>> hypervisor A you would do something along the lines of migrate_pages
>>>> into cxl.mem, and then on hypervisor B you would migrate_pages from
>>>> cxl.mem onto the regular NUMA node (cpu+dram).
>>>>
>>>> Approach 2: Use CXL.mem to cluster hypervisors to improve high
>>>> availability of VMs. In this approach the VM and metadata would be
>>>> kept in CXL.mem permanently, and each hypervisor accessing this shared
>>>> memory could have the potential to schedule/run the VM if the other
>>>> hypervisor experienced a failure.
>>>>
>>>>> Even putting aside the aspect of keeping KVM structures on presumably
>>>>> slower memory,
>>>>
>>>> Totally agree, presumption of memory speed duly noted. As far as I am
>>>> aware, CXL.mem at this point has higher latency than DRAM, and
>>>> switched CXL.mem has additional latency. That may or may not change
>>>> in the future, but even with the actual CXL-induced latency I think there
>>>> are benefits to the approaches.
>>>>
>>>> In example #1 above, I think even if you had a very noisy VM that
>>>> is dirtying pages at a high rate, once migrate_pages has occurred, it
>>>> wouldn't have to be quiesced for the migration to happen. A migration
>>>> could basically occur in between the CPU slices: once a VCPU is done
>>>> with its slice on hypervisor A, the next slice could be on hypervisor B.
>>>>
>>>> And in example #2 above, you are trading memory speed for
>>>> high availability, where either hypervisor A or B could run the CPU
>>>> load of the VM. You could even have a VM where some of the VCPUs are
>>>> executing on hypervisor A and others on hypervisor B, to be able to
>>>> shift CPU load across hypervisors in quasi real time.
>>>>
>>>>> what ZONE_EXMEM will provide that cannot be accomplished
>>>>> with having the cxl memory in a memoryless node and using that node to
>>>>> allocate VM metadata?
>>>>
>>>> It has crossed my mind to perhaps use NUMA node distance for the two
>>>> approaches above. But I think that is not sufficient, because we can
>>>> have varying distance, and distance in itself doesn't indicate
>>>> switched/shared CXL.mem or non-switched/non-shared CXL.mem.
>>>> Strictly speaking just for myself here, with the two approaches above, the
>>>> crucial differentiator in order for #1 and #2 to work would be that
>>>> switched/shared CXL.mem would have to be indicated as such in some way.
>>>> Because switched memory would have to be treated and formatted in some
>>>> kind of ABI way that would allow hypervisors to cooperate and follow
>>>> certain protocols when using this memory.
>>>>
>>>> I can't answer what ZONE_EXMEM will provide since we haven't seen
>>>> Kyungsan's talk yet; that's why I myself was very curious to find out
>>>> more about the ZONE_EXMEM proposal and whether it includes some provisions
>>>> for CXL switched/shared memory.
>>>>
>>>> To me, I don't think it makes a difference if pages are coming from
>>>> ZONE_NORMAL or ZONE_EXMEM, but the part that I was curious about was
>>>> whether I could allocate from or migrate_pages to (ZONE_EXMEM | type
>>>> "SWITCHED/SHARED"). So it's not the zone that is crucial for me, it's
>>>> the typing. That's what I meant with my initial response, but I guess
>>>> it wasn't clear enough: "_if_ ZONE_EXMEM had some typing mechanism, in
>>>> my case, this is where you'd have kernel allocations on CXL.mem"
>>>
>>> We have 2 choices here.
>>> a) Put CXL.mem in a separate NUMA node, with an existing ZONE type
>>> (normal or movable). Then you can migrate pages there with
>>> move_pages(2) or migrate_pages(2). Or you can run your workload on the
>>> CXL.mem with numactl.
>>> b) Put CXL.mem in an existing NUMA node, with a new ZONE type. To
>>> control your workloads in user space, you need a set of new ABIs.
>>> Anything you cannot do in a)?
>>
>> I like the CXL.mem as a NUMA node approach, and also think it's best
>> to do this with move/migrate_pages and numactl, and those a & b are
>> good choices.
>>
>> I think there is an option c too though, which is an amalgamation of a
>> & b. Here is my thinking, and please do let me know what you think
>> about this approach.
>>
>> If you think about CXL 3.0 shared/switched memory as a portal for a VM
>> to move from one hypervisor to another, I think each switched memory
>> should be represented by its own node and have a distinct type so the
>> migration path becomes more deterministic. I was thinking along the
>> lines that there would be some kind of user space clustering/migration
>> app/script that runs on all the hypervisors, which would read, let's
>> say, /proc/pagetypeinfo to find these "portals":
>> Node 4, zone Normal, type Switched ....
>> Node 6, zone Normal, type Switched ....
>>
>> Then it would build a traversal Graph, find per-hypervisor reach and
>> critical connections, where critical connections are cross-rack or
>> cross-pod, perhaps something along the lines of this pseudo/python code:
>>
>> class Graph:
>>     def __init__(self, mydict):
>>         self.dict = mydict          # adjacency: vertex -> set of neighbors
>>         self.visited = set()
>>         self.critical = list()      # critical (bridge) connections found
>>         self.reach = dict()         # per vertex: discovery 'id' and lowest reachable 'reach'
>>         self.id = 0
>>
>>     def depth_first_search(self, vertex, parent):
>>         self.visited.add(vertex)
>>         if vertex not in self.reach:
>>             self.reach[vertex] = {'id': self.id, 'reach': self.id}
>>             self.id += 1
>>         for next_vertex in self.dict[vertex] - {parent}:
>>             if next_vertex not in self.visited:
>>                 self.depth_first_search(next_vertex, vertex)
>>             if self.reach[next_vertex]['reach'] < self.reach[vertex]['reach']:
>>                 self.reach[vertex]['reach'] = self.reach[next_vertex]['reach']
>>         # if nothing below this vertex reaches back above it, the edge to
>>         # its parent is a critical connection
>>         if parent is not None and self.reach[vertex]['id'] == self.reach[vertex]['reach']:
>>             self.critical.append([parent, vertex])
>>         return self.critical
>>
>> critical = mygraph.depth_first_search("hostname-foo4", None)
>>
>> That way you could have a VM migrate between only two hypervisors
>> sharing switched memory, or pass through a subset of hypervisors (that
>> don't necessarily share switched memory) to reach its destination.
>> This may be rack-confined, or go across a rack or even a pod using
>> critical connections.
>>
>> Long way of saying that if you do a), then the clustering/migration
>> script only sees a bunch of nodes and a bunch of normal zones; it
>> wouldn't know how to build the "flight path" and where to send a
>> VM. You'd probably have to add an additional interface in the kernel
>> for the script to query the paths somehow, whereas on the other hand
>> pulling things from proc/sys is easy.
>>
>> And then if you do b) and put it in an existing NUMA node with a
>> "Switched" type, you could potentially end up with several "Switched"
>> types under the same node. So when you numactl/move/migrate pages they
>> could go in either direction, and you could send some pages through one
>> "portal" and others through another "portal", which is not what you
>> want to do.
>>
>> That's why I think the c option might be the most optimal, where each
>> switched memory has its own node number. And then displaying the type as
>> "Switched" just makes it easier to detect and graph the topology.
>>
>> And with regards to an ABI, I was referring to an ABI needed between
>> the kernels running on separate hypervisors. When hypervisor B boots,
>> it needs to detect through an ABI whether this switched/shared memory is
>> already initialized and whether there are VMs in there which are used by
>> another hypervisor, say A. Also, during the migration, hypervisors A
>> and B would have to use this ABI to synchronize the hand-off between
>> the two physical hosts. Not an all-inclusive list, but I was referring
>> to those types of scenarios.
>>
>> What do you think?
>
> It seems unnecessary to add a new zone type to mark a node with some
> attribute. For example, in the following patch, a per-node attribute
> can be added and shown in sysfs.
>
> https://lore.kernel.org/linux-mm/20220704135833.1496303-10-martin.fernandez@eclypsium.com/

That's a very good suggestion, Ying, thank you, I appreciate it. So perhaps
have switched memory on its own node (option a) and export a sysfs
attribute like "switched". That might also be a good place to export the
hypervisor partners that share the same switched memory, for the script
to build up a connection topology graph.
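
To make that last part a bit more concrete, here is a rough sketch of what
the discovery and hand-off side of such a clustering script could look like.
This is only a sketch under assumptions: the per-node "switched" sysfs
attribute is hypothetical (the patch above just shows the general mechanism
for exporting a per-node attribute), and the actual page movement is simply
delegated to migratepages(8) from the numactl tools:

#!/usr/bin/env python3
# Sketch only: the per-node "switched" attribute is hypothetical, modeled on
# the per-node sysfs attribute approach from the patch referenced above.
import glob
import os
import re
import subprocess

def find_switched_nodes():
    """Return NUMA node ids that expose a (hypothetical) 'switched' attribute."""
    nodes = []
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        attr = os.path.join(path, "switched")   # hypothetical attribute name
        if os.path.isfile(attr):
            with open(attr) as f:
                if f.read().strip() == "1":
                    nodes.append(int(re.search(r"node(\d+)$", path).group(1)))
    return sorted(nodes)

def migrate_vm(pid, from_node, to_node):
    """Push a VM's pages into (or pull them out of) a switched 'portal' node
    using migratepages(8) from the numactl tools."""
    subprocess.run(["migratepages", str(pid), str(from_node), str(to_node)],
                   check=True)

if __name__ == "__main__":
    portals = find_switched_nodes()
    print("switched CXL.mem nodes:", portals)
    # e.g. on hypervisor A: migrate_vm(vm_pid, 0, portals[0])
    #      on hypervisor B: migrate_vm(vm_pid, portals[0], 1)

If the sysfs attribute isn't available, the same discovery step could fall
back to parsing /proc/pagetypeinfo for the "Switched" type, as in the
example output earlier in the thread.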

--
Peace can only come as a natural consequence of universal enlightenment
-Dr. Nikola Tesla