From: Honggyu Kim
To: Joshua Hahn
Cc: kernel_team@skhynix.com, gourry@gourry.net, harry.yoo@oracle.com, ying.huang@linux.alibaba.com, gregkh@linuxfoundation.org, rakie.kim@sk.com, akpm@linux-foundation.org, rafael@kernel.org, lenb@kernel.org, dan.j.williams@intel.com, Jonathan.Cameron@huawei.com, dave.jiang@intel.com, horen.chuang@linux.dev, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org, kernel-team@meta.com, yunjeong.mun@sk.com
Date: Tue, 4 Mar 2025 21:53:13 +0900
Subject: Re: [PATCH 2/2 v6] mm/mempolicy: Don't create weight sysfs for memoryless nodes
In-Reply-To: <20250303215638.317539-1-joshua.hahnjy@gmail.com>

Hi Joshua,

On 3/4/2025 6:56 AM, Joshua Hahn wrote:
> On Thu, 27 Feb 2025 12:20:03 +0900 Honggyu Kim wrote:
>
> Hi Honggyu, thank you for taking time to review my patch, as always!

My pleasure!

> I thought I had sent this, but it seems like it was left in my draft
> without being sent.
>
> I will follow Gregory's advice and we will drop the patch from this series,
> and send the first patch only (with Yunjeong's changes). Thanks again!

It'd be great if you could add her with the following.
Co-developed-by: Yunjeong Mun

>
>>
>> On 2/27/2025 11:32 AM, Honggyu Kim wrote:
>>> Hi Joshua,
>>>
>>> On 2/27/2025 6:35 AM, Joshua Hahn wrote:
>>>> We should never try to allocate memory from a memoryless node. Creating a
>>>> sysfs knob to control its weighted interleave weight does not make sense,
>>>> and can be unsafe.
>>>>
>>>> Only create weighted interleave weight knobs for nodes with memory.
>>>>
>>>> Signed-off-by: Joshua Hahn
>>>> ---
>>>>  mm/mempolicy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>>> index 4cc04ff8f12c..50cbb7c047fa 100644
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -3721,7 +3721,7 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
>>>>           return err;
>>>>       }
>>>> -    for_each_node_state(nid, N_POSSIBLE) {
>>>
>>> Actually, we're aware of this issue and currently trying to fix this.
>>> In our system, we've attached 4ch of CXL memory for each socket as
>>> follows.
>>>
>>>         node0             node1
>>>       +-------+   UPI   +-------+
>>>       | CPU 0 |-+-----+-| CPU 1 |
>>>       +-------+         +-------+
>>>       | DRAM0 |         | DRAM1 |
>>>       +---+---+         +---+---+
>>>           |                 |
>>>       +---+---+         +---+---+
>>>       | CXL 0 |         | CXL 4 |
>>>       +---+---+         +---+---+
>>>       | CXL 1 |         | CXL 5 |
>>>       +---+---+         +---+---+
>>>       | CXL 2 |         | CXL 6 |
>>>       +---+---+         +---+---+
>>>       | CXL 3 |         | CXL 7 |
>>>       +---+---+         +---+---+
>>>         node2             node3
>>>
>>> The 4ch of CXL memory are detected as a single NUMA node in each socket,
>>> but it shows as follows with the current N_POSSIBLE loop.
>>>
>>> $ ls /sys/kernel/mm/mempolicy/weighted_interleave/
>>> node0 node1 node2 node3 node4 node5
>>> node6 node7 node8 node9 node10 node11

FYI, we used to set weights only for node2 and node3 for the CXL memory here
and ignored node{4-11}. That sounds silly, but it worked.

>
> I see. For my education, would you mind explaining how the numbering works
> here? I am not very familiar with this setup, and not sure how you would
> figure out what node is which, just by looking at the numbering.

Regarding the numbering, I'm not 100% sure, but I guess there could be one
logical NUMA node that combines the 4ch of CXL memory, plus 4 more nodes for
the individual CXL memory channels, so 5 CXL nodes per socket in total. I
don't know much about this, but it may be related to PXM (ACPI Proximity
Domain).

>
>>>> +    for_each_node_state(nid, N_MEMORY) {
>>
>> Thinking it again, we can leave it as a separate patch but add our patch
>> on top of it.
>
> That sounds good to me.
>
>> The only concern I have is having only N_MEMORY patch hides weight
>> setting knobs for CXL memory and it makes there is no way to set weight
>> values to CXL memory in my system.
>
> You can use weighted interleave auto-tuning : -)

That's not possible, because with N_MEMORY no "node" knobs are created for
the CXL memory at all, as follows.

$ ls /sys/kernel/mm/mempolicy/weighted_interleave/
node0 node1

We need node2 and node3 for the CXL memory here.

> In all seriousness, this makes sense. It seems pretty problematic that
> the knobs aren't created for the CXL channels,

Yeah, it's even worse than the current status.

> and I'm not sure that hiding
> it is the correct approach here (it was not my intent, either).

It isn't your problem, but we shouldn't hide those nodes until this is
correctly fixed with a memory hotplug event handler.

>
>> IMHO, this and our patch is better to be submitted together.
>
> That sounds good. We can hold off on this patch then, and just consider
> the first patch of this series. Thank you for letting me know!
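The N_POSSIBLE vs. N_MEMORY difference discussed above can be inspected from userspace. The sketch below is not part of either patch: it assumes that /sys/devices/system/node/possible and /sys/devices/system/node/has_memory mirror the kernel's N_POSSIBLE and N_MEMORY node states, and the helper names (parse_nodelist, read_nodelist, knobs_hidden_by_n_memory, print_knobs) are hypothetical. It only handles node IDs below 64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a node list such as "0-3,8,10-11\n" (the format of
 * /sys/devices/system/node/{possible,has_memory}) into a bitmask.
 * Returns the number of nodes in the list, or -1 on malformed input.
 * Simplification for this sketch: only node IDs 0..63 are supported.
 */
int parse_nodelist(const char *s, uint64_t *mask)
{
    int count = 0;

    *mask = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi;

        if (end == s || lo < 0 || lo > 63)
            return -1;
        hi = lo;
        if (*end == '-') {            /* range such as "10-11" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo || hi > 63)
                return -1;
        }
        for (long n = lo; n <= hi; n++) {
            uint64_t bit = UINT64_C(1) << n;

            if (!(*mask & bit)) {
                *mask |= bit;
                count++;
            }
        }
        s = end;
        if (*s == ',')
            s++;
        else if (*s != '\0' && *s != '\n')
            return -1;
    }
    return count;
}

/* Read one node-list file from sysfs; returns parse_nodelist()'s result. */
int read_nodelist(const char *path, uint64_t *mask)
{
    char buf[256];
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_nodelist(buf, mask);
}

/* Nodes that get a weighted_interleave knob when the creation loop iterates
 * N_POSSIBLE but not when it iterates N_MEMORY. */
uint64_t knobs_hidden_by_n_memory(uint64_t possible, uint64_t has_memory)
{
    return possible & ~has_memory;
}

/* Print "nodeN" for every bit in mask, similar to listing the directory. */
void print_knobs(const char *label, uint64_t mask)
{
    printf("%s:", label);
    for (int nid = 0; nid < 64; nid++)
        if (mask & (UINT64_C(1) << nid))
            printf(" node%d", nid);
    printf("\n");
}
```

With a possible list of "0-11" and a has_memory list of "0-1" (matching the two listings in this thread), knobs_hidden_by_n_memory() returns the mask for node2 through node11, which includes the node2 and node3 knobs needed for the CXL memory. Since has_memory reflects the state at the time it is read, on a system where CXL memory is onlined later it may already list node2 and node3.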
The N_POSSIBLE vs. N_MEMORY issue should have been fixed before this work.
It will take a few days on our side, but I hope we can submit them together.

>
> Thank you for always reviewing my patches. Have a great day!
> Joshua

Thanks for your work, and have a great day too!

Kind regards,
Honggyu

>
> Sent using hkml (https://github.com/sjp38/hackermail)
>
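The idea of fixing this with a memory hotplug event handler, rather than a static N_POSSIBLE or N_MEMORY loop at init, can be illustrated with a toy model in which the knob set follows memory online/offline events. This is a hypothetical userspace model, not kernel code and not the fix being prepared: struct knob_set and the knobs_* functions are illustrative names, and the assumption that the two event functions would be driven by a kernel memory hotplug notifier (MEM_ONLINE / MEM_OFFLINE) is only a guess at how such a handler might be wired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model, NOT kernel code: the set of weighted_interleave knobs
 * tracks which nodes have memory at runtime instead of being fixed at init.
 * Assumption: in the kernel, the event functions below would be called from
 * a memory hotplug notifier on MEM_ONLINE / MEM_OFFLINE.
 */
struct knob_set {
    uint64_t present; /* bit N set => a "nodeN" knob exists */
};

/* Create knobs for the nodes that already have memory at init time. */
void knobs_init(struct knob_set *ks, uint64_t n_memory_at_init)
{
    ks->present = n_memory_at_init;
}

/* A node gained memory (e.g. CXL memory onlined after boot): add its knob. */
void knobs_on_memory_online(struct knob_set *ks, int nid)
{
    assert(nid >= 0 && nid < 64);
    ks->present |= UINT64_C(1) << nid;
}

/* A node lost its last memory: remove its knob. */
void knobs_on_memory_offline(struct knob_set *ks, int nid)
{
    assert(nid >= 0 && nid < 64);
    ks->present &= ~(UINT64_C(1) << nid);
}

/* Format the knob names, e.g. "node0 node1", into buf (len must be > 0). */
void knobs_format(const struct knob_set *ks, char *buf, size_t len)
{
    size_t off = 0;

    assert(len > 0);
    memset(buf, 0, len);
    for (int nid = 0; nid < 64 && off < len; nid++) {
        int n;

        if (!(ks->present & (UINT64_C(1) << nid)))
            continue;
        n = snprintf(buf + off, len - off, "%snode%d", off ? " " : "", nid);
        if (n < 0)
            break;
        off += (size_t)n;
    }
}
```

Starting from node0 and node1 and then onlining memory on node2 and node3 yields "node0 node1 node2 node3", the set needed in the setup above, while never exposing knobs for nodes that have no memory.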