From: Joshua Hahn
To: Hyeonggon Yoo
Cc: Joshua Hahn, "Huang, Ying", Gregory Price, Joshua Hahn,
    kernel_team@skhynix.com, 42.hyeyoo@gmail.com, rafael@kernel.org,
    lenb@kernel.org, gregkh@linuxfoundation.org,
    akpm@linux-foundation.org, 김홍규(KIM HONGGYU) System SW,
    김락기(KIM RAKIE) System SW, dan.j.williams@intel.com,
    Jonathan.Cameron@huawei.com, dave.jiang@intel.com,
    horen.chuang@linux.dev, hannes@cmpxchg.org,
    linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
    linux-mm@kvack.org, kernel-team@meta.com
Subject: Re: [External Mail] Re: [External Mail] [RFC PATCH] mm/mempolicy: Weighted interleave auto-tuning
Date: Wed, 8 Jan 2025 11:56:32 -0500
Message-Id: <20250108165632.76746-1-joshuahahn@meta.com>
In-Reply-To: <769f98b3-f5e5-448c-966e-4dd5468e5041@sk.com>

On Wed, 8 Jan 2025 10:19:19 +0900 Hyeonggon Yoo wrote:

>
> On 2024-12-30 3:48 AM, Huang, Ying wrote:
> > Gregory Price writes:
> >
> >> On Fri, Dec 27, 2024 at 09:59:30AM +0800, Huang, Ying wrote:
> >>> Gregory Price writes:
> >>>> This still allows 0 to be a manual "reset specific node to default"
> >>>> mechanism for a specific node, and gives us a clean override.
> >>>
> >>> The difficulty is that users don't know the default value when they
> >>> reset a node's weight. We don't have an interface to show them. So, I
> >>> suggest to disable the functionality: "reset specific node to default".
> >>> They can still use "echo 1 > use_defaults" to reset all nodes to
> >>> default.
> >>>
> >>
> >> Good point, and agree. So lets just ditch 0. Since that "feature"
> >> wasn't even functional in the current code (it just reset it to 1 at
> >> this point), it's probably safe to just ditch it. Worst case scenario
> >> if someone takes issues, we can just have it revert the weight to 1.
> >
> > Before implementing the new version, it's better to summarize the user
> > space interface design first.
> > So, we can check whether we have some
> > flaws.
>
> Hi, hope you all had a nice year-end holiday :)

Hi Hyeonggon, thank you for the review!
I hope you have had a great start to 2025 as well :-)

> Let me summarize the points we've discussed:
>
> - A new knob 'weightiness' is unnecessary until it's proven useful.
>   Just using an internal default weightiness value will be enough.

Yup, and as Gregory mentioned, we're pretty confident that 32 gives a
good balance of reduction aggression & error minimization.

> - It will be counter-intuitive to update the value previously set by
>   user, even if the value will no longer be valid (e.g. due to CXL
>   memory hot-plug). User should update the weights accordingly in that
>   case, instead of the kernel updating automatically overwriting them.

Agreed. I think we should err on the side of keeping user-set values
at the highest priority, and avoid overwriting them.

> - Ditch the way of using 0 as 'system default' value because the user
>   won't know what will be the default when setting it anyway. 0 value
>   now means the kernel won't weight-interleave the node.

Yes, and I think this is going to be important to convey to users,
especially since hotplugged nodes are currently expected to be
included in interleaving automatically.

> - Setting a node weight to default value (e.g. via the previous
>   semantic of '0') could be problematic because it's not atomic -
>   the system may be updating default values while the user's
>   trying to set a node weight to default value.
>
>   To deal with that, Huang suggested 'use_defaults' to atomically update
>   all the weights to system default.

One concern that I have here is that even for the use_defaults
interface, the users don't know what the system default values are.
When users make the judgement between using user values and system
defaults, one side of the choice is always going to be opaque from
their perspective, as long as we only have a "one-layer" interface.
With that said, in this new design, where the choice is either to use
only system defaults or to set everything manually, I think the problem
is much smaller, since there is no hybrid of using both. I think this
will bucket users into two categories: those who expect weighted
interleave to work out of the box with no configuration, and those who
want to manually tune node weights.

I think this is a reasonable categorization, especially given Huang's
earlier comment on how "setting one node means the user thinks all other
node weights are reasonable". As long as we support these two user
types, I think we will cover most use cases. As others mentioned
previously, I think we don't have to support use cases where some nodes
are fixed to user values and others are fixed to defaults, until an
explicit use case comes up.

> Please let me know if there's any point we discussed that I am missing.
>
> Additionally I would like to mention that within an internal discussion
> my colleague Honggyu suggested introducing 'mode' parameter which can be
> either 'manual' or 'auto' instead of 'use_defaults' to be provide more
> intuitive interface.
>
> With Honggyu's suggestion and the points we've discussed,
> I think the interface could be:
>
> # At booting, the mode is 'auto' where the kernel can automatically
> # update any weights.
>
> mode             auto         # User hasn't specified any weight yet.
> effective        [2, 1, -, -] # Using system defaults for node 0-1,
>                               # and node 2-3 not populated yet.
>
> # When a new NUMA node is added (e.g. via hotplug) in the 'auto' mode,
> # all weights are re-calculated based on ACPI HMAT table, including the
> # weight of the new node.
>
> mode             auto         # User hasn't specified weights yet.
> effective        [2, 1, 1, -] # Using system defaults for node 0-2,
>                               # and node 3 not populated yet.
>
> # When user set at least one weight value, change the mode to 'manual'
> # where the kernel does not update any weights automatically without
> # user's consent.
>
> mode             manual       # User changed the weight of node 0 to 4,
>                               # changing the mode to manual config mode.
> effective        [4, 1, 1, -]
>
>
> # When a new NUMA node is added (e.g. via hotplug) in the manual mode,
> # the new node's weight is zero because it's in manual mode and user
> # did not specify the weight for the new node yet.
>
> mode             manual
> effective        [4, 1, 1, 0]
>
> # When user changes the mode to 'auto', all weights are changed to
> # system defaults based on the ACPI HMAT table.
>
> mode             auto
> effective        [2, 1, 1, 1] # system defaults
>
> In the example I did not distinguish 'default weights' and 'user
> weights' because it's not important where the weight values came from --
> but it's important to know 1) what's the effective weights now and 2) if
> the kernel can update them.
>
> Any thoughts?

Please let me know if I am missing anything, but the way I understand
the "mode" interface, it follows the blanket use_defaults-like semantics
of auto / manual, but uses the one-layer interface like the v2 of this
patch. Personally, I think this makes a lot of sense.

I have a few thoughts:

First, it seems like re-weighting is something that only happens when
mode == auto. I think this makes sense, since the point of re-weighting
is to reduce ugly bandwidth values into interleave weight values.
However, I want to gather some thoughts on whether I am missing any
scenarios in which a user would want to stay in manual mode but also
trigger a re-weighting.

One lingering thought I have is what happens when a hotplug event
happens in manual mode. The way I see it, there are 3 options for what
to set the new node's weight to:
- 0, since the user is in manual mode and has not set any value.
- 1, since we still want the node to be in use, even though the user
  has not explicitly stated that they want it to be in use. 1 as a
  default value makes sense, since it is the minimum weight a node
  can have, and can only help by relieving bandwidth pressure from
  the other nodes.
- A value >= 1, determined by what the default value should have been
  based on the bandwidths of the other nodes.

I am personally leaning towards the second option (1), but I also want
to hear everyone's thoughts on what makes the most sense.

> ---
> Best,
> Hyeonggon

Happy 2025 everyone! Thank you for all of your continued interest and
feedback on this patch. I am confident that we will come up with a
solution that makes sense for everyone, thanks to your time & efforts.

Have a great day!
Joshua
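
For reference, the auto / manual semantics discussed in this thread can
be sketched as a small user-space model. This is only an illustration
under stated assumptions, not kernel code: the class InterleaveWeights,
its method names, the hotplug_policy values ("zero", "one", "default")
and the toy_defaults() helper are all hypothetical. In particular,
toy_defaults() merely stands in for the HMAT-derived defaults and is
hard-coded to reproduce the numbers in Hyeonggon's example.

```python
class InterleaveWeights:
    """Toy model of the proposed 'mode' interface (hypothetical names)."""

    POLICIES = ("zero", "one", "default")

    def __init__(self, compute_defaults, hotplug_policy="one"):
        if hotplug_policy not in self.POLICIES:
            raise ValueError(f"unknown hotplug policy: {hotplug_policy}")
        # compute_defaults(nodes) -> {node: weight}; assumed stand-in for
        # whatever the kernel derives from the ACPI HMAT table.
        self.compute_defaults = compute_defaults
        self.hotplug_policy = hotplug_policy
        self.mode = "auto"      # the mode at boot in the proposal
        self.weights = {}       # effective weights, one layer only

    def add_node(self, node):
        """Model a NUMA node becoming populated (e.g. via hotplug)."""
        nodes = set(self.weights) | {node}
        if self.mode == "auto":
            # Auto mode: every weight is recalculated.
            self.weights = dict(self.compute_defaults(nodes))
        elif self.hotplug_policy == "zero":
            self.weights[node] = 0   # option 1: node is not interleaved
        elif self.hotplug_policy == "one":
            self.weights[node] = 1   # option 2: minimal participation
        else:
            # Option 3: what the default would have been, at least 1.
            self.weights[node] = max(1, self.compute_defaults(nodes)[node])

    def set_weight(self, node, weight):
        """A user write to one node's weight switches the mode to manual."""
        if node not in self.weights:
            raise KeyError(f"node {node} is not populated")
        self.mode = "manual"
        self.weights[node] = weight

    def set_mode(self, mode):
        """Switching back to auto resets all weights to system defaults."""
        if mode not in ("auto", "manual"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        if mode == "auto":
            self.weights = dict(self.compute_defaults(set(self.weights)))


def toy_defaults(nodes):
    # Made-up defaults matching the example: node 0 gets 2, others get 1.
    return {n: 2 if n == 0 else 1 for n in nodes}
```

Replaying the quoted example with hotplug_policy="zero" walks through
the same sequence of effective weights; changing the policy to "one" or
"default" only changes what a node added in manual mode receives, which
is the open question above.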