From: Joshua Hahn
To: "Huang, Ying"
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, gourry@gourry.net, hyeonggon.yoo@sk.com, honggyu.kim@sk.com, kernel-team@meta.com
Subject: Re: [LSF/MM/BPF TOPIC] Weighted interleave auto-tuning
Date: Fri, 14 Mar 2025 08:02:46 -0700
Message-ID: <20250314150248.774524-1-joshua.hahnjy@gmail.com>
In-Reply-To: <87frjfx6u4.fsf@DESKTOP-5N7EMDA>

On Fri, 14 Mar 2025 18:08:35 +0800 "Huang, Ying" wrote:

> Joshua Hahn writes:
>
> > On Thu, 9 Jan 2025 13:50:48 -0500 Joshua Hahn wrote:
> >
> >> Hello everyone, I hope everyone has had a great start to 2025!
> >>
> >> Recently, I have been working on a patch series [1] with
> >> Gregory Price that provides new default interleave
> >> weights, along with dynamic re-weighting on hotplug events and a series
> >> of UAPIs that allow users to configure how they want the defaults to behave.
> >>
> >> In introducing these new defaults, discussions have opened up in the
> >> community regarding how best to create a UAPI that can provide
> >> coherent and transparent interactions for the user. In particular, consider
> >> this scenario: when a hotplug event happens and a node comes online
> >> with new bandwidth information (and therefore changes the bandwidth
> >> distributions across the system), should user-set weights be overwritten
> >> to reflect the new distributions? If so, how can we justify overwriting
> >> user-set values in a sysfs interface? If not, how will users manually
> >> adjust the node weights to the optimal weight?
> >>
> >> I would like to revisit some of the design choices made for this patch,
> >> including how the defaults were derived, and open the conversation to
> >> hear what the community believes is a reasonable way to allow users to
> >> tune weighted interleave weights. More broadly, I hope to gather
> >> community insight on how they use weighted interleave, and do my best to
> >> reflect those workflows in the patch.
> >
> > Weighted interleave has since moved on to v7 [1], and a v8 is currently being
> > drafted. Through feedback from reviewers, we have landed on a coherent UAPI
> > that gives users two options: auto mode, which leaves all weight calculation
> > decisions to the system, and manual mode, which leaves weighted interleave
> > the same as it is without the patch.
> >
> > Given that the patch's functionality is mostly concrete and that the questions
> > I hoped to raise during this slot were answered via patch feedback, I hope to
> > ask another question during the talk:
> >
> > Should the system dynamically change what metrics it uses to weight the nodes,
> > based on what bottlenecks the system is currently facing?
> >
> > In the patch, min(read_bandwidth, write_bandwidth) is used as the heuristic
> > to determine what a node's weight should be.
> > However, what if the system is
> > not bottlenecked by bandwidth, but by latency? A system could also be
> > bottlenecked by read bandwidth, but not by write bandwidth.
> >
> > Consider a scenario where a system has many memory nodes with varying
> > latencies and bandwidths. When the system is not bottlenecked by bandwidth,
> > it might prefer to allocate memory from nodes with lower latency. Once the
> > system starts feeling pressured by bandwidth, the weights for high bandwidth
> > (but also high latency) nodes would slowly increase to alleviate pressure
> > on the system. Once the system is back in a manageable state, weights for
> > low latency nodes would start increasing again. Users would not have to be
> > aware of any of this -- they would just see the system take control of the
> > weight changes as the system's needs continue to change.
>
> IIUC, this assumes the capacity of all kinds of memory is large enough.
> However, this may not be true in some cases. So, another possibility,
> for a system with DRAM and CXL memory nodes, is:
>
> - There is free space on the DRAM node and the bandwidth of the DRAM node
>   isn't saturated: memory is allocated on the DRAM node.
>
> - There is no free space on the DRAM node and the bandwidth of the DRAM node
>   isn't saturated: cold pages are migrated to CXL memory nodes, while hot
>   pages are migrated to DRAM memory nodes.
>
> - The bandwidth of the DRAM node is saturated: hot pages are migrated to CXL
>   memory nodes.
>
> In general, I think that the real situation is complex and this makes it
> hard to implement a good policy in the kernel. So, I suspect that it's
> better to start with experiments in user space.

Hi Ying, thank you so much for your feedback, as always!

Yes, I agree. I brought up this idea out of curiosity, since I thought that
there might be room to experiment with different configurations for weighted
interleave auto-tuning.
As you know, we use min(read_bw, write_bw), which I think is a
good heuristic that works for the intent of the weighted interleave
auto-tuning patch. I wanted to know what a system that uses different
heuristics depending on its state might look like. But I think you are
right that it is difficult to implement in the kernel.

Thanks again, Ying! Will you be attending LSFMMBPF this year? I would love to
say hello in person :-)

Have a great day!
Joshua

> > This proposal also has some concerns that need to be addressed:
> > - How reactive should the system be, and how aggressively should it tune the
> >   weights? We don't want the system to overreact to short spikes in pressure.
> > - Does dynamic weight adjusting lead to pages being "misplaced"? Should those
> >   "misplaced" pages be migrated? (probably not)
> > - Does this need to be in the kernel? A userspace daemon that monitors kernel
> >   metrics has the ability to make the changes (via the nodeN interfaces).
> >
> > Thoughts & comments are appreciated! Thank you, and have a great day!
> > Joshua
> >
> > [1] https://lore.kernel.org/all/20250305200506.2529583-1-joshua.hahnjy@gmail.com/
> >
> > Sent using hkml (https://github.com/sjp38/hackermail)
>
> ---
> Best Regards,
> Huang, Ying

Sent using hkml (https://github.com/sjp38/hackermail)
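
For readers following the thread, here is a rough userspace sketch of the min(read_bw, write_bw) heuristic mentioned above. It is only an illustration: the function name, the integer bandwidth inputs, the 255 ceiling, and the rescale-then-reduce step are all assumptions made for this example, and it is not the weight-reduction code from the patch series.

```python
from functools import reduce
from math import gcd


def bandwidth_weights(nodes, max_weight=255):
    """Turn {node: (read_bw, write_bw)} into small integer weights.

    Hypothetical helper: assumes positive integer bandwidths and an
    assumed weight ceiling of max_weight.
    """
    # A node is treated as limited by the slower of its two directions.
    limits = {n: min(r, w) for n, (r, w) in nodes.items()}

    # If the largest value does not fit under the ceiling, rescale all
    # values proportionally and never let a node fall below weight 1.
    top = max(limits.values())
    if top > max_weight:
        limits = {n: max(1, round(bw * max_weight / top))
                  for n, bw in limits.items()}

    # Divide out the common factor so the ratio uses small integers.
    g = reduce(gcd, limits.values())
    return {n: v // g for n, v in limits.items()}


if __name__ == "__main__":
    # Made-up bandwidth numbers, not measurements from any real system.
    print(bandwidth_weights({0: (300, 200), 1: (100, 120)}))
```

A userspace daemon of the kind raised in the last concern could recompute such a table when its inputs change and write the results through the per-node (nodeN) interfaces. How often to do so, and how much smoothing to apply, is exactly the open question in the thread.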