From: Joshua Hahn
To: Jonathan Cameron
Cc: "Huang, Ying", lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, gourry@gourry.net, hyeonggon.yoo@sk.com,
    honggyu.kim@sk.com, kernel-team@meta.com
Subject: Re: [LSF/MM/BPF TOPIC] Weighted interleave auto-tuning
Date: Fri, 14 Mar 2025 08:11:16 -0700
Message-ID: <20250314151137.892379-1-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250314141541.00003fad@huawei.com>
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

On Fri, 14 Mar 2025 14:15:41 +0000 Jonathan Cameron wrote:

> On Fri, 14 Mar 2025 18:08:35 +0800
> "Huang, Ying" wrote:
>
> > Joshua Hahn writes:
> >
> > > On Thu, 9 Jan 2025 13:50:48 -0500 Joshua Hahn wrote:
> > >
> > >> Hello everyone, I hope everyone has had a great start to 2025!
> > >>
> > >> Recently, I have been working on a patch series [1] with Gregory Price
> > >> that provides new default interleave weights, along with dynamic
> > >> re-weighting on hotplug events and a series of UAPIs that allow users
> > >> to configure how they want the defaults to behave.
> > >>
> > >> In introducing these new defaults, discussions have opened up in the
> > >> community regarding how best to create a UAPI that can provide coherent
> > >> and transparent interactions for the user. In particular, consider this
> > >> scenario: when a hotplug event happens and a node comes online with new
> > >> bandwidth information (and therefore changes the bandwidth distribution
> > >> across the system), should user-set weights be overwritten to reflect
> > >> the new distribution? If so, how can we justify overwriting user-set
> > >> values in a sysfs interface? If not, how will users manually adjust the
> > >> node weights to the optimal weights?
> > >>
> > >> I would like to revisit some of the design choices made for this patch,
> > >> including how the defaults were derived, and open the conversation to
> > >> hear what the community believes is a reasonable way to allow users to
> > >> tune weighted interleave weights. More broadly, I hope to gather
> > >> community insight into how people use weighted interleave, and do my
> > >> best to reflect those workflows in the patch.
> > >
> > > Weighted interleave has since moved on to v7 [1], and a v8 is currently
> > > being drafted. Through feedback from reviewers, we have landed on a
> > > coherent UAPI that gives users two options: auto mode, which leaves all
> > > weight-calculation decisions to the system, and manual mode, which
> > > leaves weighted interleave the same as it is without the patch.
> > >
> > > Given that the patch's functionality is mostly concrete and that the
> > > questions I hoped to raise during this slot were answered via patch
> > > feedback, I hope to ask another question during the talk:
> > >
> > > Should the system dynamically change what metrics it uses to weight the
> > > nodes, based on what bottlenecks the system is currently facing?
> > >
> > > In the patch, min(read_bandwidth, write_bandwidth) is used as the
> > > heuristic to determine what a node's weight should be. However, what if
> > > the system is not bottlenecked by bandwidth, but by latency? A system
> > > could also be bottlenecked by read bandwidth, but not by write
> > > bandwidth.
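(As an aside, purely to illustrate the min(read_bandwidth, write_bandwidth)
heuristic mentioned above: the toy userspace sketch below derives integer node
weights from made-up bandwidth numbers by taking each node's minimum of read
and write bandwidth and reducing the set by its greatest common divisor. It is
only a sketch of the idea, not code from the patch series.)

    #include <stdio.h>

    /* Greatest common divisor, used to shrink raw bandwidths into small weights. */
    static unsigned int gcd(unsigned int a, unsigned int b)
    {
            while (b) {
                    unsigned int t = a % b;
                    a = b;
                    b = t;
            }
            return a;
    }

    int main(void)
    {
            /* Hypothetical {read, write} bandwidths in MB/s for two nodes. */
            unsigned int bw[2][2] = { { 120000, 100000 }, { 30000, 25000 } };
            unsigned int min_bw[2], g = 0;

            for (unsigned int i = 0; i < 2; i++) {
                    min_bw[i] = bw[i][0] < bw[i][1] ? bw[i][0] : bw[i][1];
                    g = gcd(g, min_bw[i]);
            }

            /* Prints "node0: weight 4" and "node1: weight 1" for these numbers. */
            for (unsigned int i = 0; i < 2; i++)
                    printf("node%u: weight %u\n", i, min_bw[i] / g);

            return 0;
    }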
> > > Consider a scenario where a system has many memory nodes with varying
> > > latencies and bandwidths. When the system is not bottlenecked by
> > > bandwidth, it might prefer to allocate memory from nodes with lower
> > > latency. Once the system starts feeling pressured by bandwidth, the
> > > weights for high-bandwidth (but also high-latency) nodes would slowly
> > > increase to alleviate pressure from the system. Once the system is back
> > > in a manageable state, weights for low-latency nodes would start
> > > increasing again. Users would not have to be aware of any of this --
> > > they would just see the system take control of the weight changes as
> > > the system's needs continue to change.
> >
> > IIUC, this assumes the capacity of all kinds of memory is large enough.
> > However, this may not be true in some cases. So, another possibility,
> > for a system with DRAM and CXL memory nodes, is:
> >
> > - There is free space on the DRAM node and its bandwidth isn't
> >   saturated: memory is allocated on the DRAM node.
> >
> > - There is no free space on the DRAM node, but its bandwidth isn't
> >   saturated: cold pages are migrated to CXL memory nodes, while hot
> >   pages are migrated to the DRAM node.
> >
> > - The bandwidth of the DRAM node is saturated: hot pages are migrated to
> >   CXL memory nodes.
> >
> > In general, I think that the real situation is complex, and this makes
> > it hard to implement a good policy in the kernel. So, I suspect that
> > it's better to start with experiments in user space.
> >
> > > This proposal also has some concerns that need to be addressed:
> > > - How reactive should the system be, and how aggressively should it
> > >   tune the weights? We don't want the system to overreact to short
> > >   spikes in pressure.
> > > - Does dynamic weight adjustment lead to pages being "misplaced"?
> > >   Should those "misplaced" pages be migrated? (Probably not.)
> > > - Does this need to be in the kernel? A userspace daemon that monitors
> > >   kernel metrics is able to make the changes (via the nodeN interfaces).
>
> If this were done in the kernel, what metrics would make sense to drive it?
> As with hot-page tracking, we may run into contention with PMUs (or similar)
> and their other use cases.

Hello Jonathan, thank you for your interest in this proposal!

Yes, I think you and Ying both bring up great points about how this is
probably something more suitable for a userspace program. Userspace probably
has more information about the characteristics of the workload, and I agree
with your point about contention. If the kernel thread doesn't probe
frequently, then it would be making poor allocation decisions based on stale
data, but if it does probe frequently, it would incur lots of overhead from
the contention (and make other contending threads slower as well). Not to
mention, there is also the overhead of probing itself :-)

I will keep thinking about these questions, and see if I can come up with any
interesting ideas to discuss during LSF/MM/BPF (a rough sketch of what such a
userspace adjustment loop might look like is appended at the end of this
mail). Thank you again for your interest, I hope you have a great day!
Joshua

> > > Thoughts & comments are appreciated! Thank you, and have a great day!
> > > Joshua
> > >
> > > [1] https://lore.kernel.org/all/20250305200506.2529583-1-joshua.hahnjy@gmail.com/
> > >
> > > Sent using hkml (https://github.com/sjp38/hackermail)
> > ---
> > Best Regards,
> > Huang, Ying
> >

Sent using hkml (https://github.com/sjp38/hackermail)
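As mentioned above, here is a rough, untested sketch of what such a userspace
adjustment loop could look like. It assumes the per-node weight files from the
weighted interleave series are exposed at
/sys/kernel/mm/mempolicy/weighted_interleave/nodeN, and it uses fixed
placeholder weights (and hypothetical node numbers) where a real daemon would
derive them from sampled bandwidth or latency pressure:

    #include <stdio.h>
    #include <unistd.h>

    #define WI_DIR "/sys/kernel/mm/mempolicy/weighted_interleave"

    /* Write one node's interleave weight through its sysfs file. */
    static int write_weight(int node, unsigned int weight)
    {
            char path[128];
            FILE *f;

            snprintf(path, sizeof(path), WI_DIR "/node%d", node);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%u\n", weight);
            return fclose(f);
    }

    int main(void)
    {
            for (;;) {
                    /*
                     * Placeholder policy: a real daemon would sample
                     * bandwidth saturation or latency here (e.g. from
                     * PMUs or PSI) and derive the weights from that.
                     */
                    unsigned int dram_weight = 4, cxl_weight = 1;

                    write_weight(0, dram_weight); /* hypothetical DRAM node */
                    write_weight(1, cxl_weight);  /* hypothetical CXL node */

                    sleep(10);
            }
    }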