References: <20250612181330.31236-1-bijan311@gmail.com> <20250613152517.225529-1-joshua.hahnjy@gmail.com>
In-Reply-To: <20250613152517.225529-1-joshua.hahnjy@gmail.com>
From: Bijan Tabatabai
Date: Fri, 13 Jun 2025 11:46:22 -0500
Subject: Re: [RFC PATCH 0/4] mm/damon: Add DAMOS action to interleave data across nodes
To: Joshua Hahn
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 sj@kernel.org, akpm@linux-foundation.org, corbet@lwn.net, david@redhat.com,
 ziy@nvidia.com, matthew.brost@intel.com, rakie.kim@sk.com, byungchul@sk.com,
 gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
 bijantabatab@micron.com, venkataravis@micron.com, emirakhur@micron.com,
 ajayjoshi@micron.com, vtavarespetr@micron.com, damon@lists.linux.dev

Hi Joshua,

On Fri, Jun 13, 2025 at 10:25 AM Joshua Hahn wrote:
>
> On Thu, 12 Jun 2025 13:13:26 -0500 Bijan Tabatabai wrote:
>
> > From: Bijan Tabatabai
> >
> > A recent patch set automatically set the interleave weight for each node
> > according to the node's maximum bandwidth [1]. In another thread, the patch
> > set's author, Joshua Hahn, wondered if/how these weights should be changed
> > if the bandwidth utilization of the system changes [2].
>
> Hi Bijan,
>
> Thank you for this patchset, and thank you for finding interest in my
> question!
>
> > This patch set adds the mechanism for dynamically changing how application
> > data is interleaved across nodes while leaving the policy of what the
> > interleave weights should be to userspace. It does this by adding a new
> > DAMOS action: DAMOS_INTERLEAVE. We implement DAMOS_INTERLEAVE with both
> > paddr and vaddr operations sets. Using the paddr version is useful for
> > managing page placement globally. Using the vaddr version limits tracking
> > to one process per kdamond instance, but the va based tracking better
> > captures spatial locality.
> >
> > DAMOS_INTERLEAVE interleaves pages within a region across nodes using the
> > interleave weights at /sys/kernel/mm/mempolicy/weighted_interleave/node
> > and the page placement algorithm in weighted_interleave_nid via
> > policy_nodemask. We chose to reuse the mempolicy weighted interleave
> > infrastructure to avoid reimplementing code. However, this has the awkward
> > side effect that only pages that are mapped to processes using
> > MPOL_WEIGHTED_INTERLEAVE will be migrated according to new interleave
> > weights. This might be fine because workloads that want their data to be
> > dynamically interleaved will want their newly allocated data to be
> > interleaved at the same ratio.
>
> I think this is generally true. Maybe until a user says that they have a
> usecase where they would like to have a non-weighted-interleave policy
> to allocate pages, but would like to place them according to a set weight,
> we can leave support for other mempolicies out for now.
>
> > If exposing policy_nodemask is undesirable, we have two alternative methods
> > for having DAMON access the interleave weights it should use. We would
> > appreciate feedback on which method is preferred.
> > 1. Use mpol_misplaced instead
> >    pros: mpol_misplaced is already exposed publicly
> >    cons: Would require refactoring mpol_misplaced to take a struct vm_area
> >    instead of a struct vm_fault, and require refactoring mpol_misplaced and
> >    get_vma_policy to take in a struct task_struct rather than just using
> >    current. Also requires processes to use MPOL_WEIGHTED_INTERLEAVE.
> > 2. Add a new field to struct damos, similar to target_nid for the
> >    MIGRATE_HOT/COLD schemes.
> >    pros: Keeps changes contained inside DAMON. Would not require processes
> >    to use MPOL_WEIGHTED_INTERLEAVE.
> >    cons: Duplicates page placement code. Requires discussion on the sysfs
> >    interface to use for users to pass in the interleave weights.
>
> Here I agree with SJ's sentiment -- I think mpol_misplaced runs with the
> context of working with current / fault contexts, like you pointed out.
> Perhaps it is best to keep the scope of the changes as local as possible : -)
> As for duplicating page placement code, I think that is something we can
> refine over iterations of this patchset, and maybe SJ will have some great
> ideas on how this can best be done as well.

David Hildenbrand responded to this and proposed adding a new function that
just returns the nid a folio should go on based on its mempolicy. I think
that's probably the best way to go for now. I think the common case would want
the weights used by this and mempolicy to be the same. However, if there is a
use case where different weights are desired, I don't mind coming back and
adding that functionality.

> > This patchset was tested on an AMD machine with a NUMA node with CPUs
> > attached to DDR memory and a cpu-less NUMA node attached to CXL memory.
> > However, this patch set should generalize to other architectures and number
> > of NUMA nodes.
>
> I think moving the test results to the cover letter will help reviewers
> better understand the intent of the work.
> I think it will also be
> very helpful to include some potential use-cases in here as well. That is,
> what workloads would benefit from placing pages according to a set ratio,
> rather than using existing migration policies that adjust this based on
> hotness / coldness?

Noted. I will be sure to include that in the next revision.

> One such use case that I can think of is using this patchset + weighted
> interleave auto-tuning, which would help alleviate bandwidth limitations
> by ensuring that past the allocation stage, pages are being accessed
> in a way that maximizes the bandwidth usage of the system (at the cost of
> latency, which may or may not even be true based on how bandwidth-bound
> the workload is).

This was the exact use case I envisioned for this patch. I talk about it in
more detail in my reply to SeongJae.

> Thank you again for the amazing patchset! Have a great day : -)
> Joshua

I appreciate you taking the time to respond,
Bijan