From: Bijan Tabatabai <bijan311@gmail.com>
To: damon@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: sj@kernel.org, akpm@linux-foundation.org, david@redhat.com,
 ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
 rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, bijantabatab@micron.com,
 venkataravis@micron.com, emirakhur@micron.com, ajayjoshi@micron.com,
 vtavarespetr@micron.com
Subject: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
Date: Fri, 20 Jun 2025 13:04:56 -0500
Message-ID: <20250620180458.5041-1-bijan311@gmail.com>
From: Bijan Tabatabai <bijan311@gmail.com>

A recent patch set automatically sets the interleave weight for each node
according to the node's maximum bandwidth [1]. In another thread, the
patch set's author, Joshua Hahn, wondered if/how these weights should be
changed if the bandwidth utilization of the system changes [2].

This patch set adds the mechanism for dynamically changing how application
data is interleaved across nodes, while leaving the policy of what the
interleave weights should be to userspace. It does this by modifying the
migrate_{hot,cold} DAMOS actions to allow passing a list of migration
targets to their target_nid parameter. When this is done, the
migrate_{hot,cold} actions will migrate pages between the specified nodes
using the global interleave weights found at
/sys/kernel/mm/mempolicy/weighted_interleave/node. This functionality can
be used to dynamically adjust how pages are interleaved by changing the
global weights. When only a single migration target is passed to
target_nid, the migrate_{hot,cold} actions act the same as before.

There have been prior discussions about how changing the interleave
weights in response to the system's bandwidth utilization can be
beneficial [2]. However, the interleave weights are currently only applied
when data is allocated. Migrating already allocated pages according to the
dynamically changing weights will better help balance the bandwidth
utilization across nodes.

As a toy example, imagine some application that uses 75% of the local
bandwidth. Assuming sufficient capacity, when running alone, we want to
keep that application's data in local memory. However, if a second
instance of that application begins, using the same amount of bandwidth,
it would be best to interleave the data of both processes to alleviate the
bandwidth pressure from the local node.
Likewise, when one of the processes ends, the data should be moved back to
local memory.

We imagine there would be a userspace application that monitors system
performance characteristics, such as bandwidth utilization or memory
access latency, and uses that information to tune the interleave weights.
Others seem to have come to a similar conclusion in previous discussions
[3]. We are currently working on a userspace program that does this, but
it is not quite ready to be published yet.

We believe DAMON is the correct venue for the interleaving mechanism for a
few reasons. First, we noticed that we don't have to migrate all of the
application's pages to improve performance; we just need to migrate the
frequently accessed pages. DAMON's existing hotness tracking is very
useful for this. Second, DAMON's quota system can be used to ensure we are
not using too much bandwidth for migrations. Finally, as Ying pointed out
[4], a complete solution must also handle when a memory node is at
capacity. The existing migrate_cold action can be used in conjunction with
the functionality added in this patch set to provide that complete
solution.

Functionality Test
==================
Below is an example of this new functionality, used to confirm that these
patches behave as intended. In this example, the user initially sets the
interleave weights to interleave pages at a 1:1 ratio and starts an
application, alloc_data, using those weights, which allocates 1GB of data
and then sleeps. Afterwards, the weights are changed to interleave pages
at a 2:1 ratio. Using numastat, we show that DAMON has migrated the
application's data to match the new interleave weights.
$ # Show that the migrate_hot action is used with multiple targets
$ cd /sys/kernel/mm/damon/admin/kdamonds/0
$ sudo cat ./contexts/0/schemes/0/action
migrate_hot
$ sudo cat ./contexts/0/schemes/0/target_nid
0-1
$ # Initially interleave at a 1:1 ratio
$ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
$ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
$ # Start alloc_data with the initial interleave ratio
$ numactl -w 0,1 ~/alloc_data 1G &
$ # Verify the initial allocation
$ numastat -c -p alloc_data
Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
         Node 0 Node 1 Total
         ------ ------ -----
Huge          0      0     0
Heap          0      0     0
Stack         0      0     0
Private     514    514  1027
-------  ------ ------ -----
Total       514    514  1027
$ # Start interleaving at a 2:1 ratio
$ echo 2 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
$ # Verify that DAMON has migrated data to match the new ratio
$ numastat -c -p alloc_data
Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
         Node 0 Node 1 Total
         ------ ------ -----
Huge          0      0     0
Heap          0      0     0
Stack         0      0     0
Private     684    343  1027
-------  ------ ------ -----
Total       684    343  1027

Performance Test
================
Below is a simple example showing that interleaving application data using
these patches can improve application performance. To do this, we run a
bandwidth-intensive embedding reduction application [5]. This workload is
useful for this test because it reports the time each iteration takes to
run, and it reuses its buffers between allocations, allowing us to clearly
see the benefits of the migration.

We evaluate this on a 128 core/256 thread AMD CPU with 72 GB/s of local
DDR bandwidth and 26 GB/s of CXL memory bandwidth. Before we start the
workload, the system bandwidth utilization is low, so we start with the
interleave weights biased as much as possible towards the local node. When
the workload begins, it saturates the local bandwidth, making the page
placement suboptimal.
To alleviate this, we modify the interleave weights, triggering DAMON to
migrate the workload's data.

$ cd /sys/kernel/mm/damon/admin/kdamonds/0/
$ sudo cat ./contexts/0/schemes/0/action
migrate_hot
$ sudo cat ./contexts/0/schemes/0/target_nid
0-1
$ echo 255 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
$ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
$ /eval_baseline -d amazon_All -c 255 -r 100
Eval Phase 3: Running Baseline...
REPEAT # 0 Baseline Total time : 9043.24 ms
REPEAT # 1 Baseline Total time : 7307.71 ms
REPEAT # 2 Baseline Total time : 7301.4 ms
REPEAT # 3 Baseline Total time : 7312.44 ms
REPEAT # 4 Baseline Total time : 7282.43 ms
# Interleave weights changed to 3:1
REPEAT # 5 Baseline Total time : 6754.78 ms
REPEAT # 6 Baseline Total time : 5322.38 ms
REPEAT # 7 Baseline Total time : 5359.89 ms
REPEAT # 8 Baseline Total time : 5346.14 ms
REPEAT # 9 Baseline Total time : 5321.98 ms

Updating the interleave weights and having DAMON migrate the workload data
according to the new weights resulted in an approximately 25% speedup.

Questions for Reviewers
=======================
1. Are you happy with the changes to the DAMON sysfs interface?
2. Setting an interleave weight to 0 is currently not allowed. This makes
   sense when the weights are only used for allocation. Does it make sense
   to allow 0 weights now?

Patches Sequence
================
The first patch exposes get_il_weight() in ./mm/internal.h to let DAMON
access the interleave weights. The second patch implements the
interleaving mechanism in the migrate_{hot,cold} actions.
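For intuition, the 2:1 migration result in the functionality test follows
directly from weighted round-robin arithmetic. The sketch below is an
illustrative Python model, not the kernel's implementation; the function
name and structure are our own, and it assumes strictly positive integer
weights (note reviewer question 2 above about 0 weights).

```python
# Toy model of weighted interleaving (NOT the kernel code): hand out pages
# round-robin across nodes, giving each node `weight` consecutive pages
# per cycle, mirroring how the global weights bias page placement.
def weighted_interleave(num_pages, weights):
    """Return {node: page_count} for num_pages split by positive weights."""
    order = sorted(weights)          # visit nodes in node-id order
    counts = {node: 0 for node in weights}
    i = 0                            # index of current node in `order`
    left = weights[order[0]]         # pages remaining in current node's turn
    for _ in range(num_pages):
        while left == 0:             # current node's turn is exhausted
            i = (i + 1) % len(order)
            left = weights[order[i]]
        counts[order[i]] += 1
        left -= 1
    return counts

# With the 2:1 weights from the test, 1027 MB worth of pages split
# 685/342 in this model, in line with the ~684/343 numastat output.
print(weighted_interleave(1027, {0: 2, 1: 1}))
```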
Revision History
================
Changes from v1
(https://lore.kernel.org/linux-mm/20250612181330.31236-1-bijan311@gmail.com/)
- Reuse migrate_{hot,cold} actions instead of creating a new action
- Remove vaddr implementation
- Remove most of the use of mempolicy; instead duplicate the interleave
  logic and access interleave weights directly
- Write more about the use case in the cover letter
- Write about why DAMON was used for this in the cover letter
- Add correctness test to the cover letter
- Add performance test

[1] https://lore.kernel.org/linux-mm/20250520141236.2987309-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20250313155705.1943522-1-joshua.hahnjy@gmail.com/
[3] https://lore.kernel.org/linux-mm/20250314151137.892379-1-joshua.hahnjy@gmail.com/
[4] https://lore.kernel.org/linux-mm/87frjfx6u4.fsf@DESKTOP-5N7EMDA/
[5] https://github.com/SNU-ARC/MERCI

Bijan Tabatabai (2):
  mm/mempolicy: Expose get_il_weight() to MM
  mm/damon/paddr: Allow multiple migrate targets

 include/linux/damon.h    |   8 +--
 mm/damon/core.c          |   9 ++--
 mm/damon/lru_sort.c      |   2 +-
 mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
 mm/damon/reclaim.c       |   2 +-
 mm/damon/sysfs-schemes.c |  14 +++--
 mm/internal.h            |   6 +++
 mm/mempolicy.c           |   2 +-
 samples/damon/mtier.c    |   6 ++-
 samples/damon/prcl.c     |   2 +-
 10 files changed, 138 insertions(+), 21 deletions(-)

-- 
2.43.5