From: Bijan Tabatabai <bijan311@gmail.com>
Date: Tue, 24 Jun 2025 11:01:46 -0500
Subject: Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
To: SeongJae Park
Cc: damon@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
 matthew.brost@intel.com, joshua.hahnjy@gmail.com, rakie.kim@sk.com,
 byungchul@sk.com, gourry@gourry.net, ying.huang@linux.alibaba.com,
 apopple@nvidia.com, bijantabatab@micron.com, venkataravis@micron.com,
 emirakhur@micron.com, ajayjoshi@micron.com, vtavarespetr@micron.com
In-Reply-To: <20250624003408.47807-1-sj@kernel.org>
References: <20250624003408.47807-1-sj@kernel.org>

On Mon, Jun 23, 2025 at 7:34 PM SeongJae Park wrote:
>
> On Mon, 23 Jun 2025 18:15:00 -0500 Bijan Tabatabai wrote:
>
> > [...]
> > Hi SeongJae,
> >
> > I really appreciate your detailed response.
> > The quota auto-tuning helps, but I feel like it's still not exactly
> > what I want. For example, I think a quota goal that stops migration
> > based on the memory usage balance gets quite a bit more complicated
> > when instead of interleaving all data, we are just interleaving *hot*
> > data. I haven't looked at it extensively, but I imagine it wouldn't be
> > easy to identify how much data is hot in the paddr setting,
>
> I don't think so, and I don't see why you think so. Could you please
> elaborate?

Elaborated below.

> > especially
> > because the regions can contain a significant amount of unallocated
> > data.
>
> In the case, unallocated data shouldn't be accessed at all, so the region will
> just look cold to DAMON.

"Significant" was too strong of a word, but if physical memory is
fragmented, couldn't there be a non-negligible amount of unallocated
memory in a hot region? If so, you cannot simply sum the sizes of the
hot regions in each node to compute how the hot data is interleaved,
because those regions may contain unallocated memory that shouldn't
count toward that calculation. Does that make sense?

It's very possible I'm overthinking this and it won't be an issue in
practice; if so, it might be best not to worry about it until it
becomes one.

> > Also, if the interleave weights changed, for example, from 11:9
> > to 10:10, it would be preferable if only 5% of data is migrated;
> > however, with the round robin approach, 50% would be.

Elaborating more on this:

Imagine a process begins with weights of 3 and 2 for nodes 0 and 1,
respectively, in both DAMON and the weighted interleave policy.
If you looked at which node a page resides in for a group of
contiguous pages, it would be something like this (using letters to
represent the virtual addresses):

A -> node 0
B -> node 0
C -> node 0
D -> node 1
E -> node 1
F -> node 0
G -> node 0
H -> node 0
I -> node 1
J -> node 1

If we use a user-defined quota autotuning mechanism like you described
in [1] to stop DAMON interleaving when we detect that the data is
interleaved correctly, no interleaving would happen, which is good.
However, let's say we change the DAMON weights to be 4:1. My
understanding is that DAMON applies the scheme to regions in ascending
order of physical address (for paddr schemes), so if using the
round-robin algorithm you provided in [2], the interleaving would
apply to the pages in node 0 first, then node 1. For the sake of
simplicity, let's say in this scenario the pages in the same node are
sorted by their virtual address, so the interleaving would be applied
in the order ABCFGHDEIJ. This would result in the following page
placement:

A -> node 0
B -> node 0
C -> node 0
D -> node 0
E -> node 0
F -> node 0
G -> node 1
H -> node 0
I -> node 0
J -> node 1

So, four pages, D, E, G, and I, have been migrated. However, if they
were interleaved using their virtual addresses*, only pages D and I
would have been migrated.

* Technically, the mempolicy code interleaves based on the offset from
the start of the VMA, but that difference doesn't change this example.

> > Finally, and I
> > forgot to mention this in my last message, the round-robin approach
> > does away with any notion of spatial locality, which does help the
> > effectiveness of interleaving [1].

Elaborating more on this. As implied by the comment in [3],
interleaving works better the finer grained it is done in virtual
memory.
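For concreteness, the page counts above can be replayed with a small
Python sketch. This is purely illustrative, not DAMON code; the two
helper functions and all names are hypothetical stand-ins for the two
placement policies being compared:

```python
# Illustrative sketch only (not DAMON code): replay the A-J example to
# compare round-robin interleaving against virtual-address-based
# interleaving after the weights change from 3:2 to 4:1.

def round_robin_targets(pages_in_apply_order, weights):
    """Assign target nodes by cycling through the weights, in the order
    the paddr scheme visits the pages (all of node 0's pages first)."""
    targets = {}
    node, remaining = 0, weights[0]
    for page in pages_in_apply_order:
        targets[page] = node
        remaining -= 1
        if remaining == 0:
            node = (node + 1) % len(weights)
            remaining = weights[node]
    return targets

def vaddr_targets(pages_in_vaddr_order, weights):
    """Assign target nodes from each page's virtual-address offset, in
    the style of weighted interleave: 4:1 puts offsets 0-3 on node 0
    and offset 4 on node 1, repeating."""
    total = sum(weights)
    targets = {}
    for i, page in enumerate(pages_in_vaddr_order):
        pos, node = i % total, 0
        while pos >= weights[node]:
            pos -= weights[node]
            node += 1
        targets[page] = node
    return targets

# Initial 3:2 placement from the example, pages A-J in vaddr order.
initial = {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1,
           'F': 0, 'G': 0, 'H': 0, 'I': 1, 'J': 1}
new_weights = [4, 1]

# The paddr scheme walks node 0's pages first, then node 1's: ABCFGH, DEIJ.
apply_order = ['A', 'B', 'C', 'F', 'G', 'H', 'D', 'E', 'I', 'J']
rr = round_robin_targets(apply_order, new_weights)
va = vaddr_targets(sorted(initial), new_weights)

rr_migrated = sorted(p for p in initial if rr[p] != initial[p])
va_migrated = sorted(p for p in initial if va[p] != initial[p])
print(rr_migrated)  # ['D', 'E', 'G', 'I']: four migrations
print(va_migrated)  # ['D', 'I']: only two

# Probabilistic 4:1 interleaving fares no better on average: a page
# already on node 0 is reassigned to node 1 with probability 1/5, and a
# node 1 page to node 0 with probability 4/5.
exp_migrations = sum(1 - new_weights[initial[p]] / sum(new_weights)
                     for p in initial)
print(round(exp_migrations, 2))  # 4.4 expected migrations
```

The same harness can replay the 11:9 versus 10:10 case quoted above as
well, just with different `weights` and a longer page list.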
As an extreme example, if you had weights of 3:2, putting the first
60% of a process's data in node 0 and the remaining 40% in node 1
would satisfy the ratio globally, but you would likely not see the
benefits of interleaving. We can see in the example above that your
round-robin approach does not maintain the desired interleave ratio
locally, even though it does globally.

> We could use the probabilistic interleaving, if this is the problem?

I don't think so. In the above example, with probabilistic
interleaving, you would still migrate, on average, 20% of the pages in
node 0 and 80% of the pages in node 1. Similarly, probabilistic
interleaving does not consider the virtual address, so it wouldn't
maintain the interleave ratio locally in the virtual address space
either.

> > I don't think anything done with
> > quotas can get around that.
>
> I think I'm not getting your points well, sorry. More elaboration of your
> concern would be helpful.

I elaborated more above; hopefully that clears up any confusion. If
you still have questions, maybe it would be easier to e-meet and have
a live discussion about it. I see you have a DAMON chat slot open
tomorrow at 9:30 PT [4]. If you have nothing else scheduled, maybe
that would be a good time to chat?

[...]

> > I see where you're coming from. I think the crux of this difference is
> > that in my use case, the set of nodes we are monitoring is the same as
> > the set of nodes we are migrating to, while in the use case you
> > describe, the set of nodes being monitored is disjoint from the set of
> > migration target nodes.
>
> I understand and agree this difference.
>
> > I think this in particular makes ping ponging
> > more of a problem for my use case, compared to promotion/demotion
> > schemes.
>
> But again I'm failing at understanding this, sorry. Could I ask more
> elaborations?

Sure, and sorry for needing to elaborate so much.
What I was trying to say is that when a scheme is monitoring the same
nodes it is migrating to, and it detects a hot region, it will
interleave the pages in that region between the nodes. If there are
two nodes, and assuming the access pattern was uniform across the
region, we have now turned one hot region into two. Using the
algorithms you provided earlier, the next time the scheme is applied,
it will interleave both of those regions again, because the only
information it has about where to place pages is how many pages it has
previously interleaved. Interleaving by virtual address solves this
problem by providing one and only one location a page should be in,
given a set of interleave weights.

When a scheme is monitoring one set of nodes and migrating to another,
disjoint set of nodes, you don't have this problem, because once the
pages are migrated, they won't be considered by the scheme until some
other scheme moves them back into the monitored nodes. Does that make
sense?

> > > If you really need this virtual address space based
> > > deterministic behavior, it would make more sense to use virtual address spaces
> > > monitoring (damon-vaddr).
> >
> > Maybe it does make sense for me to implement vaddr versions of the
> > migrate actions for my use case.
>
> Yes, that could also be an option.

Given how much my explanations here stressed that having access to the
virtual addresses solves the problems I mentioned, I think the path
forward for the next revision should be:

1) Have the paddr migration scheme use the round-robin interleaving
   that you provided
   - This would be good for the use case you described, where you
     promote pages from a node into multiple nodes of the same tier.
2) Implement a vaddr migration scheme that uses the virtual address
   based interleaving
   - This is useful for my target use case of balancing bandwidth
     utilization between nodes.
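To sketch the "one and only one location" property that option 2 relies
on (illustrative only; the helper is a hypothetical model of the
offset-based weighted interleave in mm/mempolicy.c referenced in [3]):
because a page's target node is a pure function of its offset and the
weights, re-applying the scheme moves nothing, which is exactly what
rules out the ping-ponging described above.

```python
# Illustrative sketch: virtual-address-based interleaving gives each
# page exactly one valid node for a given set of weights, so applying
# the scheme a second time is a no-op (no ping-ponging).

def target_node(offset_pages, weights):
    """Map a page's offset (in pages, from the start of the VMA) to a
    node, in the style of mempolicy's weighted interleave."""
    pos = offset_pages % sum(weights)
    for node, weight in enumerate(weights):
        if pos < weight:
            return node
        pos -= weight

weights = [3, 2]
first_pass = [target_node(off, weights) for off in range(10)]
second_pass = [target_node(off, weights) for off in range(10)]
print(first_pass)                 # [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(first_pass == second_pass)  # True: a second pass moves nothing
```

Note that the first pass reproduces the A-J layout from the 3:2 example
earlier in this thread.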
If the vaddr scheme proves insufficient down the line for my use case,
we can have another discussion at that time. How does this sound to
you?

> > One thing that gives me pause about
> > this is that, from what I understand, it would be harder to have
> > vaddr schemes apply to processes that start after damon begins. I
> > think to do that, one would have to detect when a process starts, and
> > then do a damon tune to update the targets list? It would be nice if,
> > say, you could specify a cgroup as a vaddr target and track all
> > processes in that cgroup, but that would be a different patchset for
> > another day.
>
> I agree that could be a future thing to do. Note that DAMON user-space tool
> implements[1] a similar feature.

Thanks, I'll take a look at that.

[...]

Thanks again for the time you are spending on these discussions. I do
appreciate it, and I hope I'm not taking up too much of your time.

Bijan

[1] https://lore.kernel.org/damon/20250623175204.43917-1-sj@kernel.org/
[2] https://lore.kernel.org/damon/20250621180215.36243-1-sj@kernel.org/
[3] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
[4] https://lore.kernel.org/damon/20250620205819.98472-1-sj@kernel.org/T/#t