From: Bijan Tabatabai <bijan311@gmail.com>
Date: Tue, 24 Jun 2025 11:01:46 -0500
Subject: Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
To: SeongJae Park
Cc: damon@lists.linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
 matthew.brost@intel.com, joshua.hahnjy@gmail.com, rakie.kim@sk.com,
 byungchul@sk.com, gourry@gourry.net, ying.huang@linux.alibaba.com,
 apopple@nvidia.com, bijantabatab@micron.com, venkataravis@micron.com,
 emirakhur@micron.com, ajayjoshi@micron.com, vtavarespetr@micron.com
In-Reply-To: <20250624003408.47807-1-sj@kernel.org>
References: <20250624003408.47807-1-sj@kernel.org>

On Mon, Jun 23, 2025 at 7:34 PM SeongJae Park wrote:
>
> On Mon, 23 Jun 2025 18:15:00 -0500 Bijan Tabatabai wrote:
>
> > [...]
> > Hi SeongJae,
> >
> > I really appreciate your detailed response.
> > The quota auto-tuning helps, but I feel like it's still not exactly
> > what I want. For example, I think a quota goal that stops migration
> > based on the memory usage balance gets quite a bit more complicated
> > when instead of interleaving all data, we are just interleaving *hot*
> > data. I haven't looked at it extensively, but I imagine it wouldn't be
> > easy to identify how much data is hot in the paddr setting,
>
> I don't think so, and I don't see why you think so. Could you please
> elaborate?

Elaborated below.

> > especially
> > because the regions can contain a significant amount of unallocated
> > data.
>
> In the case, unallocated data shouldn't be accessed at all, so the region will
> just look cold to DAMON.

"Significant" was too strong of a word, but if physical memory is
fragmented, couldn't there be a non-negligible amount of unallocated
memory in a hot region? If so, you cannot simply sum the sizes of the
hot regions in each node to compute how the hot data is interleaved,
because those regions may contain unallocated memory that shouldn't
count toward that calculation. Does that make sense?

It's very possible I'm overthinking this and it won't be an issue in
practice; if so, it might be best not to worry about it until it
becomes one.

> > Also, if the interleave weights changed, for example, from 11:9
> > to 10:10, it would be preferable if only 5% of data is migrated;
> > however, with the round robin approach, 50% would be.

Elaborating more on this:

Imagine a process begins with weights of 3 and 2 for nodes 0 and 1,
respectively, in both DAMON and the weighted interleave policy.
If you looked at which node a page resides in for a group of
contiguous pages, it would be something like this (using letters to
represent the virtual addresses):

A -> node 0
B -> node 0
C -> node 0
D -> node 1
E -> node 1
F -> node 0
G -> node 0
H -> node 0
I -> node 1
J -> node 1

If we use a user-defined quota autotuning mechanism like you described
in [1] to stop DAMON interleaving when we detect that the data is
interleaved correctly, no interleaving would happen, which is good.
However, let's say we change the DAMON weights to be 4:1. My
understanding is that DAMON applies the scheme to regions in ascending
order of physical address (for paddr schemes), so if using the
round-robin algorithm you provided in [2], the interleaving would
apply to the pages in node 0 first, then node 1. For the sake of
simplicity, let's say in this scenario the pages in the same node are
sorted by their virtual address, so the interleaving would be applied
in the order ABCFGHDEIJ. This would result in the following page
placement:

A -> node 0
B -> node 0
C -> node 0
D -> node 0
E -> node 0
F -> node 0
G -> node 1
H -> node 0
I -> node 0
J -> node 1

So, four pages, D, E, G, and I, have been migrated. However, if they
were interleaved using their virtual addresses*, only pages D and I
would have been migrated.

* Technically, the mempolicy code interleaves based on the offset from
the start of the VMA, but that difference doesn't change this example.

> > Finally, and I
> > forgot to mention this in my last message, the round-robin approach
> > does away with any notion of spatial locality, which does help the
> > effectiveness of interleaving [1].

Elaborating more on this. As implied by the comment in [3],
interleaving works better the finer grained it is done in virtual
memory.
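For concreteness, the page counts above can be replayed with a small
Python sketch. This is purely illustrative, not DAMON code; the two
helper functions and all names are hypothetical stand-ins for the two
placement policies being compared:

```python
# Illustrative sketch only (not DAMON code): replay the A-J example to
# compare round-robin interleaving against virtual-address-based
# interleaving after the weights change from 3:2 to 4:1.

def round_robin_targets(pages_in_apply_order, weights):
    """Assign target nodes by cycling through the weights, in the order
    the paddr scheme visits the pages (all of node 0's pages first)."""
    targets = {}
    node, remaining = 0, weights[0]
    for page in pages_in_apply_order:
        targets[page] = node
        remaining -= 1
        if remaining == 0:
            node = (node + 1) % len(weights)
            remaining = weights[node]
    return targets

def vaddr_targets(pages_in_vaddr_order, weights):
    """Assign target nodes from each page's virtual-address offset, in
    the style of weighted interleave: 4:1 puts offsets 0-3 on node 0
    and offset 4 on node 1, repeating."""
    total = sum(weights)
    targets = {}
    for i, page in enumerate(pages_in_vaddr_order):
        pos, node = i % total, 0
        while pos >= weights[node]:
            pos -= weights[node]
            node += 1
        targets[page] = node
    return targets

# Initial 3:2 placement from the example, pages A-J in vaddr order.
initial = {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1,
           'F': 0, 'G': 0, 'H': 0, 'I': 1, 'J': 1}
new_weights = [4, 1]

# The paddr scheme walks node 0's pages first, then node 1's: ABCFGH, DEIJ.
apply_order = ['A', 'B', 'C', 'F', 'G', 'H', 'D', 'E', 'I', 'J']
rr = round_robin_targets(apply_order, new_weights)
va = vaddr_targets(sorted(initial), new_weights)

rr_migrated = sorted(p for p in initial if rr[p] != initial[p])
va_migrated = sorted(p for p in initial if va[p] != initial[p])
print(rr_migrated)  # ['D', 'E', 'G', 'I']: four migrations
print(va_migrated)  # ['D', 'I']: only two

# Probabilistic 4:1 interleaving fares no better on average: a page
# already on node 0 is reassigned to node 1 with probability 1/5, and a
# node 1 page to node 0 with probability 4/5.
exp_migrations = sum(1 - new_weights[initial[p]] / sum(new_weights)
                     for p in initial)
print(round(exp_migrations, 2))  # 4.4 expected migrations
```

The same harness can replay the 11:9 versus 10:10 case quoted above as
well, just with different `weights` and a longer page list.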
As an extreme example, if you had weights of 3:2, putting the first
60% of a process's data in node 0 and the remaining 40% in node 1
would satisfy the ratio globally, but you would likely not see the
benefits of interleaving. We can see in the example above that your
round-robin approach does not maintain the desired interleave ratio
locally, even though it does globally.

> We could use the probabilistic interleaving, if this is the problem?

I don't think so. In the above example, with probabilistic
interleaving, you would still migrate, on average, 20% of the pages in
node 0 and 80% of the pages in node 1. Similarly, probabilistic
interleaving does not consider the virtual address, so it wouldn't
maintain the interleave ratio locally in the virtual address space
either.

> > I don't think anything done with
> > quotas can get around that.
>
> I think I'm not getting your points well, sorry. More elaboration of your
> concern would be helpful.

I elaborated more above; hopefully that clears up any confusion. If
you still have questions, maybe it would be easier to e-meet and have
a live discussion about it. I see you have a DAMON chat slot open
tomorrow at 9:30 PT [4]. If you have nothing else scheduled, maybe
that would be a good time to chat?

[...]

> > I see where you're coming from. I think the crux of this difference is
> > that in my use case, the set of nodes we are monitoring is the same as
> > the set of nodes we are migrating to, while in the use case you
> > describe, the set of nodes being monitored is disjoint from the set of
> > migration target nodes.
>
> I understand and agree this difference.
>
> > I think this in particular makes ping ponging
> > more of a problem for my use case, compared to promotion/demotion
> > schemes.
>
> But again I'm failing at understanding this, sorry. Could I ask more
> elaborations?

Sure, and sorry for needing to elaborate so much.
What I was trying to say is that when a scheme is monitoring the same
nodes it is migrating to, and it detects a hot region, it will
interleave the pages in that region between the nodes. If there are
two nodes, and assuming the access pattern was uniform across the
region, we have now turned one hot region into two. Using the
algorithms you provided earlier, the next time the scheme is applied,
it will interleave both of those regions again, because the only
information it has about where to place pages is how many pages it has
previously interleaved. Interleaving by virtual address solves this
problem by providing one and only one location a page should be in,
given a set of interleave weights.

When a scheme is monitoring one set of nodes and migrating to another,
disjoint set of nodes, you don't have this problem, because once the
pages are migrated, they won't be considered by the scheme until some
other scheme moves them back into the monitored nodes. Does that make
sense?

> > > If you really need this virtual address space based
> > > deterministic behavior, it would make more sense to use virtual address spaces
> > > monitoring (damon-vaddr).
> >
> > Maybe it does make sense for me to implement vaddr versions of the
> > migrate actions for my use case.
>
> Yes, that could also be an option.

Given how much my explanations here stressed that having access to the
virtual addresses solves the problems I mentioned, I think the path
forward for the next revision should be:

1) Have the paddr migration scheme use the round-robin interleaving
   that you provided
   - This would be good for the use case you described, where you
     promote pages from a node into multiple nodes of the same tier.
2) Implement a vaddr migration scheme that uses the virtual address
   based interleaving
   - This is useful for my target use case of balancing bandwidth
     utilization between nodes.
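To sketch the "one and only one location" property that option 2 relies
on (illustrative only; the helper is a hypothetical model of the
offset-based weighted interleave in mm/mempolicy.c referenced in [3]):
because a page's target node is a pure function of its offset and the
weights, re-applying the scheme moves nothing, which is exactly what
rules out the ping-ponging described above.

```python
# Illustrative sketch: virtual-address-based interleaving gives each
# page exactly one valid node for a given set of weights, so applying
# the scheme a second time is a no-op (no ping-ponging).

def target_node(offset_pages, weights):
    """Map a page's offset (in pages, from the start of the VMA) to a
    node, in the style of mempolicy's weighted interleave."""
    pos = offset_pages % sum(weights)
    for node, weight in enumerate(weights):
        if pos < weight:
            return node
        pos -= weight

weights = [3, 2]
first_pass = [target_node(off, weights) for off in range(10)]
second_pass = [target_node(off, weights) for off in range(10)]
print(first_pass)                 # [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(first_pass == second_pass)  # True: a second pass moves nothing
```

Note that the first pass reproduces the A-J layout from the 3:2 example
earlier in this thread.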
If the vaddr scheme proves insufficient down the line for my use case,
we can have another discussion at that time. How does this sound to
you?

> > One thing that gives me pause about
> > this is that, from what I understand, it would be harder to have
> > vaddr schemes apply to processes that start after damon begins. I
> > think to do that, one would have to detect when a process starts, and
> > then do a damon tune to update the targets list? It would be nice if,
> > say, you could specify a cgroup as a vaddr target and track all
> > processes in that cgroup, but that would be a different patchset for
> > another day.
>
> I agree that could be a future thing to do. Note that DAMON user-space tool
> implements[1] a similar feature.

Thanks, I'll take a look at that.

[...]

Thanks again for the time you are spending on these discussions. I do
appreciate it, and I hope I'm not taking up too much of your time.

Bijan

[1] https://lore.kernel.org/damon/20250623175204.43917-1-sj@kernel.org/
[2] https://lore.kernel.org/damon/20250621180215.36243-1-sj@kernel.org/
[3] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
[4] https://lore.kernel.org/damon/20250620205819.98472-1-sj@kernel.org/T/#t