From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com
Subject: Re: (sashiko review) [PATCH v6 1/1] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
Date: Tue, 7 Apr 2026 19:33:43 -0700 [thread overview]
Message-ID: <CALa+Y17YnrOe=UXWBMKJ1U6seKJuauDqAdTDYo1cCYnrP_vSFg@mail.gmail.com> (raw)
In-Reply-To: <20260407160546.52220-1-sj@kernel.org>
On Tue, Apr 7, 2026 at 9:05 AM SeongJae Park <sj@kernel.org> wrote:
>
> Adding another thought at the end of the mail without cutting the previous
> unrelated questions, so that Ravi can answer all my questions at once.
>
> On Mon, 6 Apr 2026 17:13:08 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> > On Mon, 6 Apr 2026 12:47:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> >
> > > On Sun, Apr 5, 2026 at 3:45 PM SeongJae Park <sj@kernel.org> wrote:
> > > >
> > > >
> > > > Ravi, thank you for reposting this patch after the rebase. This time sashiko
> > > > was able to review this, and found good points including things that deserve
> > > > another revision of this patch.
> > > >
> > > > Forwarding full sashiko review in a reply format with my inline comments below,
> > > > for sharing details of my view and doing followup discussions via mails. Ravi,
> > > > could you please reply?
> > > >
> > >
> > > Thanks SJ, providing your comments on top of sashiko's review is very helpful.
> >
> > I'm glad to hear that it is working for you :)
> >
> > [...]
> > > > > +static unsigned long damos_calc_eligible_bytes(struct damon_ctx *c,
> > > > > > + struct damos *s, int nid, unsigned long *total)
> > > > > > +{
> > [...]
> > > > > > + struct folio *folio;
> > > > > > + unsigned long folio_sz, counted;
> > > > > > +
> > > > > > + folio = damon_get_folio(PHYS_PFN(addr));
> > > > >
> > > > > What happens if this metric is assigned to a DAMON context configured for
> > > > > virtual address space monitoring? If the context uses DAMON_OPS_VADDR,
> > > > > passing a user-space virtual address to PHYS_PFN() might cause invalid
> > > > > memory accesses or out-of-bounds page struct reads. Should this code
> > > > > explicitly verify the operations type first?
> > > >
> > > > Good finding. We intend to support only paddr ops. But there is no guard for
> > > > using this on vaddr ops configuration. Ravi, could we add underlying ops
> > > > check? I think damon_commit_ctx() is a good place to add that. The check
> > > > could be something like below?
> > > >
> > >
> > > I plan to add the ops type check directly in the metric functions
> > > (damos_get_node_eligible_mem_bp and its counterpart) rather than in
> > > damon_commit_ctx(). The functions will return 0 early
> > > if c->ops.id != DAMON_OPS_PADDR.
> > >
> > > That said, if you prefer the damon_commit_ctx() validation approach to
> > > reject the configuration outright, I can implement it that way instead.
> > > Please let me know your preference.
> >
> > I'd prefer damon_commit_ctx() validation approach since it would give users
> > more clear message of the failure.
> >
> > >
> > > > '''
> > > > --- a/mm/damon/core.c
> > > > +++ b/mm/damon/core.c
> > > > @@ -1515,10 +1515,23 @@ static int damon_commit_sample_control(
> > > > int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
> > > > {
> > > > int err;
> > > > + struct damos *scheme;
> > > > + struct damos_quota_goal *goal;
> > > >
> > > > dst->maybe_corrupted = true;
> > > > if (!is_power_of_2(src->min_region_sz))
> > > > return -EINVAL;
> > > > + if (src->ops.id != DAMON_OPS_PADDR) {
> > > > + damon_for_each_scheme(scheme, src) {
> > > > + damos_for_each_quota_goal(goal, &scheme->quota) {
> > > > + switch (goal->metric) {
> > > > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > > > + case DAMOS_QUOTA_NODE_INELIGIBLE_MEMPBP:
> > > > + return -EINVAL;
> > > > + }
> > > > + }
> > > > + }
> > > > + }
> > > >
> > > > err = damon_commit_schemes(dst, src);
> > > > if (err)
> > > > '''
> > [...]
> > > > > > + /* Compute ineligible ratio directly: 10000 - eligible_bp */
> > > > > > + return 10000 - mult_frac(node_eligible, 10000, total_eligible);
> > > > > > +}
> > > > >
> > > > > Does this return value match the documented metric? The formula computes the
> > > > > percentage of the system's eligible memory located on other NUMA nodes,
> > > > > rather than the amount of actual ineligible (filtered out) memory residing
> > > > > on the target node. Could this semantic mismatch cause confusion when
> > > > > configuring quota policies?
> > > >
> > > > Nice catch. The name and the documentation are confusing. We actually
> > > > confused a few times in previous revisions, and I'm again confused now. IIUC,
> > > > the current implementation is the intended and right one for the given use
> > > > case, though. If my understanding is correct, how about renaming
> > > > DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP to
> > > > DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_COMPLEMENT, and updating the documentation
> > > > together? Ravi, what do you think?
> > > >
> > >
> > > Agreed, the current name is confusing. How about
> > > DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_OFFNODE?
> > >
> > > The rationale is that this metric measures "eligible memory that is off
> > > this node" (i.e., on other nodes).
> > >
> > > I think "offnode" conveys the physical meaning more directly than "complement".
> > > That said, I'm happy to go with "complement" if you prefer.
> > > both are clearer than "ineligible".
> >
> > Thank you for the nice suggestion. I like "offnode" term. But I think having
> > "node" twice on the name is not really efficient for people who print code on
> > papers. What about DAMOS_QUOTA_OFFNODE_ELIGIBLE_MEM_BP?
> >
> > But... Maybe more importantly... Now I realize this means that
> > offnode_eligible_mem_bp with target nid 0 is just same to node_eligible_mem_bp
> > with target nid 1, on your test setup. Maybe we don't really need
> > offnode_eligible_mem_bp? That is, your test setup could be like below.
> >
> > '''
> > For maintaining hot memory on DRAM (node 0) and CXL (node 1) in a 7:3
> > ratio:
> >
> > PUSH scheme: migrate_hot from node 0 -> node 1
> > goal: node_eligible_mem_bp, nid=1, target=3000
> > "Move hot pages from DRAM to CXL if less thatn 30% of hot data is
> > in CXL"
> >
> > PULL scheme: migrate_hot from node 1 -> node 0
> > goal: node_eligible_mem_bp, nid=0, target=7000
> > "Move hot pages from CXL to DRAM if less than 70% of hot data is
> > in DRAM"
> > '''
> >
> > And the schemes are more easy to read and understand for me. This seems even
> > straightforward to scale for >2 nodes. For example, if we want hot memory
> > distribution of 5:3:2 to nodes 0:1:2,
> >
> > Two schemes for migrating hot pages out of node 0
> > - migrate_hot from node 0 -> node 1
> > - goal: node_eligible_mem_bp, nid=1, target=3000
> > - migrate_hot from node 0 -> node 2
> > - goal: node_eligible_mem_bp, nid=2, target=2000
> >
> > Two schemes for migrating hot pages out of node 1
> > - migrate_hot from node 1 -> node 0
> > - goal: node_eligible_mem_bp, nid=0, target=5000
> > - migrate_hot from node 1 -> node 2
> > - goal: node_eligible_mem_bp, nid=2, target=2000
> >
> > Two schemes for migrating hot pages out of node 2
> > - migrate_hot from node 2 -> node 0
> > - goal: node_eligible_mem_bp, nid=0, target=5000
> > - migrate_hot from node 2 -> node 1
> > - goal: node_eligible_mem_bp, nid=1, target=3000
> >
> > Do you think this makes sense? If it makes sense and works for your use case,
> > what about dropping the offnode goal type?
>
> Now I recall I suggested the offnode metric because I suggested to run a
> kdamond per node. That is, having one kdamond that monitors only node 0 and
> migrate hot memory to node 1, and another kdamond that monitors only node 1 and
> migrate hot memory to node 0. And I suggested to do so because I knew it is
> suboptimal to run DAMOS schemes with node filter.
>
> We made a change [1] for making that more optimum, though. The change is now
> in mm-stable, so hopefully it will be available from 7.1-rc1. So I believe the
> single quota goal metric should work now. Ravi, could you share what you
> think?
>
Yes SJ. I think we can make it work with single goal now that the
below commit is part of mainline. will give it a try and post an
update.
> [1] commit e1ace69c33ec ("mm/damon/core: set quota-score histogram with core filters")
>
>
> Thanks,
> SJ
>
> [...]
next prev parent reply other threads:[~2026-04-08 2:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-05 18:42 [PATCH v6 0/1] " Ravi Jonnalagadda
2026-04-05 18:42 ` [PATCH v6 1/1] " Ravi Jonnalagadda
2026-04-05 22:45 ` (sashiko review) " SeongJae Park
2026-04-06 19:47 ` Ravi Jonnalagadda
2026-04-07 0:13 ` SeongJae Park
2026-04-07 16:05 ` SeongJae Park
2026-04-08 2:33 ` Ravi Jonnalagadda [this message]
2026-04-08 13:54 ` SeongJae Park
2026-04-05 22:51 ` [PATCH v6 0/1] " SeongJae Park
2026-04-06 0:20 ` Ravi Jonnalagadda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CALa+Y17YnrOe=UXWBMKJ1U6seKJuauDqAdTDYo1cCYnrP_vSFg@mail.gmail.com' \
--to=ravis.opensrc@gmail.com \
--cc=ajayjoshi@micron.com \
--cc=akpm@linux-foundation.org \
--cc=bijan311@gmail.com \
--cc=corbet@lwn.net \
--cc=damon@lists.linux.dev \
--cc=honggyu.kim@sk.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=sj@kernel.org \
--cc=yunjeong.mun@sk.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox