linux-mm.kvack.org archive mirror
From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net,  bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com,  yunjeong.mun@sk.com
Subject: Re: (sashiko review) [PATCH v6 1/1] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
Date: Mon, 6 Apr 2026 12:47:56 -0700	[thread overview]
Message-ID: <CALa+Y14oWqu5+DbkENy7GgBjc=dCbFTaoOCr1i4=9CN-ZNRgEA@mail.gmail.com> (raw)
In-Reply-To: <20260405224550.76218-1-sj@kernel.org>

On Sun, Apr 5, 2026 at 3:45 PM SeongJae Park <sj@kernel.org> wrote:
>
>
> Ravi, thank you for reposting this patch after the rebase.  This time sashiko
> was able to review this, and found good points including things that deserve
> another revision of this patch.
>
> Forwarding full sashiko review in a reply format with my inline comments below,
> for sharing details of my view and doing followup discussions via mails.  Ravi,
> could you please reply?
>

Thanks, SJ. Providing your comments on top of sashiko's review is very helpful.

> > # review url: https://sashiko.dev/#/patchset/20260405184247.2690-2-ravis.opensrc@gmail.com
> > # start of sashiko.dev inline review
> > commit 5f6ea8da06d136b84d97827777acf2b07047d42c
> > Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> > mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
> >
> > This commit adds quota goal metrics for tracking scheme-eligible and
> > ineligible memory distribution across NUMA nodes. It iterates over
> > scheme-eligible regions and uses damon_get_folio() to determine placement.
> >
> > > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > > --- a/mm/damon/core.c
> > > +++ b/mm/damon/core.c
> >
> > [ ... ]
> >
> > > +static unsigned long damos_calc_eligible_bytes(struct damon_ctx *c,
> > > +           struct damos *s, int nid, unsigned long *total)
> > > +{
> > > +   struct damon_target *t;
> > > +   struct damon_region *r;
> > > +   unsigned long total_eligible = 0;
> > > +   unsigned long node_eligible = 0;
> >
> > Can total_eligible and node_eligible overflow on 32-bit architectures?
> > Since these accumulators track raw physical bytes, regions spanning multiple
> > gigabytes might wrap around the 32-bit unsigned long limit. Would it be
> > safer to use u64 or track the counts in pages instead?
>
> Makes sense.  Ravi, could we update these to use phys_addr_t?

Agreed. I'll update total_eligible, node_eligible, and the function
return type to use phys_addr_t throughout damos_calc_eligible_bytes().

>
> >
> > > +
> > > +   damon_for_each_target(t, c) {
> > > +           damon_for_each_region(r, t) {
> > > +                   phys_addr_t addr, end_addr;
> > > +
> > > +                   if (!__damos_valid_target(r, s))
> > > +                           continue;
> >
> > Should this loop also check damos_core_filter_out()? By only checking
> > __damos_valid_target(), it seems memory explicitly excluded by user-defined
> > region filters (like target ID or memcg limits) is still counted as
> > eligible, which might inflate the total and affect the quota tuning loop.
>
> No, we intentionally defined "eligible" based on only the access pattern.  We
> may be able to extend this to apply filters in future, if we find a use case.
> But, for now, I'd suggest keeping this as is for now.
>

Got it. Will keep it as is.

> >
> > > +
> > > +                   /* Convert from core address units to physical bytes */
> > > +                   addr = r->ar.start * c->addr_unit;
> > > +                   end_addr = r->ar.end * c->addr_unit;
> > > +                   while (addr < end_addr) {
> >
> > Could this unbounded loop cause CPU soft lockups? Since DAMON regions can
> > span hundreds of gigabytes, executing millions of folio lookups sequentially
> > without a cond_resched() could monopolize the CPU.
>
> Good point.  Most DAMOS action implementations are also doing cond_resched()
> per region.  Ravi, could we add cond_resched() per region to this function,
> too?

Will add cond_resched() at the end of each region's processing.

>
> >
> > > +                           struct folio *folio;
> > > +                           unsigned long folio_sz, counted;
> > > +
> > > +                           folio = damon_get_folio(PHYS_PFN(addr));
> >
> > What happens if this metric is assigned to a DAMON context configured for
> > virtual address space monitoring? If the context uses DAMON_OPS_VADDR,
> > passing a user-space virtual address to PHYS_PFN() might cause invalid
> > memory accesses or out-of-bounds page struct reads. Should this code
> > explicitly verify the operations type first?
>
> Good finding.  We intend to support only paddr ops.  But there is no guard for
> using this on vaddr ops configuration.  Ravi, could we add underlying ops
> check?  I think damon_commit_ctx() is a good place to add that.  The check
> could be something like below?
>

I plan to add the ops type check directly in the metric functions
(damos_get_node_eligible_mem_bp and its counterpart) rather than in
damon_commit_ctx(). The functions will return 0 early
if c->ops.id != DAMON_OPS_PADDR.

That said, if you prefer the damon_commit_ctx() validation approach to
reject the configuration outright, I can implement it that way instead.
Please let me know your preference.

> '''
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1515,10 +1515,23 @@ static int damon_commit_sample_control(
>  int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
>  {
>         int err;
> +       struct damos *scheme;
> +       struct damos_quota_goal *goal;
>
>         dst->maybe_corrupted = true;
>         if (!is_power_of_2(src->min_region_sz))
>                 return -EINVAL;
> +       if (src->ops.id != DAMON_OPS_PADDR) {
> +               damon_for_each_scheme(scheme, src) {
> +                       damos_for_each_quota_goal(goal, &scheme->quota) {
> +                               switch (goal->metric) {
> +                               case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> +                               case DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:
> +                                       return -EINVAL;
> +                               }
> +                       }
> +               }
> +       }
>
>         err = damon_commit_schemes(dst, src);
>         if (err)
> '''
>
> >
> > > +                           if (!folio) {
> > > +                                   addr += PAGE_SIZE;
> > > +                                   continue;
> > > +                           }
> > > +
> > > +                           folio_sz = folio_size(folio);
> > > +                           /*
> > > +                            * Clip to region boundaries to avoid counting
> > > +                            * bytes outside the region when folio spans
> > > +                            * region boundaries.
> > > +                            */
> > > +                           counted = min(folio_sz, (unsigned long)(end_addr - addr));
> >
> > Does this accurately calculate the remaining bytes if the region starts at an
> > unaligned offset within a large folio? folio_size() returns the full size
> > of the folio, so this calculation might overestimate the eligible bytes by
> > not accounting for the initial offset within the folio.
> >
> > > +                           total_eligible += counted;
> > > +                           if (folio_nid(folio) == nid)
> > > +                                   node_eligible += counted;
> > > +
> > > +                           addr += folio_sz;
> >
> > Can this pointer advancement overshoot the end of the current folio? If
> > addr was unaligned, adding the full folio_sz will skip past the beginning
> > of the next contiguous physical memory folio.
>
> Above two points make sense to me.  Ravi, could we fix these?
>

Will fix both. I'll compute the exact overlap between the region
[addr, end_addr) and the folio's physical range, and advance addr to the
folio's end rather than by folio_sz.

> >
> > > +                           folio_put(folio);
> > > +                   }
> > > +           }
> > > +   }
> > > +
> > > +   *total = total_eligible;
> > > +   return node_eligible;
> > > +}
> >
> > [ ... ]
> >
> > > +static unsigned long damos_get_node_ineligible_mem_bp(struct damon_ctx *c,
> > > +           struct damos *s, int nid)
> > > +{
> > > +   unsigned long total_eligible = 0;
> > > +   unsigned long node_eligible;
> > > +
> > > +   if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > > +           return 0;
> > > +
> > > +   node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > > +
> > > +   /* No eligible memory anywhere - ratio is undefined, return 0 */
> > > +   if (!total_eligible)
> > > +           return 0;
> > > +
> > > +   /* Compute ineligible ratio directly: 10000 - eligible_bp */
> > > +   return 10000 - mult_frac(node_eligible, 10000, total_eligible);
> > > +}
> >
> > Does this return value match the documented metric? The formula computes the
> > percentage of the system's eligible memory located on other NUMA nodes,
> > rather than the amount of actual ineligible (filtered out) memory residing
> > on the target node. Could this semantic mismatch cause confusion when
> > configuring quota policies?
>
> Nice catch.  The name and the documentation are confusing.  We actually
> confused a few times in previous revisions, and I'm again confused now.  IIUC,
> the current implementation is the intended and right one for the given use
> case, though.  If my understanding is correct, how about renaming
> DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP to
> DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_COMPLEMENT, and updating the documentation
> together?  Ravi, what do you think?
>

Agreed, the current name is confusing. How about
DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP_OFFNODE?

The rationale is that this metric measures "eligible memory that is off
this node" (i.e., on other nodes).

I think "offnode" conveys the physical meaning more directly than
"complement". That said, I'm happy to go with "complement" if you
prefer; both are clearer than "ineligible".

> >
> >
> > # end of sashiko.dev inline review
> > # review url: https://sashiko.dev/#/patchset/20260405184247.2690-2-ravis.opensrc@gmail.com
>
>
> Thanks,
> SJ
>

Best Regards,
Ravi.
> # hkml [1] generated a draft of this mail.  You can regenerate
> # this using below command:
> #
> #     hkml patch sashiko_dev --for_forwarding \
> #             20260405184247.2690-2-ravis.opensrc@gmail.com
> #


