linux-mm.kvack.org archive mirror
From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net,  bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com,  yunjeong.mun@sk.com
Subject: Re: [RFC PATCH v3 3/4] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
Date: Wed, 25 Feb 2026 10:46:56 -0800	[thread overview]
Message-ID: <CALa+Y16T-yNPq75s_PWwEJAEZ=r1YVdd7AyRMA2DtS=BdkZ3-w@mail.gmail.com> (raw)
In-Reply-To: <20260224042734.57666-1-sj@kernel.org>

On Mon, Feb 23, 2026 at 8:27 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Mon, 23 Feb 2026 12:32:31 +0000 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > Add new quota goal metrics for memory tiering that track scheme-eligible
> > (hot) memory distribution across NUMA nodes:
> >
> > - DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: ratio of hot memory on a node
> > - DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP: ratio of hot memory NOT on a node
> >
> > These complementary metrics enable push-pull migration schemes that
> > maintain a target hot memory distribution. For example, to keep 30%
> > of hot memory on CXL node 1:
> >
> > - PUSH scheme (DRAM→CXL): node_eligible_mem_bp, nid=1, target=3000
> >   Activates when node 1 has less than 30% hot memory
> > - PULL scheme (CXL→DRAM): node_ineligible_mem_bp, nid=1, target=7000
> >   Activates when node 1 has more than 30% hot memory
> >
> > Together with the TEMPORAL goal tuner, the schemes converge to
> > equilibrium at the target distribution.
> >
> > The metrics use detected eligible bytes per node, calculated by summing
> > the size of regions that match the scheme's access pattern (size,
> > nr_accesses, age) on each NUMA node.
>
> Looks good in general!  I have some comments on trivial points and on the
> design below, though.
>

Thank you for the detailed review!

> >
> > Suggested-by: SeongJae Park <sj@kernel.org>
> > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> > ---
> >  include/linux/damon.h    |   6 ++
> >  mm/damon/core.c          | 123 ++++++++++++++++++++++++++++++++++++++-
> >  mm/damon/sysfs-schemes.c |  10 ++++
> >  3 files changed, 137 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index ee2d0879c292..6df716533fbf 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> > @@ -191,6 +191,8 @@ enum damos_action {
> >   * @DAMOS_QUOTA_NODE_MEM_FREE_BP:    MemFree ratio of a node.
> >   * @DAMOS_QUOTA_NODE_MEMCG_USED_BP:  MemUsed ratio of a node for a cgroup.
> >   * @DAMOS_QUOTA_NODE_MEMCG_FREE_BP:  MemFree ratio of a node for a cgroup.
> > + * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:        Scheme-eligible memory ratio of a node.
> > + * @DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:      Scheme-ineligible memory ratio of a node.
>
> Nit.  Let's wrap the lines to the 80-column limit.

Will fix it.

>
> >   * @DAMOS_QUOTA_ACTIVE_MEM_BP:               Active to total LRU memory ratio.
> >   * @DAMOS_QUOTA_INACTIVE_MEM_BP:     Inactive to total LRU memory ratio.
> >   * @NR_DAMOS_QUOTA_GOAL_METRICS:     Number of DAMOS quota goal metrics.
> > @@ -204,6 +206,8 @@ enum damos_quota_goal_metric {
> >       DAMOS_QUOTA_NODE_MEM_FREE_BP,
> >       DAMOS_QUOTA_NODE_MEMCG_USED_BP,
> >       DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
> > +     DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
> > +     DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP,
> >       DAMOS_QUOTA_ACTIVE_MEM_BP,
> >       DAMOS_QUOTA_INACTIVE_MEM_BP,
> >       NR_DAMOS_QUOTA_GOAL_METRICS,
> > @@ -555,6 +559,7 @@ struct damos_migrate_dests {
> >   * @ops_filters:     ops layer handling &struct damos_filter objects list.
> >   * @last_applied:    Last @action applied ops-managing entity.
> >   * @stat:            Statistics of this scheme.
> > + * @eligible_bytes_per_node: Scheme-eligible bytes per NUMA node.
> >   * @max_nr_snapshots:        Upper limit of nr_snapshots stat.
> >   * @list:            List head for siblings.
> >   *
> > @@ -644,6 +649,7 @@ struct damos {
> >       struct list_head ops_filters;
> >       void *last_applied;
> >       struct damos_stat stat;
> > +     unsigned long eligible_bytes_per_node[MAX_NUMNODES];
>
> I understand this could make it time-efficient.  That is, without this, you
> would need to iterate the regions once per node_[in]eligible_mem_bp goal of
> a scheme.  With this, you need to iterate the regions only once per scheme.
> I'm a bit worried about the increased size of 'struct damos', though.
>
> Do you think the overhead is really significant?  If not, what about simply
> iterating the regions per goal, and adding the optimization later if it
> turns out to really be needed?
>

Got it. I'll change it to simply iterate regions per goal.

> If this optimization is really needed right now, I'd like the array to at
> least be dynamically allocated, sized for only num_online_nodes() or
> num_possible_nodes().

Got it. If we ever need to implement this optimization, I'll follow the
dynamic-allocation approach.

>
> >       unsigned long max_nr_snapshots;
> >       struct list_head list;
> >  };
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index b438355ab54a..3e1cb850f067 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -2544,6 +2544,111 @@ static unsigned long damos_get_node_memcg_used_bp(
> >  }
> >  #endif
> >
> > +#ifdef CONFIG_NUMA
> > +/*
> > + * damos_scheme_uses_eligible_metrics() - Check if scheme uses eligible metrics.
> > + * @s: The scheme
> > + *
> > + * Returns true if any quota goal uses node_eligible_mem_bp or
> > + * node_ineligible_mem_bp metrics, which require eligible bytes calculation.
> > + */
> > +static bool damos_scheme_uses_eligible_metrics(struct damos *s)
> > +{
> > +     struct damos_quota_goal *goal;
> > +     struct damos_quota *quota = &s->quota;
> > +
> > +     damos_for_each_quota_goal(goal, quota) {
> > +             if (goal->metric == DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP ||
> > +                 goal->metric == DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP)
> > +                     return true;
> > +     }
> > +     return false;
> > +}
> > +
> > +/*
> > + * damos_calc_eligible_bytes_per_node() - Calculate eligible bytes per node.
> > + * @c: The DAMON context
> > + * @s: The scheme
> > + *
> > + * Calculates scheme-eligible bytes per NUMA node based on access pattern
> > + * matching. A region is eligible if it matches the scheme's access pattern
> > + * (size, nr_accesses, age).
> > + */
> > +static void damos_calc_eligible_bytes_per_node(struct damon_ctx *c,
> > +             struct damos *s)
> > +{
> > +     struct damon_target *t;
> > +     struct damon_region *r;
> > +     phys_addr_t paddr;
> > +     int nid;
> > +
> > +     memset(s->eligible_bytes_per_node, 0,
> > +                     sizeof(s->eligible_bytes_per_node));
> > +
> > +     damon_for_each_target(t, c) {
> > +             damon_for_each_region(r, t) {
> > +                     if (!__damos_valid_target(r, s))
> > +                             continue;
> > +                     paddr = (phys_addr_t)r->ar.start * c->addr_unit;
> > +                     nid = pfn_to_nid(PHYS_PFN(paddr));
> > +                     if (nid >= 0 && nid < MAX_NUMNODES)
> > +                             s->eligible_bytes_per_node[nid] +=
> > +                                     damon_sz_region(r) * c->addr_unit;
> > +             }
> > +     }
>
> The above code seems to assume the entire region belongs to a single node.
> But a region might lie across a node boundary, in which case miscalculations
> could happen.
>
> What about getting the start/end addresses of the node, and checking for the
> boundary-crossing case?

Got it. I'll add handling for regions that span node boundaries by
checking the node's address range and splitting the calculation accordingly.

>
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damos *s, int nid)
> > +{
> > +     unsigned long total_eligible = 0;
> > +     unsigned long node_eligible;
> > +     int n;
> > +
> > +     if (nid < 0 || nid >= MAX_NUMNODES)
> > +             return 0;
> > +
> > +     for_each_online_node(n)
> > +             total_eligible += s->eligible_bytes_per_node[n];
> > +
> > +     if (!total_eligible)
> > +             return 0;
> > +
> > +     node_eligible = s->eligible_bytes_per_node[nid];
> > +
> > +     return mult_frac(node_eligible, 10000, total_eligible);
> > +}
> > +
> > +static unsigned long damos_get_node_ineligible_mem_bp(struct damos *s, int nid)
> > +{
> > +     unsigned long eligible_bp = damos_get_node_eligible_mem_bp(s, nid);
> > +
> > +     if (eligible_bp == 0)
> > +             return 10000;
> > +
> > +     return 10000 - eligible_bp;
> > +}
> > +#else
> > +static bool damos_scheme_uses_eligible_metrics(struct damos *s)
> > +{
> > +     return false;
> > +}
> > +
> > +static void damos_calc_eligible_bytes_per_node(struct damon_ctx *c,
> > +             struct damos *s)
> > +{
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damos *s, int nid)
> > +{
> > +     return 0;
> > +}
> > +
> > +static unsigned long damos_get_node_ineligible_mem_bp(struct damos *s, int nid)
> > +{
> > +     return 0;
> > +}
> > +#endif
> > +
> >  /*
> >   * Returns LRU-active or inactive memory to total LRU memory size ratio.
> >   */
> > @@ -2562,7 +2667,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio)
> >       return mult_frac(inactive, 10000, total);
> >  }
> >
> > -static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> > +static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal,
> > +             struct damos *s)
> >  {
> >       u64 now_psi_total;
> >
> > @@ -2584,6 +2690,14 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> >       case DAMOS_QUOTA_NODE_MEMCG_FREE_BP:
> >               goal->current_value = damos_get_node_memcg_used_bp(goal);
> >               break;
> > +     case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > +             goal->current_value = damos_get_node_eligible_mem_bp(s,
> > +                             goal->nid);
> > +             break;
> > +     case DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:
> > +             goal->current_value = damos_get_node_ineligible_mem_bp(s,
> > +                             goal->nid);
> > +             break;
> >       case DAMOS_QUOTA_ACTIVE_MEM_BP:
> >       case DAMOS_QUOTA_INACTIVE_MEM_BP:
> >               goal->current_value = damos_get_in_active_mem_bp(
> > @@ -2597,11 +2711,12 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> >  /* Return the highest score since it makes schemes least aggressive */
> >  static unsigned long damos_quota_score(struct damos_quota *quota)
> >  {
> > +     struct damos *s = container_of(quota, struct damos, quota);
>
> I'd prefer passing 's' from the caller.

Will change to pass 's' from the caller instead of using container_of().

>
> >       struct damos_quota_goal *goal;
> >       unsigned long highest_score = 0;
> >
> >       damos_for_each_quota_goal(goal, quota) {
> > -             damos_set_quota_goal_current_value(goal);
> > +             damos_set_quota_goal_current_value(goal, s);
> >               highest_score = max(highest_score,
> >                               mult_frac(goal->current_value, 10000,
> >                                       goal->target_value));
> > @@ -2693,6 +2808,10 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
> >       if (!quota->ms && !quota->sz && list_empty(&quota->goals))
> >               return;
> >
> > +     /* Calculate eligible bytes per node for quota goal metrics */
> > +     if (damos_scheme_uses_eligible_metrics(s))
> > +             damos_calc_eligible_bytes_per_node(c, s);
> > +
> >       /* First charge window */
> >       if (!quota->total_charged_sz && !quota->charged_from) {
> >               quota->charged_from = jiffies;
> > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> > index fe2e3b2db9e1..232b33f5cbfb 100644
> > --- a/mm/damon/sysfs-schemes.c
> > +++ b/mm/damon/sysfs-schemes.c
> > @@ -1079,6 +1079,14 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = {
> >               .metric = DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
> >               .name = "node_memcg_free_bp",
> >       },
> > +     {
> > +             .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
> > +             .name = "node_eligible_mem_bp",
> > +     },
> > +     {
> > +             .metric = DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP,
> > +             .name = "node_ineligible_mem_bp",
> > +     },
> >       {
> >               .metric = DAMOS_QUOTA_ACTIVE_MEM_BP,
> >               .name = "active_mem_bp",
> > @@ -2669,6 +2677,8 @@ static int damos_sysfs_add_quota_score(
> >                       break;
> >               case DAMOS_QUOTA_NODE_MEM_USED_BP:
> >               case DAMOS_QUOTA_NODE_MEM_FREE_BP:
> > +             case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > +             case DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:
> >                       goal->nid = sysfs_goal->nid;
> >                       break;
> >               case DAMOS_QUOTA_NODE_MEMCG_USED_BP:
> > --
> > 2.43.0
>
> So, the overall concept and definition of the new goal metrics sound good to
> me.  But I'd prefer less-optimized but simpler code, and handling for
> regions that cross node boundaries.
>

I'll prepare a v4 addressing all these points:
  1. Fix the 80-column wrapping
  2. Remove the eligible_bytes_per_node[] array and iterate regions per
     goal instead
  3. Handle regions crossing node boundaries
  4. Pass the scheme pointer from the caller

I'm currently on a break and will send the updated patch after March
10th.

Thanks,
Ravi.
>
> Thanks,
> SJ


