From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com
Subject: Re: [RFC PATCH v3 3/4] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics
Date: Wed, 25 Feb 2026 10:46:56 -0800
Message-ID: <CALa+Y16T-yNPq75s_PWwEJAEZ=r1YVdd7AyRMA2DtS=BdkZ3-w@mail.gmail.com>
In-Reply-To: <20260224042734.57666-1-sj@kernel.org>

On Mon, Feb 23, 2026 at 8:27 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Mon, 23 Feb 2026 12:32:31 +0000 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > Add new quota goal metrics for memory tiering that track scheme-eligible
> > (hot) memory distribution across NUMA nodes:
> >
> > - DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: ratio of hot memory on a node
> > - DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP: ratio of hot memory NOT on a node
> >
> > These complementary metrics enable push-pull migration schemes that
> > maintain a target hot memory distribution. For example, to keep 30%
> > of hot memory on CXL node 1:
> >
> > - PUSH scheme (DRAM→CXL): node_eligible_mem_bp, nid=1, target=3000
> > Activates when node 1 has less than 30% hot memory
> > - PULL scheme (CXL→DRAM): node_ineligible_mem_bp, nid=1, target=7000
> > Activates when node 1 has more than 30% hot memory
> >
> > Together with the TEMPORAL goal tuner, the schemes converge to
> > equilibrium at the target distribution.
> >
> > The metrics use detected eligible bytes per node, calculated by summing
> > the size of regions that match the scheme's access pattern (size,
> > nr_accesses, age) on each NUMA node.
>
> Looks good in general! I have some comments about trivials and the design
> below, though.
>
Thank you for the detailed review!
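For readers of the archive, here is a quick standalone model of the
push/pull complementarity described in the commit message above (not
kernel code; the function names are made up, and the 3000/7000 targets
are from the example):

```c
#include <assert.h>

/* node_ineligible_mem_bp is defined as the complement of
 * node_eligible_mem_bp, so a PUSH goal of 3000 (30% of hot memory on
 * the node) and a PULL goal of 7000 (70% of hot memory elsewhere)
 * describe the same equilibrium point from opposite directions. */
unsigned long model_ineligible_bp(unsigned long eligible_bp)
{
	return 10000 - eligible_bp;
}

/* PUSH (DRAM->CXL) activates while the node holds less hot memory
 * than its target share. */
int push_active(unsigned long eligible_bp, unsigned long push_target)
{
	return eligible_bp < push_target;
}

/* PULL (CXL->DRAM) activates while the node holds more hot memory
 * than its target share, i.e. while the ineligible ratio is below
 * the pull target. */
int pull_active(unsigned long eligible_bp, unsigned long pull_target)
{
	return model_ineligible_bp(eligible_bp) < pull_target;
}
```

At eligible_bp == 3000 neither scheme activates, which is the
equilibrium the TEMPORAL tuner converges to.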
> >
> > Suggested-by: SeongJae Park <sj@kernel.org>
> > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> > ---
> > include/linux/damon.h | 6 ++
> > mm/damon/core.c | 123 ++++++++++++++++++++++++++++++++++++++-
> > mm/damon/sysfs-schemes.c | 10 ++++
> > 3 files changed, 137 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index ee2d0879c292..6df716533fbf 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> > @@ -191,6 +191,8 @@ enum damos_action {
> > * @DAMOS_QUOTA_NODE_MEM_FREE_BP: MemFree ratio of a node.
> > * @DAMOS_QUOTA_NODE_MEMCG_USED_BP: MemUsed ratio of a node for a cgroup.
> > * @DAMOS_QUOTA_NODE_MEMCG_FREE_BP: MemFree ratio of a node for a cgroup.
> > + * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: Scheme-eligible memory ratio of a node.
> > + * @DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP: Scheme-ineligible memory ratio of a node.
>
> Nit. Let's wrap the line to fit the 80-column limit.
Will fix it.
>
> > * @DAMOS_QUOTA_ACTIVE_MEM_BP: Active to total LRU memory ratio.
> > * @DAMOS_QUOTA_INACTIVE_MEM_BP: Inactive to total LRU memory ratio.
> > * @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics.
> > @@ -204,6 +206,8 @@ enum damos_quota_goal_metric {
> > DAMOS_QUOTA_NODE_MEM_FREE_BP,
> > DAMOS_QUOTA_NODE_MEMCG_USED_BP,
> > DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
> > + DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
> > + DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP,
> > DAMOS_QUOTA_ACTIVE_MEM_BP,
> > DAMOS_QUOTA_INACTIVE_MEM_BP,
> > NR_DAMOS_QUOTA_GOAL_METRICS,
> > @@ -555,6 +559,7 @@ struct damos_migrate_dests {
> > * @ops_filters: ops layer handling &struct damos_filter objects list.
> > * @last_applied: Last @action applied ops-managing entity.
> > * @stat: Statistics of this scheme.
> > + * @eligible_bytes_per_node: Scheme-eligible bytes per NUMA node.
> > * @max_nr_snapshots: Upper limit of nr_snapshots stat.
> > * @list: List head for siblings.
> > *
> > @@ -644,6 +649,7 @@ struct damos {
> > struct list_head ops_filters;
> > void *last_applied;
> > struct damos_stat stat;
> > + unsigned long eligible_bytes_per_node[MAX_NUMNODES];
>
> I understand this could make it time-efficient. That is, without this, you
> would need to iterate the regions once per node_[in]eligible_mem_bp goal of
> the scheme. By having this you need to iterate the regions only once per
> scheme. I'm a bit worried about the increased size of 'struct damos', though.
>
> Do you think the overhead is really significant? If not, what about making it
> simply iterate the regions per goal, and adding the optimization later if it
> turns out to be really needed?
>
Got it. I'll change it to simply iterate regions per goal.
> If this optimization is really needed right now, I'd like it to at least be
> dynamically allocated, sized for only num_online_nodes() or
> num_possible_nodes().
Got it. If we ever need to implement the optimization, I'll follow the
dynamic allocation path.
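Roughly what I have in mind for the per-goal iteration in v4, as a
standalone model (the struct and function names are simplified stand-ins,
not the actual kernel code):

```c
#include <assert.h>

/* Simplified stand-in for a scheme-eligible DAMON region: its size in
 * bytes and the NUMA node it resides on (assuming here it does not
 * cross a node boundary; boundary crossing is handled separately). */
struct region {
	unsigned long sz_bytes;
	int nid;
};

/*
 * Per-goal computation: walk the eligible regions once for this goal
 * and return the node's share of eligible bytes in basis points
 * (0..10000), instead of caching eligible_bytes_per_node[] in
 * struct damos.
 */
unsigned long node_eligible_mem_bp(const struct region *regions,
				   int nr_regions, int nid)
{
	unsigned long total = 0, on_node = 0;
	int i;

	for (i = 0; i < nr_regions; i++) {
		total += regions[i].sz_bytes;
		if (regions[i].nid == nid)
			on_node += regions[i].sz_bytes;
	}
	if (!total)
		return 0;
	/* mult_frac(on_node, 10000, total) in the kernel proper */
	return on_node * 10000 / total;
}
```

The cost is one extra pass over the regions per goal, which should be
negligible for realistic numbers of goals.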
>
> > unsigned long max_nr_snapshots;
> > struct list_head list;
> > };
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index b438355ab54a..3e1cb850f067 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -2544,6 +2544,111 @@ static unsigned long damos_get_node_memcg_used_bp(
> > }
> > #endif
> >
> > +#ifdef CONFIG_NUMA
> > +/*
> > + * damos_scheme_uses_eligible_metrics() - Check if scheme uses eligible metrics.
> > + * @s: The scheme
> > + *
> > + * Returns true if any quota goal uses node_eligible_mem_bp or
> > + * node_ineligible_mem_bp metrics, which require eligible bytes calculation.
> > + */
> > +static bool damos_scheme_uses_eligible_metrics(struct damos *s)
> > +{
> > + struct damos_quota_goal *goal;
> > + struct damos_quota *quota = &s->quota;
> > +
> > + damos_for_each_quota_goal(goal, quota) {
> > + if (goal->metric == DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP ||
> > + goal->metric == DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP)
> > + return true;
> > + }
> > + return false;
> > +}
> > +
> > +/*
> > + * damos_calc_eligible_bytes_per_node() - Calculate eligible bytes per node.
> > + * @c: The DAMON context
> > + * @s: The scheme
> > + *
> > + * Calculates scheme-eligible bytes per NUMA node based on access pattern
> > + * matching. A region is eligible if it matches the scheme's access pattern
> > + * (size, nr_accesses, age).
> > + */
> > +static void damos_calc_eligible_bytes_per_node(struct damon_ctx *c,
> > + struct damos *s)
> > +{
> > + struct damon_target *t;
> > + struct damon_region *r;
> > + phys_addr_t paddr;
> > + int nid;
> > +
> > + memset(s->eligible_bytes_per_node, 0,
> > + sizeof(s->eligible_bytes_per_node));
> > +
> > + damon_for_each_target(t, c) {
> > + damon_for_each_region(r, t) {
> > + if (!__damos_valid_target(r, s))
> > + continue;
> > + paddr = (phys_addr_t)r->ar.start * c->addr_unit;
> > + nid = pfn_to_nid(PHYS_PFN(paddr));
> > + if (nid >= 0 && nid < MAX_NUMNODES)
> > + s->eligible_bytes_per_node[nid] +=
> > + damon_sz_region(r) * c->addr_unit;
> > + }
> > + }
>
> Seems the above code assumes the entire region belongs to a single node.
> But the region might lie over a node boundary, and in that case
> miscalculations could happen.
>
> What about getting start/end addresses of the node, and checking the crossing
> boundary case?
Got it. I'll add handling for regions that span node boundaries by
checking the node's address range and splitting the calculation accordingly.
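Concretely, the split can be done as an interval-overlap computation
against each node's physical address span, something like this standalone
sketch (names are made up, not the actual kernel helpers):

```c
#include <assert.h>

/*
 * How many bytes of the region [start, end) fall inside the node's
 * physical address span [node_start, node_end)? Summing this over all
 * spanned nodes attributes a boundary-crossing region correctly,
 * instead of charging the whole region to the node of its start pfn.
 */
unsigned long region_bytes_on_node(unsigned long start, unsigned long end,
				   unsigned long node_start,
				   unsigned long node_end)
{
	unsigned long lo = start > node_start ? start : node_start;
	unsigned long hi = end < node_end ? end : node_end;

	return hi > lo ? hi - lo : 0;
}
```

The per-node sums of a crossing region then add up to exactly the
region's size, so no bytes are double-counted or lost.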
>
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damos *s, int nid)
> > +{
> > + unsigned long total_eligible = 0;
> > + unsigned long node_eligible;
> > + int n;
> > +
> > + if (nid < 0 || nid >= MAX_NUMNODES)
> > + return 0;
> > +
> > + for_each_online_node(n)
> > + total_eligible += s->eligible_bytes_per_node[n];
> > +
> > + if (!total_eligible)
> > + return 0;
> > +
> > + node_eligible = s->eligible_bytes_per_node[nid];
> > +
> > + return mult_frac(node_eligible, 10000, total_eligible);
> > +}
> > +
> > +static unsigned long damos_get_node_ineligible_mem_bp(struct damos *s, int nid)
> > +{
> > + unsigned long eligible_bp = damos_get_node_eligible_mem_bp(s, nid);
> > +
> > + if (eligible_bp == 0)
> > + return 10000;
> > +
> > + return 10000 - eligible_bp;
> > +}
> > +#else
> > +static bool damos_scheme_uses_eligible_metrics(struct damos *s)
> > +{
> > + return false;
> > +}
> > +
> > +static void damos_calc_eligible_bytes_per_node(struct damon_ctx *c,
> > + struct damos *s)
> > +{
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damos *s, int nid)
> > +{
> > + return 0;
> > +}
> > +
> > +static unsigned long damos_get_node_ineligible_mem_bp(struct damos *s, int nid)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> > /*
> > * Returns LRU-active or inactive memory to total LRU memory size ratio.
> > */
> > @@ -2562,7 +2667,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio)
> > return mult_frac(inactive, 10000, total);
> > }
> >
> > -static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> > +static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal,
> > + struct damos *s)
> > {
> > u64 now_psi_total;
> >
> > @@ -2584,6 +2690,14 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> > case DAMOS_QUOTA_NODE_MEMCG_FREE_BP:
> > goal->current_value = damos_get_node_memcg_used_bp(goal);
> > break;
> > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > + goal->current_value = damos_get_node_eligible_mem_bp(s,
> > + goal->nid);
> > + break;
> > + case DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:
> > + goal->current_value = damos_get_node_ineligible_mem_bp(s,
> > + goal->nid);
> > + break;
> > case DAMOS_QUOTA_ACTIVE_MEM_BP:
> > case DAMOS_QUOTA_INACTIVE_MEM_BP:
> > goal->current_value = damos_get_in_active_mem_bp(
> > @@ -2597,11 +2711,12 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
> > /* Return the highest score since it makes schemes least aggressive */
> > static unsigned long damos_quota_score(struct damos_quota *quota)
> > {
> > + struct damos *s = container_of(quota, struct damos, quota);
>
> I'd prefer passing 's' from the caller.
Will change to pass 's' from the caller instead of using container_of().
>
> > struct damos_quota_goal *goal;
> > unsigned long highest_score = 0;
> >
> > damos_for_each_quota_goal(goal, quota) {
> > - damos_set_quota_goal_current_value(goal);
> > + damos_set_quota_goal_current_value(goal, s);
> > highest_score = max(highest_score,
> > mult_frac(goal->current_value, 10000,
> > goal->target_value));
> > @@ -2693,6 +2808,10 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
> > if (!quota->ms && !quota->sz && list_empty(&quota->goals))
> > return;
> >
> > + /* Calculate eligible bytes per node for quota goal metrics */
> > + if (damos_scheme_uses_eligible_metrics(s))
> > + damos_calc_eligible_bytes_per_node(c, s);
> > +
> > /* First charge window */
> > if (!quota->total_charged_sz && !quota->charged_from) {
> > quota->charged_from = jiffies;
> > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> > index fe2e3b2db9e1..232b33f5cbfb 100644
> > --- a/mm/damon/sysfs-schemes.c
> > +++ b/mm/damon/sysfs-schemes.c
> > @@ -1079,6 +1079,14 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = {
> > .metric = DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
> > .name = "node_memcg_free_bp",
> > },
> > + {
> > + .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
> > + .name = "node_eligible_mem_bp",
> > + },
> > + {
> > + .metric = DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP,
> > + .name = "node_ineligible_mem_bp",
> > + },
> > {
> > .metric = DAMOS_QUOTA_ACTIVE_MEM_BP,
> > .name = "active_mem_bp",
> > @@ -2669,6 +2677,8 @@ static int damos_sysfs_add_quota_score(
> > break;
> > case DAMOS_QUOTA_NODE_MEM_USED_BP:
> > case DAMOS_QUOTA_NODE_MEM_FREE_BP:
> > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
> > + case DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP:
> > goal->nid = sysfs_goal->nid;
> > break;
> > case DAMOS_QUOTA_NODE_MEMCG_USED_BP:
> > --
> > 2.43.0
>
> So, the overall concept and definition of the new goal metrics sound good to
> me. But I'd prefer less optimized but simpler code, plus handling for
> regions that cross node boundaries.
>
I'll prepare a v4 addressing all these points:
1. Fix 80-column wrapping
2. Remove eligible_bytes_per_node[] array, iterate regions per goal
instead
3. Handle regions crossing node boundaries
4. Pass scheme pointer from caller
I'm currently on a break and will send the updated patch after March
10th.
Thanks,
Ravi.
>
> Thanks,
> SJ