From: SeongJae Park <sj@kernel.org>
To: Quanmin Yan <yanquanmin1@huawei.com>
Cc: SeongJae Park <sj@kernel.org>, zuoze <zuoze1@huawei.com>,
akpm@linux-foundation.org, damon@lists.linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [RFC PATCH] mm/damon: add full LPAE support for memory monitoring above 4GB
Date: Sat, 26 Jul 2025 10:16:16 -0700 [thread overview]
Message-ID: <20250726171616.53704-1-sj@kernel.org> (raw)
In-Reply-To: <527714dd-0e33-43ab-bbbd-d89670ba79e7@huawei.com>
Hi Quanmin,
On Sat, 26 Jul 2025 11:14:19 +0800 Quanmin Yan <yanquanmin1@huawei.com> wrote:
>
> 在 2025/7/26 4:22, SeongJae Park 写道:
> > On Fri, 25 Jul 2025 11:15:22 +0800 zuoze <zuoze1@huawei.com> wrote:
> >
> >>
> >> 在 2025/4/23 1:43, SeongJae Park 写道:
> >>> On Tue, 22 Apr 2025 19:50:11 +0800 zuoze <zuoze1@huawei.com> wrote:
> >>>
> >>> [...]
> >>>> Thanks for the patches - I’ve noted the RFC series and user-space
> >>>> updates. Apologies for the delay; I’ll prioritize reviewing these soon
> >>>> to verify they meet the intended tracking goals. Appreciate your
> >>>> patience.
> >>> No worry. Please take your time and let me know if there is anything I can
> >>> help.
> >>>
> >>> I think we can improve the user-space tool support better for usability. For
> >>> example, it could find LPAE case, set addr_unit parameter, and convert
> >>> user-input and output address ranges on its own. But hopefully the current
> >>> support allows simple tests of the kernel side change, and we could do such
> >>> improvement after the kernel side change is made.
> >>>
> >>>
> >> Hi SJ,
> >>
> >> Apologies for the delayed response. We've verified your patch in our
> >> environment and confirmed it supports LPAE address monitoring.
> > No worry, thank you for testing that :)
> >
> >> However,
> >> we observed some anomalies in the reclaim functionality. During code
> >> review, we identified a few issues:
> >>
> >> The semantic meaning of damon_region changed after addr_unit was
> >> introduced. The units in damon_addr_range may no longer represent bytes
> >> directly.
> > You're right, and this is an intended change.
> >
> >> The size returned by damon_sz_region() now requires multiplication by
> >> addr_unit to get the actual byte count.
> > Again, this is an intended change. damon_sz_region() callers should aware this
> > semantic and updated accordingly, if it could make a real problem otherwise.
> > If you found such changes required cases that this patch series is missing,
> > could you please list up?
> >
> >> Heavy usage of damon_sz_region() and DAMON_MIN_REGION likely requires
> >> addr_unit-aware adjustments throughout the codebase. While this approach
> >> works, it would involve considerable changes.
> > It has been a while since I wrote this patch series, but at least while writing
> > it, I didn't find such required changes. Of course I should missed something,
> > though. As I mentioned above, could you please list such changes required
> > parts that makes problem? That would be helpful at finding the path forward.
> >
> >> What's your perspective on
> >> how we should proceed?
> > Let's see the list of required additional changes with why those are required
> > (what problems can happen if such chadnges are not made), and discuss.
>
> Hi SJ,
>
> Thank you for your email reply. Let's discuss the impacts introduced after
> incorporating addr_unit. First of all, it's essential to clarify that the
> definition of damon_addr_range (in damon_region) has changed, we will now use
> damon_addr_range * addr_unit to calculate physical addresses.
>
> I've noticed some issues, in mm/damon/core.c:
>
> damos_apply_scheme()
> ...
> unsigned long sz = damon_sz_region(r); // the unit of 'sz' is no longer bytes.
> ...
> if (c->ops.apply_scheme)
> if (quota->esz && quota->charged_sz + sz > quota->esz)
> sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
> DAMON_MIN_REGION); // the core issue lies here.
> ...
> quota->charged_sz += sz; // note the units.
> ...
> update_stat:
> // 'sz' should be multiplied by addr_unit:
> damos_update_stat(s, sz, sz_applied, sz_ops_filter_passed);
>
> Currently, DAMON_MIN_REGION is defined as PAGE_SIZE, therefore aligning
> sz downward to DAMON_MIN_REGION is likely unreasonable. Meanwhile, the unit
> of sz in damos_quota is also not bytes, which necessitates updates to comments
> and user documentation. Additionally, the calculation involving DAMON_MIN_REGION
> requires reconsideration. Here are a few examples:
>
> damos_skip_charged_region()
> ...
> sz_to_skip = ALIGN_DOWN(quota->charge_addr_from -
> r->ar.start, DAMON_MIN_REGION);
> ...
> if (damon_sz_region(r) <= DAMON_MIN_REGION)
> return true;
> sz_to_skip = DAMON_MIN_REGION;
>
> damon_region_sz_limit()
> ...
> if (sz < DAMON_MIN_REGION)
> sz = DAMON_MIN_REGION;
Thank you for this kind and detailed explanation of the issue! I understand
adopting addr_unit would make DAMON_MINREGION 'addr_unit * 4096' bytes, and it
is not a desired result when 'addr_unit' is large. For example, if 'addr_unit'
is set as 4096, the access monitoring and operation schemes will work in only
>16 MiB granularity at the best.
>
> Now I can think of two approaches, one is to keep sz in bytes, this requires
> modifications to many other call sites that use these two functions (at least
> passing the corresponding ctx->addr_unit. By the way, should we restrict the
> input of addr_unit?):
>
> damos_apply_scheme()
> ...
> - unsigned long sz = damon_sz_region(r);
> + unsigned long sz = damon_sz_region(r) * c->addr_unit;
> ...
> - damon_split_region_at(t, r, sz);
> + damon_split_region_at(t, r, sz / c->addr_unit);
>
> The second approach is to divide by addr_unit when applying DAMON_MIN_REGION,
> and revert to byte units for statistics, this approach seems to involve
> significant changes as well:
>
> damos_apply_scheme()
> ...
> sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
> - DAMON_MIN_REGION);
> + DAMON_MIN_REGION / c->addr_unit);
> ...
> update_stat:
> - damos_update_stat(s, sz, sz_applied, sz_ops_filter_passed);
> + damos_update_stat(s, sz, sz_applied * c->addr_unit, sz_ops_filter_passed);
>
> These are my observations. What's your perspective on how we should proceed? Looking
> forward to your reply.
I think the second approach is better. But I want to avoid changing every
DAMON_MIN_REGION usage. What about changing DAMON_MIN_REGION as 'max(4096 /
addr_unit, 1)' instead? Specifically, we can change DAMON_MIN_REGION from a
global macro value to per-context variable (a field of damon_ctx), and set it
accordingly when the parameters are set.
For stats, I think the users should aware of the fact DAMON is working with the
addr_unit, so they should multiply addr_unit to the stats to get bytes
information. So, I think the stats update in kernel is not really required.
DAMON user-space tool may need to be updated accordingly, though.
I didn't take time to think about all corner cases, so I may missing something.
Please let me knwo if you find such missing things.
Thanks,
SJ
[...]
next prev parent reply other threads:[~2025-07-26 17:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-08 7:55 Ze Zuo
2025-04-09 2:50 ` SeongJae Park
2025-04-09 7:01 ` zuoze
2025-04-09 17:36 ` SeongJae Park
2025-04-10 6:28 ` zuoze
2025-04-10 22:25 ` SeongJae Park
2025-04-11 6:30 ` zuoze
2025-04-11 16:35 ` SeongJae Park
2025-04-14 1:21 ` zuoze
2025-04-20 16:27 ` SeongJae Park
2025-04-22 11:50 ` zuoze
2025-04-22 17:43 ` SeongJae Park
2025-07-25 3:15 ` zuoze
2025-07-25 20:22 ` SeongJae Park
2025-07-26 3:14 ` Quanmin Yan
2025-07-26 17:16 ` SeongJae Park [this message]
2025-07-29 13:52 ` Quanmin Yan
2025-07-29 15:53 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250726171616.53704-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=damon@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=wangkefeng.wang@huawei.com \
--cc=yanquanmin1@huawei.com \
--cc=zuoze1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox