linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Stepanov Anatoly <stepanov.anatoly@huawei.com>
To: SeongJae Park <sj@kernel.org>, <gutierrez.asier@huawei-partners.com>
Cc: <artem.kuzin@huawei.com>, <wangkefeng.wang@huawei.com>,
	<yanquanmin1@huawei.com>, <zuoze1@huawei.com>,
	<damon@lists.linux.dev>, <akpm@linux-foundation.org>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v1 1/1] mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
Date: Tue, 31 Mar 2026 13:50:21 +0300	[thread overview]
Message-ID: <716989f5-78d3-4e78-98ae-2bf3caa447a9@huawei.com> (raw)
In-Reply-To: <15f60a0e-a21b-472a-ae76-8437e9859e15@huawei.com>

On 3/31/2026 1:46 PM, Stepanov Anatoly wrote:
> On 3/31/2026 4:31 AM, SeongJae Park wrote:
>> Hello Asier,
>>
>> On Mon, 30 Mar 2026 14:57:58 +0000 <gutierrez.asier@huawei-partners.com> wrote:
>>
>>> From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>>
>>> This patch set introces a new action:  DAMOS_COLLAPSE.
>>>
>>> For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
>>> working, since it relies on hugepage_madvise to add a new slot. This
>>> slot should be picked up by khugepaged and eventually collapse (or
>>> not, if we are using DAMOS_NOHUGEPAGE) the pages. If THP is not
>>> enabled, khugepaged will not be working, and therefore no collapse
>>> will happen.
>>
>> I should raised this in a previous version, sorry.  But, that is only a half of
>> the picture.  That is, khugepaged is not the single THP allocator for
>> MADV_HUGEPAGE.  IIUC, MADV_HUGEPAGE-applied region also allocates huge pages in
>> page fault time.  According to the man page,
>>
>>     The kernel will regularly scan the areas marked as huge page  candidates
>>     to replace  them with huge pages.  The kernel will also allocate huge pages
>>     directly when the region is naturally aligned to the huge page size (see
>>     posix_memalign(2)).
>>
> I think key difference between DAMOS_HUGEPAGE and DAMOS_COLLAPSE is the granularity.
> 
> In DAMOS_HUGEPAGE case, the granularity is always VMA, even if the hot region is narrow.
> It's true for both page-fault based collapse and khugepaged collapse.
*page-fault THP allocation, not collapse of course.

> 
> With DAMOS_COLLAPSE we can cover cases, when there's large VMA, for example,
> which contains some hot VA region inside, so we can collapse just that region, not the whole VMA.
>  
> 
>> I think the description is better to be wordsmithed or clarified.  Maybe just
>> pointing the MADV_COLLAPSE intro commit (7d8faaf15545 ("mm/madvise: introduce
>> MADV_COLLAPSE sync hugepage collapse")) for the rationale could also be a good
>> approach, as the aimed goal of DAMOS_COLLAPSE is not different from
>> MADV_COLLAPSE.
>>
>>>
>>> DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse
>>> the address range synchronously.
>>>
>>> This new action may be required to support autotuning with hugepage
>>> as a goal[1].
>>>
>>> [1]: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/
>>>
>>> ---------
>>> Benchmarks:
>>
>> I recently heard some tools could think above line as the commentary
>> area [1] separation line.  Please use ==== like separator instead.  For
>> example,
>>
>>     Benchmarks
>>     ==========
>>
>>>
>>> Tests were performed in an ARM physical server with MariaDB 10.5 and 
>>> sysbench. Read only benchmark was perform with uniform row hitting,
>>> which means that all rows will be access with equal probability.
>>>
>>> T n, D h: THP set to never, DAMON action set to hugepage
>>> T m, D h: THP set to madvise, DAMON action set to hugepage
>>> T n, D c: THP set to never, DAMON action set to collapse
>>>
>>> Memory consumption. Lower is better.
>>>
>>> +------------------+----------+----------+----------+
>>> |                  | T n, D h | T m, D h | T n, D c |
>>> +------------------+----------+----------+----------+
>>> | Total memory use | 2.07     | 2.09     | 2.07     |
>>> | Huge pages       | 0        | 1.3      | 1.25     |
>>> +------------------+----------+----------+----------+
>>>
>>> Performance in TPS (Transactions Per Second). Higher is better.
>>>
>>> T n, D h: 18324.57
>>> T n, D h 18452.69
>>
>> "T m, D h" ?
>>
>>> T n, D c: 18432.17
>>>
>>> Performance counter
>>>
>>> I got the number of L1 D/I TLB accesses and the number a D/I TLB
>>> accesses that triggered a page walk. I divided the second by the
>>> first to get the percentage of page walkes per TLB access. The
>>> lower the better.
>>>
>>> +---------------+--------------+--------------+--------------+
>>> |               | T n, D h     | T m, D h     | T n, D c     |
>>> +---------------+--------------+--------------+--------------+
>>> | L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
>>> | L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
>>> | DTLB walk     | 75011087     | 52800418     | 55895794     |
>>> | ITLB walk     | 71577076     | 71505137     | 67262140     |
>>> | DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
>>> | ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
>>> +---------------+--------------+--------------+--------------+
>>>
>>> - We can see that DAMOS "hugepage" action works only when THP is set
>>>   to madvise. "collapse" action works even when THP is set to never.
>>
>> Make sense.
>>
>>> - Performance for "collapse" action is slightly lower than "hugepage"
>>>   action and THP madvise.
>>
>> It would be good to add your theory about from where the difference comes.  I
>> suspect that's mainly because "hugepage" setup was allocating more THP?
>>
>>> - Memory consumption is slighly lower for "collapse" than "hugepage"
>>>   with THP madvise. This is due to the khugepage collapses all VMAs,
>>>   while "collapse" action only collapses the VMAs in the hot region.
>>
>> But you use thp=madvise, not thp=always?  So only hot regions, which
>> DAMOS_HUGEPAGE applied, could use THP.  It is same to DAMOS_COLLAPSE use case,
>> isn't it?
>>
>> I'd rather suspect the natural-aligned region huge page allocation of
>> DAMOS_HUGEPAGE as a reason of this difference.  That is, DAMOS_HUGEPAGE applied
>> regions can allocate hugepages in the fault time, on multiple user threads.
>> Meanwhile, DAMOS_COLLAPSE should be executed by the single kdamond (if you
>> utilize only single kdamond).  This might resulted in DAMOS_HUGEPAGE allocating
>> more huge pages faster than DAMOS_COLLAPSE?
>>
>>> - There is an improvement in THP utilization when collapse through
>>>   "hugepage" or "collapse" actions are triggered.
>>
>> Could you clarify which data point is showing this?  Maybe "Huge pages" /
>> "Total memory use" ?  And why?  I again suspect the fault time huge pages
>> allocation.
>>
>>> - "collapse" action is performance synchronously, which means that
>>>   page collapses happen earlier and more rapidly.
>>
>> But these test results are not showing it clearly.  Rather, the results is
>> saying "hugepage" was able to make more huge pages than "collapse".  Still the
>> above sentence makes sense when we say about "collapsing" operations.  But,
>> this test is not showing it clearly.  I think we should make it clear the
>> limitation of this test.
>>
>>>   This can be
>>>   useful or not, depending on the scenario.
>>>
>>> Collapse action just adds a new option to chose the correct system
>>> balance.
>>
>> That's a fair point.  I believe we also discussed pros and cons of
>> MADV_COLLAPSE, and concluded MADV_COLLAPSE is worthy to be added.  For
>> DAMOS_COLLAPSE, I don't think we have to do that again.
>>
>>>
>>> Changes
>>> ---------
>>> RFC v2 -> v1:
>>> Fixed a missing comma in the selftest python stript
>>> Added performance benchmarks
>>>
>>> RFC v1 -> RFC v2:
>>> Added benchmarks
>>> Added damos_filter_type documentation for new action to fix kernel-doc
>>
>> Please put changelog in the commentary area, and consider adding links to the
>> previous revisions [1].
>>
>>>
>>> Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
>>> ---
>>
>> Code looks good to me.  Nonetheless I'd hope above commit message and benchmark
>> results analysis be more polished and/or clarified.
>>
>> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
>>
>>
>> Thanks,
>> SJ
> 
> 


-- 
Anatoly Stepanov, Huawei


  reply	other threads:[~2026-03-31 10:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30 14:57 gutierrez.asier
2026-03-30 23:43 ` (sashiko review) " SeongJae Park
2026-03-31  0:01   ` SeongJae Park
2026-03-31  1:31 ` SeongJae Park
2026-03-31 10:46   ` Stepanov Anatoly
2026-03-31 10:50     ` Stepanov Anatoly [this message]
2026-04-01  0:54       ` SeongJae Park
2026-03-31 15:15   ` Gutierrez Asier
2026-04-01  0:59     ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=716989f5-78d3-4e78-98ae-2bf3caa447a9@huawei.com \
    --to=stepanov.anatoly@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=artem.kuzin@huawei.com \
    --cc=damon@lists.linux.dev \
    --cc=gutierrez.asier@huawei-partners.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    --cc=wangkefeng.wang@huawei.com \
    --cc=yanquanmin1@huawei.com \
    --cc=zuoze1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox