* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
@ 2025-04-24 18:10 Mitchell Augustin
2025-04-24 18:56 ` Nico Pache
0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2025-04-24 18:10 UTC (permalink / raw)
To: akpm, 20250211152341.3431089327c5e0ec6ba6064d
Cc: 21cnbao, aneesh.kumar, anshuman.khandual, apopple, baohua,
catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
linux-kernel, linux-mm, mhocko, npache, Peter Xu, ryan.roberts,
srivatsa, surenb, vbabka, vishal.moola, wangkefeng.wang, will,
willy, yang, zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
Vanda Hendrychová
Hello,
I realize this is an older version of the series, but @Vanda
Hendrychová and I began benchmarking this version before the most
recent revision was posted, so we wanted to provide our results as
feedback for this discussion.
For context, my team and I previously identified that some of the
benchmarks in this Phoronix benchmark suite [0] perform more poorly
with thp=madvise than with thp=always, so I suspected that the
THP=defer and khugepaged collapse functionality outlined in this
article [6] might yield performance in between madvise and always for
the following benchmarks from that suite:
- GraphicsMagick (all tests), which were substantially improved when
switching from thp=madvise to thp=always
- 7-Zip Compression rating, which was substantially improved when
switching from thp=madvise to thp=always
- Compilation time tests, which were slightly improved when switching
from thp=madvise to thp=always
There were more benchmarks in this suite, but these three were the
ones we had previously identified as being significantly impacted by
the thp setting, and thus are the primary focus of our results.
To analyze this, we ran the benchmarks outlined in this article on the
upstream 6.14 kernel with the following configurations:
- linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
- linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
- linux v6.14 thp=always: Transparent Huge Pages: always
- linux v6.14 thp=never: Transparent Huge Pages: never
- linux v6.14 thp=madvise: Transparent Huge Pages: madvise
"defer-v1" refers to the thp collapse implementation by Nico Pache
[3], and "defer-v2" refers to the implementation in this thread [4].
Both use defer as implemented by series [5].
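For reference, the THP mode for each configuration is selected via the
transparent_hugepage sysfs knob; a minimal sketch of switching it
between runs is below (purely illustrative; the "defer" value only
exists with the defer series [5] applied, while mainline accepts only
always/madvise/never):

#include <stdio.h>

/* Usage: ./set-thp-mode always|madvise|never|defer  (defer needs series [5]) */
int main(int argc, char **argv)
{
        FILE *f;

        if (argc != 2)
                return 1;
        f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "w");
        if (!f) {
                perror("transparent_hugepage/enabled");
                return 1;
        }
        fprintf(f, "%s\n", argv[1]);    /* e.g. "defer" for the defer-v1/v2 runs */
        return fclose(f) ? 1 : 0;
}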
Ultimately, we did observe that some of the GraphicsMagick tests
performed marginally better with Nico Pache's khugepaged collapse
implementation and thp=defer than with thp=madvise alone, which
loosely supports my theory. However, these improvements did not appear
to be statistically significant, and they recovered only a small part
of the performance gap between thp=madvise and thp=always in our
workloads of interest.
Results for other benchmarks in this set also did not show any
conclusive performance gains from mTHP=defer (though I was not
expecting those to change significantly with this series, since they
were not heavily impacted by THP settings in my prior tests).
I can't speak for the impact of this series on other workloads - I
just wanted to share results for the ones we were aware of and
interested in.
Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
are linked below.
[0]: https://www.phoronix.com/review/linux-os-ampereone/5
[1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
[2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
[3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
[4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
[5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
[6]: https://lwn.net/Articles/1009039/
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-04-24 18:10 [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse Mitchell Augustin
@ 2025-04-24 18:56 ` Nico Pache
2025-04-24 19:45 ` Mitchell Augustin
0 siblings, 1 reply; 10+ messages in thread
From: Nico Pache @ 2025-04-24 18:56 UTC (permalink / raw)
To: Mitchell Augustin
Cc: akpm, 20250211152341.3431089327c5e0ec6ba6064d, 21cnbao,
aneesh.kumar, anshuman.khandual, apopple, baohua,
catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
linux-kernel, linux-mm, mhocko, Peter Xu, ryan.roberts, srivatsa,
surenb, vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
Vanda Hendrychová
On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
<mitchell.augustin@canonical.com> wrote:
>
> Hello,
>
> I realize this is an older version of the series, but @Vanda
> Hendrychová and I started on a benchmark effort of this version prior
> to the most recent revision's introduction and wanted to provide our
> results as feedback for this discussion.
>
> For context, my team and I previously identified that some of the
> benchmarks outlined in this phoronix benchmark suite [0] perform more
> poorly with thp=madvise than thp=always - so I suspected that the
> THP=defer and khugepaged collapse functionality outlined in this
> article [6] might yield performance in between madvise and always for
> the following benchmarks from that suite:
> - GraphicsMagick (all tests), which were substantially improved when
> switching from thp=madvise to thp=always
> - 7-Zip Compression rating, which was substantially improved when
> switching from thp=madvise to thp=always
> - Compilation time tests, which were slightly improved when switching
> from thp=madvise to thp=always
>
> There were more benchmarks in this suite, but these three were the
> ones we had previously identified as being significantly impacted by
> the thp setting, and thus are the primary focus of our results.
>
> To analyze this, we ran the benchmarks outlined in this article on the
> upstream 6.14 kernel with the following configurations:
> - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> - linux v6.14 thp=always: Transparent Huge Pages: always
> - linux v6.14 thp=never: Transparent Huge Pages: never
> - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
>
> "defer-v1" refers to the thp collapse implementation by Nico Pache
> [3], and "defer-v2" refers to the implementation in this thread [4].
> Both use defer as implemented by series [5].
>
>
> Ultimately, we did observe that some of the GraphicsMagick tests
> performed marginally better with Nico Pache's khugepaged collapse
> implementation and thp=defer than with just thp=madvise, which aligns
> a bit with my theory - however, these improvements unfortunately did
> not appear to be statistically significant and gained only marginal
> ground in the performance gap between thp=madvise and thp=always in
> our workloads of interest.
>
> Results for other benchmarks in this set also did not show any
> conclusive performance gains from mTHP=defer (however I was not
> expecting those to change significantly with this series, since they
> weren’t heavily impacted by thp settings in my prior tests).
>
> I can't speak for the impact of this series on other workloads - I
> just wanted to share results for the ones we were aware of and
> interested in.
Hi Mitchell,
Thank you very much for both testing and sharing the results! I'm glad
no major regressions were noted, and in some cases performance was
marginally better. Another good set of workloads to test for defer
would be latency tests: THP=always can increase page-fault latencies,
while "defer" should eliminate that penalty, with the hope of regaining
some of the THP benefits after the khugepaged collapse.
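For instance, a minimal first-touch fault-latency microbenchmark along
those lines could look like the sketch below (purely illustrative, with
made-up sizes and an assumed 4K base page; run it under each THP mode
and compare):

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

/* Time the first write to every page of a fresh anonymous mapping. */
int main(void)
{
        const size_t page = 4096;               /* assumed base page size */
        const size_t len = 1UL << 30;           /* 1 GiB anonymous region */
        struct timespec t0, t1;

        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
                return 1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t off = 0; off < len; off += page)
                buf[off] = 1;                   /* first touch -> page fault */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        uint64_t ns = (t1.tv_sec - t0.tv_sec) * 1000000000ULL +
                      (t1.tv_nsec - t0.tv_nsec);
        printf("avg first-touch cost: %.0f ns/page\n",
               (double)ns / (len / page));
        return 0;
}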
One thing to note: with the default of max_ptes_none=511 and no mTHP
sizes configured, the khugepaged series (both mine and Dev's) should
have very little impact. This is a good test of the defer feature, and
it confirms that neither Dev nor I regressed the legacy PMD khugepaged
case; however, it is not a good test of the actual mTHP collapsing.
If you plan on testing the mTHP changes for performance, I would
suggest enabling all the mTHP orders and setting max_ptes_none=0
(Dev's series requires 0 or 511 for mTHP collapse to work). Given this
is a new feature, it may be hard to find something to compare it to
other than each other's series. Enabling defer during these tests has
the added benefit of pushing everything to khugepaged and really
stressing its mTHP collapse performance.
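Concretely, a sketch of that setup could look like the following
(illustrative only; it assumes the standard transparent_hugepage sysfs
layout on a 4K-page kernel, and the "defer" value needs the defer
series applied):

#include <stdio.h>

static int write_knob(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        /*
         * Enable every anon mTHP size: repeat for each hugepages-<size>kB
         * directory (16kB ... 2048kB with 4K base pages). "inherit" makes
         * each size follow the global mode set below; the exact value to
         * use depends on the series under test.
         */
        write_knob("/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled",
                   "inherit");

        /* Only collapse fully populated ranges, as suggested above. */
        write_knob("/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none",
                   "0");

        /* Push the work to khugepaged; "defer" needs the defer series. */
        write_knob("/sys/kernel/mm/transparent_hugepage/enabled", "defer");
        return 0;
}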
Once again thank you for taking the time to test these features :)
-- Nico
>
> Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> are linked below.
>
> [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> [6]: https://lwn.net/Articles/1009039/
> --
> Mitchell Augustin
> Software Engineer - Ubuntu Partner Engineering
>
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-04-24 18:56 ` Nico Pache
@ 2025-04-24 19:45 ` Mitchell Augustin
2025-05-02 20:32 ` Mitchell Augustin
2025-05-02 20:34 ` Mitchell Augustin
0 siblings, 2 replies; 10+ messages in thread
From: Mitchell Augustin @ 2025-04-24 19:45 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, 20250211152341.3431089327c5e0ec6ba6064d, 21cnbao,
aneesh.kumar, anshuman.khandual, apopple, baohua,
catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
linux-kernel, linux-mm, mhocko, Peter Xu, ryan.roberts, srivatsa,
surenb, vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
Vanda Hendrychová
Hi Nico,
Thank you for the quick response and suggestions! I'll see if we can
find some time to test our workload out with your suggested settings
and will let you know what we find (although it may be a few weeks).
-Mitchell Augustin
On Thu, Apr 24, 2025 at 1:57 PM Nico Pache <npache@redhat.com> wrote:
>
> On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
> <mitchell.augustin@canonical.com> wrote:
> >
> > Hello,
> >
> > I realize this is an older version of the series, but @Vanda
> > Hendrychová and I started on a benchmark effort of this version prior
> > to the most recent revision's introduction and wanted to provide our
> > results as feedback for this discussion.
> >
> > For context, my team and I previously identified that some of the
> > benchmarks outlined in this phoronix benchmark suite [0] perform more
> > poorly with thp=madvise than thp=always - so I suspected that the
> > THP=defer and khugepaged collapse functionality outlined in this
> > article [6] might yield performance in between madvise and always for
> > the following benchmarks from that suite:
> > - GraphicsMagick (all tests), which were substantially improved when
> > switching from thp=madvise to thp=always
> > - 7-Zip Compression rating, which was substantially improved when
> > switching from thp=madvise to thp=always
> > - Compilation time tests, which were slightly improved when switching
> > from thp=madvise to thp=always
> >
> > There were more benchmarks in this suite, but these three were the
> > ones we had previously identified as being significantly impacted by
> > the thp setting, and thus are the primary focus of our results.
> >
> > To analyze this, we ran the benchmarks outlined in this article on the
> > upstream 6.14 kernel with the following configurations:
> > - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> > - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> > - linux v6.14 thp=always: Transparent Huge Pages: always
> > - linux v6.14 thp=never: Transparent Huge Pages: never
> > - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
> >
> > "defer-v1" refers to the thp collapse implementation by Nico Pache
> > [3], and "defer-v2" refers to the implementation in this thread [4].
> > Both use defer as implemented by series [5].
> >
> >
> > Ultimately, we did observe that some of the GraphicsMagick tests
> > performed marginally better with Nico Pache's khugepaged collapse
> > implementation and thp=defer than with just thp=madvise, which aligns
> > a bit with my theory - however, these improvements unfortunately did
> > not appear to be statistically significant and gained only marginal
> > ground in the performance gap between thp=madvise and thp=always in
> > our workloads of interest.
> >
> > Results for other benchmarks in this set also did not show any
> > conclusive performance gains from mTHP=defer (however I was not
> > expecting those to change significantly with this series, since they
> > weren’t heavily impacted by thp settings in my prior tests).
> >
> > I can't speak for the impact of this series on other workloads - I
> > just wanted to share results for the ones we were aware of and
> > interested in.
> Hi Mitchell,
>
> Thank you very much for both testing and sharing the results! I'm glad
> no major regressions were noted, and in some cases performance was
> marginally better. Another good set of workloads to test for defer
> would be latency tests... THP=always can increase PF latencies, while
> "defer" should eliminate that penalty, with the hopes of regaining
> some of the THP benefits after the khugepaged collapse.
>
> I wanted to note one thing, with the default of max_ptes_none=511 and
> no mTHP sizes configured, the khugepaged series' (both mine and Devs)
> should have very little impact. This is a good test of the defer
> feature, while confirming that neither me nor Dev regressed the legacy
> PMD khugepaged case; however, this is not a good test of the actual
> mTHP collapsing.
>
> If you plan on testing the mTHP changes for performance changes, I
> would suggest enabling all the mTHP orders and setting max_ptes_none=0
> (Devs series requires 0 or 511 for mTHP collapse to work). Given this
> is a new feature, it may be hard to find something to compare it to,
> other than each other's series'. enabling defer during these tests has
> the added benefit of pushing everything to khugepaged and really
> stressing its mTHP collapse performance.
>
> Once again thank you for taking the time to test these features :)
> -- Nico
>
>
> >
> > Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> > are linked below.
> >
> > [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> > [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> > [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> > [6]: https://lwn.net/Articles/1009039/
> > --
> > Mitchell Augustin
> > Software Engineer - Ubuntu Partner Engineering
> >
>
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-04-24 19:45 ` Mitchell Augustin
@ 2025-05-02 20:32 ` Mitchell Augustin
2025-05-02 20:34 ` Mitchell Augustin
1 sibling, 0 replies; 10+ messages in thread
From: Mitchell Augustin @ 2025-05-02 20:32 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, 20250211152341.3431089327c5e0ec6ba6064d, 21cnbao,
aneesh.kumar, anshuman.khandual, apopple, baohua,
catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
linux-kernel, linux-mm, mhocko, Peter Xu, ryan.roberts, srivatsa,
surenb, vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
Vanda Hendrychová
Hi Nico,
As suggested, I did some new runs of my workloads with your recommended
configurations (on akpm/mm-new this time). The results for the subset that
my team is most interested in still do not show significant improvements
(in the context of the delta between the control test and the thp=always
case).
On the bright side, I did observe that the Rodinia OpenMP tests show
slightly more noticeable performance improvements when defer+collapse are
in use than without, and I also did not observe any concerning regression
indicators in any of these results.
My report for these tests is attached if you'd like to take a look. [0]
Thanks!
[0] https://pastebin.ubuntu.com/p/432KtgnXH3/
On Thu, Apr 24, 2025 at 2:45 PM Mitchell Augustin <
mitchell.augustin@canonical.com> wrote:
> Hi Nico,
>
> Thank you for the quick response and suggestions! I'll see if we can
> find some time to test our workload out with your suggested settings
> and will let you know what we find (although it may be a few weeks).
>
> -Mitchell Augustin
>
> On Thu, Apr 24, 2025 at 1:57 PM Nico Pache <npache@redhat.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
> > <mitchell.augustin@canonical.com> wrote:
> > >
> > > Hello,
> > >
> > > I realize this is an older version of the series, but @Vanda
> > > Hendrychová and I started on a benchmark effort of this version prior
> > > to the most recent revision's introduction and wanted to provide our
> > > results as feedback for this discussion.
> > >
> > > For context, my team and I previously identified that some of the
> > > benchmarks outlined in this phoronix benchmark suite [0] perform more
> > > poorly with thp=madvise than thp=always - so I suspected that the
> > > THP=defer and khugepaged collapse functionality outlined in this
> > > article [6] might yield performance in between madvise and always for
> > > the following benchmarks from that suite:
> > > - GraphicsMagick (all tests), which were substantially improved when
> > > switching from thp=madvise to thp=always
> > > - 7-Zip Compression rating, which was substantially improved when
> > > switching from thp=madvise to thp=always
> > > - Compilation time tests, which were slightly improved when switching
> > > from thp=madvise to thp=always
> > >
> > > There were more benchmarks in this suite, but these three were the
> > > ones we had previously identified as being significantly impacted by
> > > the thp setting, and thus are the primary focus of our results.
> > >
> > > To analyze this, we ran the benchmarks outlined in this article on the
> > > upstream 6.14 kernel with the following configurations:
> > > - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> > > - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> > > - linux v6.14 thp=always: Transparent Huge Pages: always
> > > - linux v6.14 thp=never: Transparent Huge Pages: never
> > > - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
> > >
> > > "defer-v1" refers to the thp collapse implementation by Nico Pache
> > > [3], and "defer-v2" refers to the implementation in this thread [4].
> > > Both use defer as implemented by series [5].
> > >
> > >
> > > Ultimately, we did observe that some of the GraphicsMagick tests
> > > performed marginally better with Nico Pache's khugepaged collapse
> > > implementation and thp=defer than with just thp=madvise, which aligns
> > > a bit with my theory - however, these improvements unfortunately did
> > > not appear to be statistically significant and gained only marginal
> > > ground in the performance gap between thp=madvise and thp=always in
> > > our workloads of interest.
> > >
> > > Results for other benchmarks in this set also did not show any
> > > conclusive performance gains from mTHP=defer (however I was not
> > > expecting those to change significantly with this series, since they
> > > weren’t heavily impacted by thp settings in my prior tests).
> > >
> > > I can't speak for the impact of this series on other workloads - I
> > > just wanted to share results for the ones we were aware of and
> > > interested in.
> > Hi Mitchell,
> >
> > Thank you very much for both testing and sharing the results! I'm glad
> > no major regressions were noted, and in some cases performance was
> > marginally better. Another good set of workloads to test for defer
> > would be latency tests... THP=always can increase PF latencies, while
> > "defer" should eliminate that penalty, with the hopes of regaining
> > some of the THP benefits after the khugepaged collapse.
> >
> > I wanted to note one thing, with the default of max_ptes_none=511 and
> > no mTHP sizes configured, the khugepaged series' (both mine and Devs)
> > should have very little impact. This is a good test of the defer
> > feature, while confirming that neither me nor Dev regressed the legacy
> > PMD khugepaged case; however, this is not a good test of the actual
> > mTHP collapsing.
> >
> > If you plan on testing the mTHP changes for performance changes, I
> > would suggest enabling all the mTHP orders and setting max_ptes_none=0
> > (Devs series requires 0 or 511 for mTHP collapse to work). Given this
> > is a new feature, it may be hard to find something to compare it to,
> > other than each other's series'. enabling defer during these tests has
> > the added benefit of pushing everything to khugepaged and really
> > stressing its mTHP collapse performance.
> >
> > Once again thank you for taking the time to test these features :)
> > -- Nico
> >
> >
> > >
> > > Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> > > are linked below.
> > >
> > > [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> > > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > > [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > > [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> > > [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> > > [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> > > [6]: https://lwn.net/Articles/1009039/
> > > --
> > > Mitchell Augustin
> > > Software Engineer - Ubuntu Partner Engineering
> > >
> >
>
>
> --
> Mitchell Augustin
> Software Engineer - Ubuntu Partner Engineering
>
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-04-24 19:45 ` Mitchell Augustin
2025-05-02 20:32 ` Mitchell Augustin
@ 2025-05-02 20:34 ` Mitchell Augustin
1 sibling, 0 replies; 10+ messages in thread
From: Mitchell Augustin @ 2025-05-02 20:34 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, 20250211152341.3431089327c5e0ec6ba6064d, 21cnbao,
aneesh.kumar, anshuman.khandual, apopple, baohua,
catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
linux-kernel, linux-mm, mhocko, Peter Xu, ryan.roberts, srivatsa,
surenb, vbabka, vishal.moola, wangkefeng.wang, will, willy, yang,
zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
Vanda Hendrychová
Hi Nico,
As suggested, I did some new runs of my workloads with your
recommended configurations (on akpm/mm-new this time). The results for
the subset that my team is most interested in still do not show
significant improvements (in the context of the delta between the
control test and the thp=always case).
On the bright side, I did observe that the Rodinia OpenMP tests show
slightly more noticeable performance improvements when defer+collapse
are in use than without, and I also did not observe any concerning
regression indicators in any of these results.
My report for these tests is attached if you'd like to take a look. [0] Thanks!
[0] https://pastebin.ubuntu.com/p/432KtgnXH3/
On Thu, Apr 24, 2025 at 2:45 PM Mitchell Augustin
<mitchell.augustin@canonical.com> wrote:
>
> Hi Nico,
>
> Thank you for the quick response and suggestions! I'll see if we can
> find some time to test our workload out with your suggested settings
> and will let you know what we find (although it may be a few weeks).
>
> -Mitchell Augustin
>
> On Thu, Apr 24, 2025 at 1:57 PM Nico Pache <npache@redhat.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 12:18 PM Mitchell Augustin
> > <mitchell.augustin@canonical.com> wrote:
> > >
> > > Hello,
> > >
> > > I realize this is an older version of the series, but @Vanda
> > > Hendrychová and I started on a benchmark effort of this version prior
> > > to the most recent revision's introduction and wanted to provide our
> > > results as feedback for this discussion.
> > >
> > > For context, my team and I previously identified that some of the
> > > benchmarks outlined in this phoronix benchmark suite [0] perform more
> > > poorly with thp=madvise than thp=always - so I suspected that the
> > > THP=defer and khugepaged collapse functionality outlined in this
> > > article [6] might yield performance in between madvise and always for
> > > the following benchmarks from that suite:
> > > - GraphicsMagick (all tests), which were substantially improved when
> > > switching from thp=madvise to thp=always
> > > - 7-Zip Compression rating, which was substantially improved when
> > > switching from thp=madvise to thp=always
> > > - Compilation time tests, which were slightly improved when switching
> > > from thp=madvise to thp=always
> > >
> > > There were more benchmarks in this suite, but these three were the
> > > ones we had previously identified as being significantly impacted by
> > > the thp setting, and thus are the primary focus of our results.
> > >
> > > To analyze this, we ran the benchmarks outlined in this article on the
> > > upstream 6.14 kernel with the following configurations:
> > > - linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
> > > - linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
> > > - linux v6.14 thp=always: Transparent Huge Pages: always
> > > - linux v6.14 thp=never: Transparent Huge Pages: never
> > > - linux v6.14 thp=madvise: Transparent Huge Pages: madvise
> > >
> > > "defer-v1" refers to the thp collapse implementation by Nico Pache
> > > [3], and "defer-v2" refers to the implementation in this thread [4].
> > > Both use defer as implemented by series [5].
> > >
> > >
> > > Ultimately, we did observe that some of the GraphicsMagick tests
> > > performed marginally better with Nico Pache's khugepaged collapse
> > > implementation and thp=defer than with just thp=madvise, which aligns
> > > a bit with my theory - however, these improvements unfortunately did
> > > not appear to be statistically significant and gained only marginal
> > > ground in the performance gap between thp=madvise and thp=always in
> > > our workloads of interest.
> > >
> > > Results for other benchmarks in this set also did not show any
> > > conclusive performance gains from mTHP=defer (however I was not
> > > expecting those to change significantly with this series, since they
> > > weren’t heavily impacted by thp settings in my prior tests).
> > >
> > > I can't speak for the impact of this series on other workloads - I
> > > just wanted to share results for the ones we were aware of and
> > > interested in.
> > Hi Mitchell,
> >
> > Thank you very much for both testing and sharing the results! I'm glad
> > no major regressions were noted, and in some cases performance was
> > marginally better. Another good set of workloads to test for defer
> > would be latency tests... THP=always can increase PF latencies, while
> > "defer" should eliminate that penalty, with the hopes of regaining
> > some of the THP benefits after the khugepaged collapse.
> >
> > I wanted to note one thing, with the default of max_ptes_none=511 and
> > no mTHP sizes configured, the khugepaged series' (both mine and Devs)
> > should have very little impact. This is a good test of the defer
> > feature, while confirming that neither me nor Dev regressed the legacy
> > PMD khugepaged case; however, this is not a good test of the actual
> > mTHP collapsing.
> >
> > If you plan on testing the mTHP changes for performance changes, I
> > would suggest enabling all the mTHP orders and setting max_ptes_none=0
> > (Devs series requires 0 or 511 for mTHP collapse to work). Given this
> > is a new feature, it may be hard to find something to compare it to,
> > other than each other's series'. enabling defer during these tests has
> > the added benefit of pushing everything to khugepaged and really
> > stressing its mTHP collapse performance.
> >
> > Once again thank you for taking the time to test these features :)
> > -- Nico
> >
> >
> > >
> > > Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
> > > are linked below.
> > >
> > > [0]: https://www.phoronix.com/review/linux-os-ampereone/5
> > > [1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
> > > [2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
> > > [3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
> > > [4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
> > > [5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
> > > [6]: https://lwn.net/Articles/1009039/
> > > --
> > > Mitchell Augustin
> > > Software Engineer - Ubuntu Partner Engineering
> > >
> >
>
>
> --
> Mitchell Augustin
> Software Engineer - Ubuntu Partner Engineering
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
@ 2025-02-11 11:13 Dev Jain
2025-02-11 23:23 ` Andrew Morton
2025-02-15 1:47 ` Nico Pache
0 siblings, 2 replies; 10+ messages in thread
From: Dev Jain @ 2025-02-11 11:13 UTC (permalink / raw)
To: akpm, david, willy, kirill.shutemov
Cc: npache, ryan.roberts, anshuman.khandual, catalin.marinas, cl,
vbabka, mhocko, apopple, dave.hansen, will, baohua, jack,
srivatsa, haowenchao22, hughd, aneesh.kumar, yang, peterx,
ioworker0, wangkefeng.wang, ziy, jglisse, surenb, vishal.moola,
zokeefe, zhengqi.arch, jhubbard, 21cnbao, linux-mm, linux-kernel,
Dev Jain
This patchset extends khugepaged from collapsing only PMD-sized THPs to
collapsing anonymous mTHPs.
mTHPs were introduced in the kernel to improve memory management by allocating
memory in larger chunks, so as to reduce the number of page faults and TLB
misses (via TLB coalescing), shorten LRU lists, etc. However, the mTHP property
is often lost due to CoW, swap-in/out, or simply because the kernel cannot find
enough physically contiguous memory at fault time. Hence, there is a need to
regain mTHPs in the system asynchronously. This work is an attempt in that
direction, starting with anonymous folios.
In the fault handler, we select the THP order in a greedy manner; the same
approach is used here, along with the same sysfs interface to control the
orders eligible for collapse. In contrast to PMD collapse, we (hopefully) get
rid of the mmap_write_lock().
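To illustrate what "greedy" means here, a simplified, standalone sketch of the
order selection (not the kernel code itself; HPAGE_PMD_ORDER=9 and the
eligibility callback are stand-ins) is:

#include <stdbool.h>

#define HPAGE_PMD_ORDER 9       /* 2 MiB PMD with 4K base pages (assumption) */

/*
 * Greedy selection: try the largest enabled order first and fall back
 * until one passes the eligibility check (VMA bounds, alignment, ...).
 * Purely illustrative; the kernel code is structured differently.
 */
int pick_collapse_order(unsigned long enabled_orders,
                        bool (*eligible)(int order, void *ctx), void *ctx)
{
        for (int order = HPAGE_PMD_ORDER; order >= 2; order--) {
                if ((enabled_orders & (1UL << order)) && eligible(order, ctx))
                        return order;   /* largest order that is enabled and fits */
        }
        return -1;                      /* nothing eligible; keep base pages */
}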
---------------------------------------------------------
Testing
---------------------------------------------------------
The set has been build-tested on x86_64.
For Aarch64:
1. mm-selftests: no regressions.
2. Using tools/mm/thpmaps to analyze userspace programs that map large,
aligned VMAs, fault them in with base pages/mTHPs (according to sysfs), and
then madvise() the VMA: khugepaged is able to collapse 100% of the VMAs.
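For reference, a minimal userspace driver for test 2 could look like the
following sketch (illustrative only; the 2 MiB PMD size and the region size
are assumptions for a 4K-page configuration):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
        const size_t pmd_size = 2UL << 20;      /* assumed PMD-sized THP */
        const size_t len = 64 * pmd_size;       /* 128 MiB test region */

        /* Over-allocate so a PMD-aligned start address can be chosen. */
        char *raw = mmap(NULL, len + pmd_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (raw == MAP_FAILED)
                return 1;
        char *buf = (char *)(((uintptr_t)raw + pmd_size - 1) & ~(pmd_size - 1));

        memset(buf, 0x5a, len);                 /* fault in with base pages */
        madvise(buf, len, MADV_HUGEPAGE);       /* make the range collapse-eligible */

        /* Park here and watch /proc/<pid>/smaps or tools/mm/thpmaps while
         * khugepaged scans the mm. */
        pause();
        return 0;
}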
This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).
v1->v2:
- Handle VMAs less than PMD size (patches 12-15)
- Do not add mTHP into deferred split queue
- Drop lock optimization and collapse mTHP under mmap_write_lock()
- Define policy on what to do when we encounter a folio order larger than
the order we are scanning for
- Prevent the creep problem by enforcing tunable simplification
- Update Documentation
- Drop patch 12 from v1 updating selftest w.r.t the creep problem
- Drop patch 1 from v1
v1:
https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/
Dev Jain (17):
khugepaged: Generalize alloc_charge_folio()
khugepaged: Generalize hugepage_vma_revalidate()
khugepaged: Generalize __collapse_huge_page_swapin()
khugepaged: Generalize __collapse_huge_page_isolate()
khugepaged: Generalize __collapse_huge_page_copy()
khugepaged: Abstract PMD-THP collapse
khugepaged: Scan PTEs order-wise
khugepaged: Introduce vma_collapse_anon_folio()
khugepaged: Define collapse policy if a larger folio is already mapped
khugepaged: Exit early on fully-mapped aligned mTHP
khugepaged: Enable sysfs to control order of collapse
khugepaged: Enable variable-sized VMA collapse
khugepaged: Lock all VMAs mapping the PTE table
khugepaged: Reset scan address to correct alignment
khugepaged: Delay cond_resched()
khugepaged: Implement strict policy for mTHP collapse
Documentation: transhuge: Define khugepaged mTHP collapse policy
Documentation/admin-guide/mm/transhuge.rst | 49 +-
include/linux/huge_mm.h | 2 +
mm/huge_memory.c | 4 +
mm/khugepaged.c | 603 ++++++++++++++++-----
4 files changed, 511 insertions(+), 147 deletions(-)
--
2.30.2
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-02-11 11:13 Dev Jain
@ 2025-02-11 23:23 ` Andrew Morton
2025-02-12 4:18 ` Dev Jain
2025-02-15 1:47 ` Nico Pache
1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2025-02-11 23:23 UTC (permalink / raw)
To: Dev Jain
Cc: david, willy, kirill.shutemov, npache, ryan.roberts,
anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
21cnbao, linux-mm, linux-kernel
On Tue, 11 Feb 2025 16:43:09 +0530 Dev Jain <dev.jain@arm.com> wrote:
> This patchset extends khugepaged from collapsing only PMD-sized THPs to
> collapsing anonymous mTHPs.
>
> mTHPs were introduced in the kernel to improve memory management by allocating
> chunks of larger memory, so as to reduce number of page faults, TLB misses (due
> to TLB coalescing), reduce length of LRU lists, etc. However, the mTHP property
> is often lost due to CoW, swap-in/out, and when the kernel just cannot find
> enough physically contiguous memory to allocate on fault. Henceforth, there is a
> need to regain mTHPs in the system asynchronously. This work is an attempt in
> this direction, starting with anonymous folios.
>
> In the fault handler, we select the THP order in a greedy manner; the same has
> been used here, along with the same sysfs interface to control the order of
> collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().
>
> ---------------------------------------------------------
> Testing
> ---------------------------------------------------------
>
> The set has been build tested on x86_64.
> For Aarch64,
> 1. mm-selftests: No regressions.
> 2. Analyzing with tools/mm/thpmaps on different userspace programs mapping
> aligned VMAs of a large size, faulting in basepages/mTHPs (according to sysfs),
> and then madvise()'ing the VMA, khugepaged is able to 100% collapse the VMAs.
It would be nice to provide some evidence that this patchset actually
makes Linux better for our users, and by how much.
Thanks, I think I'll skip v2 and shall await reviewer input.
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-02-11 23:23 ` Andrew Morton
@ 2025-02-12 4:18 ` Dev Jain
0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-02-12 4:18 UTC (permalink / raw)
To: Andrew Morton
Cc: david, willy, kirill.shutemov, npache, ryan.roberts,
anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
21cnbao, linux-mm, linux-kernel
On 12/02/25 4:53 am, Andrew Morton wrote:
> On Tue, 11 Feb 2025 16:43:09 +0530 Dev Jain <dev.jain@arm.com> wrote:
>
>> This patchset extends khugepaged from collapsing only PMD-sized THPs to
>> collapsing anonymous mTHPs.
>>
>> mTHPs were introduced in the kernel to improve memory management by allocating
>> chunks of larger memory, so as to reduce number of page faults, TLB misses (due
>> to TLB coalescing), reduce length of LRU lists, etc. However, the mTHP property
>> is often lost due to CoW, swap-in/out, and when the kernel just cannot find
>> enough physically contiguous memory to allocate on fault. Henceforth, there is a
>> need to regain mTHPs in the system asynchronously. This work is an attempt in
>> this direction, starting with anonymous folios.
>>
>> In the fault handler, we select the THP order in a greedy manner; the same has
>> been used here, along with the same sysfs interface to control the order of
>> collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().
>>
>> ---------------------------------------------------------
>> Testing
>> ---------------------------------------------------------
>>
>> The set has been build tested on x86_64.
>> For Aarch64,
>> 1. mm-selftests: No regressions.
>> 2. Analyzing with tools/mm/thpmaps on different userspace programs mapping
>> aligned VMAs of a large size, faulting in basepages/mTHPs (according to sysfs),
>> and then madvise()'ing the VMA, khugepaged is able to 100% collapse the VMAs.
>
> It would be nice to provide some evidence that this patchset actually
> makes Linux better for our users, and by how much.
>
> Thanks, I think I'll skip v2 and shall await reviewer input.
Hi Andrew, thanks for your reply.
Although the introduction of mTHPs leads to the natural conclusion of
extending khugepaged to support mTHP collapse, I'll try to get some
performance statistics out.
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-02-11 11:13 Dev Jain
2025-02-11 23:23 ` Andrew Morton
@ 2025-02-15 1:47 ` Nico Pache
2025-02-15 7:36 ` Dev Jain
1 sibling, 1 reply; 10+ messages in thread
From: Nico Pache @ 2025-02-15 1:47 UTC (permalink / raw)
To: Dev Jain
Cc: akpm, david, willy, kirill.shutemov, ryan.roberts,
anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
21cnbao, linux-mm, linux-kernel
Hi Dev,
I tried to run your kernel to get some performance numbers out of it,
but ran into the following issue while running my defer-mthp-test.sh
workload.
[ 297.393032] =====================================
[ 297.393618] WARNING: bad unlock balance detected!
[ 297.394201] 6.14.0-rc2mthpDEV #2 Not tainted
[ 297.394732] -------------------------------------
[ 297.395421] khugepaged/111 is trying to release lock (&mm->mmap_lock) at:
[ 297.396509] [<ffffffff947cb76a>] khugepaged+0x23a/0xb40
[ 297.397205] but there are no more locks to release!
[ 297.397865]
[ 297.397865] other info that might help us debug this:
[ 297.398684] no locks held by khugepaged/111.
[ 297.399155]
[ 297.399155] stack backtrace:
[ 297.399591] CPU: 10 UID: 0 PID: 111 Comm: khugepaged Not tainted
6.14.0-rc2mthpDEV #2
[ 297.399593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.16.3-2.fc40 04/01/2014
[ 297.399595] Call Trace:
[ 297.399599] <TASK>
[ 297.399602] dump_stack_lvl+0x6e/0xa0
[ 297.399607] ? khugepaged+0x23a/0xb40
[ 297.399610] print_unlock_imbalance_bug.part.0+0xfb/0x110
[ 297.399612] ? khugepaged+0x23a/0xb40
[ 297.399614] lock_release+0x283/0x3f0
[ 297.399620] up_read+0x1b/0x30
[ 297.399622] khugepaged+0x23a/0xb40
[ 297.399631] ? __pfx_khugepaged+0x10/0x10
[ 297.399633] kthread+0xf2/0x240
[ 297.399636] ? __pfx_kthread+0x10/0x10
[ 297.399638] ret_from_fork+0x34/0x50
[ 297.399640] ? __pfx_kthread+0x10/0x10
[ 297.399642] ret_from_fork_asm+0x1a/0x30
[ 297.399649] </TASK>
[ 297.505555] ------------[ cut here ]------------
[ 297.506044] DEBUG_RWSEMS_WARN_ON(tmp < 0): count =
0xffffffffffffff00, magic = 0xffff8c6e03bc1f88, owner = 0x1, curr
0xffff8c6e0eccb700, list empty
[ 297.507362] WARNING: CPU: 8 PID: 1946 at
kernel/locking/rwsem.c:1346 __up_read+0x1ba/0x220
[ 297.508220] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill nf_tables intel_rapl_msr intel_rapl_common
kvm_amd iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm i2c_i801
i2c_smbus lpc_ich virtio_net net_failover failover virtio_balloon
joydev fuse loop nfnetlink zram xfs polyval_clmulni polyval_generic
ghash_clmulni_intel sha512_ssse3 sha256_ssse3 virtio_console
virtio_blk sha1_ssse3 serio_raw qemu_fw_cfg
[ 297.513474] CPU: 8 UID: 0 PID: 1946 Comm: thp_test Not tainted
6.14.0-rc2mthpDEV #2
[ 297.514314] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.16.3-2.fc40 04/01/2014
[ 297.515265] RIP: 0010:__up_read+0x1ba/0x220
[ 297.515756] Code: c6 78 8b e1 95 48 c7 c7 88 0e d3 95 48 39 c2 48
c7 c2 be 39 e4 95 48 c7 c0 29 8b e1 95 48 0f 44 c2 48 8b 13 50 e8 e6
44 f5 ff <0f> 0b 58 e9 20 ff ff ff 48 8b 57 60 48 8d 47 60 4c 8b 47 08
c6 05
[ 297.517659] RSP: 0018:ffffa8a943533ac8 EFLAGS: 00010282
[ 297.518209] RAX: 0000000000000000 RBX: ffff8c6e03bc1f88 RCX: 0000000000000000
[ 297.518884] RDX: ffff8c7366ff0980 RSI: ffff8c7366fe1a80 RDI: ffff8c7366fe1a80
[ 297.519577] RBP: ffffa8a943533b58 R08: 0000000000000000 R09: 0000000000000001
[ 297.520272] R10: 0000000000000000 R11: 0770076d07650720 R12: ffffa8a943533b10
[ 297.520949] R13: ffff8c6e03bc1f88 R14: ffffa8a943533b58 R15: ffffa8a943533b10
[ 297.521651] FS: 00007f24de01b740(0000) GS:ffff8c7366e00000(0000)
knlGS:0000000000000000
[ 297.522425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 297.522990] CR2: 0000000a7ffef000 CR3: 000000010d9d6000 CR4: 0000000000750ef0
[ 297.523799] PKRU: 55555554
[ 297.524100] Call Trace:
[ 297.524367] <TASK>
[ 297.524597] ? __warn.cold+0xb7/0x151
[ 297.525072] ? __up_read+0x1ba/0x220
[ 297.525442] ? report_bug+0xff/0x140
[ 297.525804] ? console_unlock+0x9d/0x150
[ 297.526233] ? handle_bug+0x58/0x90
[ 297.526590] ? exc_invalid_op+0x17/0x70
[ 297.526993] ? asm_exc_invalid_op+0x1a/0x20
[ 297.527420] ? __up_read+0x1ba/0x220
[ 297.527783] ? __up_read+0x1ba/0x220
[ 297.528160] vms_complete_munmap_vmas+0x19c/0x1f0
[ 297.528628] do_vmi_align_munmap+0x20a/0x280
[ 297.529069] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.529552] do_vmi_munmap+0xd0/0x190
[ 297.529920] __vm_munmap+0xb1/0x1b0
[ 297.530293] __x64_sys_munmap+0x1b/0x30
[ 297.530677] do_syscall_64+0x95/0x180
[ 297.531058] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.531534] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.532167] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.532640] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.533226] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.533701] ? do_syscall_64+0xa1/0x180
[ 297.534097] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.534587] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.535129] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.535603] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.536092] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.536568] ? do_syscall_64+0xa1/0x180
[ 297.536954] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.537444] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.537936] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.538524] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.539044] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.539526] ? do_syscall_64+0xa1/0x180
[ 297.539931] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.540597] ? do_user_addr_fault+0x5a9/0x8a0
[ 297.541102] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.541580] ? trace_hardirqs_off+0x4b/0xc0
[ 297.542011] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.542488] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.542991] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.543466] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 297.543960] RIP: 0033:0x7f24de1367eb
[ 297.544344] Code: 73 01 c3 48 8b 0d 2d f6 0c 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd f5 0c 00 f7 d8 64 89
01 48
[ 297.546074] RSP: 002b:00007ffc7bb2e2b8 EFLAGS: 00000206 ORIG_RAX:
000000000000000b
[ 297.546796] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f24de1367eb
[ 297.547488] RDX: 0000000080000000 RSI: 0000000080000000 RDI: 0000000480000000
[ 297.548182] RBP: 00007ffc7bb2e390 R08: 0000000000000064 R09: 00000000fffffffe
[ 297.548884] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000006
[ 297.549594] R13: 0000000000000000 R14: 00007f24de258000 R15: 0000000000403e00
[ 297.550292] </TASK>
[ 297.550530] irq event stamp: 64417291
[ 297.550903] hardirqs last enabled at (64417291):
[<ffffffff94749232>] seqcount_lockdep_reader_access+0x82/0x90
[ 297.551859] hardirqs last disabled at (64417290):
[<ffffffff947491fe>] seqcount_lockdep_reader_access+0x4e/0x90
[ 297.552810] softirqs last enabled at (64413640):
[<ffffffff943bf3c2>] __irq_exit_rcu+0xe2/0x100
[ 297.553654] softirqs last disabled at (64413627):
[<ffffffff943bf3c2>] __irq_exit_rcu+0xe2/0x100
[ 297.554504] ---[ end trace 0000000000000000 ]---
On Tue, Feb 11, 2025 at 4:13 AM Dev Jain <dev.jain@arm.com> wrote:
>
> This patchset extends khugepaged from collapsing only PMD-sized THPs to
> collapsing anonymous mTHPs.
>
> mTHPs were introduced in the kernel to improve memory management by allocating
> chunks of larger memory, so as to reduce number of page faults, TLB misses (due
> to TLB coalescing), reduce length of LRU lists, etc. However, the mTHP property
> is often lost due to CoW, swap-in/out, and when the kernel just cannot find
> enough physically contiguous memory to allocate on fault. Henceforth, there is a
> need to regain mTHPs in the system asynchronously. This work is an attempt in
> this direction, starting with anonymous folios.
>
> In the fault handler, we select the THP order in a greedy manner; the same has
> been used here, along with the same sysfs interface to control the order of
> collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().
>
> ---------------------------------------------------------
> Testing
> ---------------------------------------------------------
>
> The set has been build tested on x86_64.
> For Aarch64,
> 1. mm-selftests: No regressions.
> 2. Analyzing with tools/mm/thpmaps on different userspace programs mapping
> aligned VMAs of a large size, faulting in basepages/mTHPs (according to sysfs),
> and then madvise()'ing the VMA, khugepaged is able to 100% collapse the VMAs.
>
> This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).
>
> v1->v2:
> - Handle VMAs less than PMD size (patches 12-15)
> - Do not add mTHP into deferred split queue
> - Drop lock optimization and collapse mTHP under mmap_write_lock()
> - Define policy on what to do when we encounter a folio order larger than
> the order we are scanning for
> - Prevent the creep problem by enforcing tunable simplification
> - Update Documentation
> - Drop patch 12 from v1 updating selftest w.r.t the creep problem
> - Drop patch 1 from v1
>
> v1:
> https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/
>
> Dev Jain (17):
> khugepaged: Generalize alloc_charge_folio()
> khugepaged: Generalize hugepage_vma_revalidate()
> khugepaged: Generalize __collapse_huge_page_swapin()
> khugepaged: Generalize __collapse_huge_page_isolate()
> khugepaged: Generalize __collapse_huge_page_copy()
> khugepaged: Abstract PMD-THP collapse
> khugepaged: Scan PTEs order-wise
> khugepaged: Introduce vma_collapse_anon_folio()
> khugepaged: Define collapse policy if a larger folio is already mapped
> khugepaged: Exit early on fully-mapped aligned mTHP
> khugepaged: Enable sysfs to control order of collapse
> khugepaged: Enable variable-sized VMA collapse
> khugepaged: Lock all VMAs mapping the PTE table
> khugepaged: Reset scan address to correct alignment
> khugepaged: Delay cond_resched()
> khugepaged: Implement strict policy for mTHP collapse
> Documentation: transhuge: Define khugepaged mTHP collapse policy
>
> Documentation/admin-guide/mm/transhuge.rst | 49 +-
> include/linux/huge_mm.h | 2 +
> mm/huge_memory.c | 4 +
> mm/khugepaged.c | 603 ++++++++++++++++-----
> 4 files changed, 511 insertions(+), 147 deletions(-)
>
> --
> 2.30.2
>
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
2025-02-15 1:47 ` Nico Pache
@ 2025-02-15 7:36 ` Dev Jain
0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-02-15 7:36 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, david, willy, kirill.shutemov, ryan.roberts,
anshuman.khandual, catalin.marinas, cl, vbabka, mhocko, apopple,
dave.hansen, will, baohua, jack, srivatsa, haowenchao22, hughd,
aneesh.kumar, yang, peterx, ioworker0, wangkefeng.wang, ziy,
jglisse, surenb, vishal.moola, zokeefe, zhengqi.arch, jhubbard,
21cnbao, linux-mm, linux-kernel
On 15/02/25 7:17 am, Nico Pache wrote:
> Hi Dev,
>
> I tried to run your kernel to get some performance numbers out of it,
> but ran into the following issue while running my defer-mthp-test.sh
> workload.
>
> [ 297.393032] =====================================
> [ 297.393618] WARNING: bad unlock balance detected!
> [ 297.394201] 6.14.0-rc2mthpDEV #2 Not tainted
> [ 297.394732] -------------------------------------
> [ 297.395421] khugepaged/111 is trying to release lock (&mm->mmap_lock) at:
> [ 297.396509] [<ffffffff947cb76a>] khugepaged+0x23a/0xb40
> [ 297.397205] but there are no more locks to release!
> [ 297.397865]
> [ 297.397865] other info that might help us debug this:
> [ 297.398684] no locks held by khugepaged/111.
> [ 297.399155]
> [ 297.399155] stack backtrace:
> [ 297.399591] CPU: 10 UID: 0 PID: 111 Comm: khugepaged Not tainted
> 6.14.0-rc2mthpDEV #2
> [ 297.399593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-2.fc40 04/01/2014
> [ 297.399595] Call Trace:
> [ 297.399599] <TASK>
> [ 297.399602] dump_stack_lvl+0x6e/0xa0
> [ 297.399607] ? khugepaged+0x23a/0xb40
> [ 297.399610] print_unlock_imbalance_bug.part.0+0xfb/0x110
> [ 297.399612] ? khugepaged+0x23a/0xb40
> [ 297.399614] lock_release+0x283/0x3f0
> [ 297.399620] up_read+0x1b/0x30
> [ 297.399622] khugepaged+0x23a/0xb40
> [ 297.399631] ? __pfx_khugepaged+0x10/0x10
> [ 297.399633] kthread+0xf2/0x240
> [ 297.399636] ? __pfx_kthread+0x10/0x10
> [ 297.399638] ret_from_fork+0x34/0x50
> [ 297.399640] ? __pfx_kthread+0x10/0x10
> [ 297.399642] ret_from_fork_asm+0x1a/0x30
> [ 297.399649] </TASK>
> [ 297.505555] ------------[ cut here ]------------
> [ 297.506044] DEBUG_RWSEMS_WARN_ON(tmp < 0): count =
> 0xffffffffffffff00, magic = 0xffff8c6e03bc1f88, owner = 0x1, curr
> 0xffff8c6e0eccb700, list empty
> [ 297.507362] WARNING: CPU: 8 PID: 1946 at
> kernel/locking/rwsem.c:1346 __up_read+0x1ba/0x220
> [ 297.508220] Modules linked in: nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 rfkill nf_tables intel_rapl_msr intel_rapl_common
> kvm_amd iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm i2c_i801
> i2c_smbus lpc_ich virtio_net net_failover failover virtio_balloon
> joydev fuse loop nfnetlink zram xfs polyval_clmulni polyval_generic
> ghash_clmulni_intel sha512_ssse3 sha256_ssse3 virtio_console
> virtio_blk sha1_ssse3 serio_raw qemu_fw_cfg
> [ 297.513474] CPU: 8 UID: 0 PID: 1946 Comm: thp_test Not tainted
> 6.14.0-rc2mthpDEV #2
> [ 297.514314] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-2.fc40 04/01/2014
> [ 297.515265] RIP: 0010:__up_read+0x1ba/0x220
> [ 297.515756] Code: c6 78 8b e1 95 48 c7 c7 88 0e d3 95 48 39 c2 48
> c7 c2 be 39 e4 95 48 c7 c0 29 8b e1 95 48 0f 44 c2 48 8b 13 50 e8 e6
> 44 f5 ff <0f> 0b 58 e9 20 ff ff ff 48 8b 57 60 48 8d 47 60 4c 8b 47 08
> c6 05
> [ 297.517659] RSP: 0018:ffffa8a943533ac8 EFLAGS: 00010282
> [ 297.518209] RAX: 0000000000000000 RBX: ffff8c6e03bc1f88 RCX: 0000000000000000
> [ 297.518884] RDX: ffff8c7366ff0980 RSI: ffff8c7366fe1a80 RDI: ffff8c7366fe1a80
> [ 297.519577] RBP: ffffa8a943533b58 R08: 0000000000000000 R09: 0000000000000001
> [ 297.520272] R10: 0000000000000000 R11: 0770076d07650720 R12: ffffa8a943533b10
> [ 297.520949] R13: ffff8c6e03bc1f88 R14: ffffa8a943533b58 R15: ffffa8a943533b10
> [ 297.521651] FS: 00007f24de01b740(0000) GS:ffff8c7366e00000(0000)
> knlGS:0000000000000000
> [ 297.522425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 297.522990] CR2: 0000000a7ffef000 CR3: 000000010d9d6000 CR4: 0000000000750ef0
> [ 297.523799] PKRU: 55555554
> [ 297.524100] Call Trace:
> [ 297.524367] <TASK>
> [ 297.524597] ? __warn.cold+0xb7/0x151
> [ 297.525072] ? __up_read+0x1ba/0x220
> [ 297.525442] ? report_bug+0xff/0x140
> [ 297.525804] ? console_unlock+0x9d/0x150
> [ 297.526233] ? handle_bug+0x58/0x90
> [ 297.526590] ? exc_invalid_op+0x17/0x70
> [ 297.526993] ? asm_exc_invalid_op+0x1a/0x20
> [ 297.527420] ? __up_read+0x1ba/0x220
> [ 297.527783] ? __up_read+0x1ba/0x220
> [ 297.528160] vms_complete_munmap_vmas+0x19c/0x1f0
> [ 297.528628] do_vmi_align_munmap+0x20a/0x280
> [ 297.529069] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.529552] do_vmi_munmap+0xd0/0x190
> [ 297.529920] __vm_munmap+0xb1/0x1b0
> [ 297.530293] __x64_sys_munmap+0x1b/0x30
> [ 297.530677] do_syscall_64+0x95/0x180
> [ 297.531058] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.531534] ? lockdep_hardirqs_on_prepare+0xdb/0x190
> [ 297.532167] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.532640] ? syscall_exit_to_user_mode+0x97/0x290
> [ 297.533226] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.533701] ? do_syscall_64+0xa1/0x180
> [ 297.534097] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.534587] ? lockdep_hardirqs_on_prepare+0xdb/0x190
> [ 297.535129] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.535603] ? syscall_exit_to_user_mode+0x97/0x290
> [ 297.536092] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.536568] ? do_syscall_64+0xa1/0x180
> [ 297.536954] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.537444] ? lockdep_hardirqs_on_prepare+0xdb/0x190
> [ 297.537936] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.538524] ? syscall_exit_to_user_mode+0x97/0x290
> [ 297.539044] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.539526] ? do_syscall_64+0xa1/0x180
> [ 297.539931] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.540597] ? do_user_addr_fault+0x5a9/0x8a0
> [ 297.541102] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.541580] ? trace_hardirqs_off+0x4b/0xc0
> [ 297.542011] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.542488] ? lockdep_hardirqs_on_prepare+0xdb/0x190
> [ 297.542991] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 297.543466] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 297.543960] RIP: 0033:0x7f24de1367eb
> [ 297.544344] Code: 73 01 c3 48 8b 0d 2d f6 0c 00 f7 d8 64 89 01 48
> 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd f5 0c 00 f7 d8 64 89
> 01 48
> [ 297.546074] RSP: 002b:00007ffc7bb2e2b8 EFLAGS: 00000206 ORIG_RAX:
> 000000000000000b
> [ 297.546796] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f24de1367eb
> [ 297.547488] RDX: 0000000080000000 RSI: 0000000080000000 RDI: 0000000480000000
> [ 297.548182] RBP: 00007ffc7bb2e390 R08: 0000000000000064 R09: 00000000fffffffe
> [ 297.548884] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000006
> [ 297.549594] R13: 0000000000000000 R14: 00007f24de258000 R15: 0000000000403e00
> [ 297.550292] </TASK>
> [ 297.550530] irq event stamp: 64417291
> [ 297.550903] hardirqs last enabled at (64417291):
> [<ffffffff94749232>] seqcount_lockdep_reader_access+0x82/0x90
> [ 297.551859] hardirqs last disabled at (64417290):
> [<ffffffff947491fe>] seqcount_lockdep_reader_access+0x4e/0x90
> [ 297.552810] softirqs last enabled at (64413640):
> [<ffffffff943bf3c2>] __irq_exit_rcu+0xe2/0x100
> [ 297.553654] softirqs last disabled at (64413627):
> [<ffffffff943bf3c2>] __irq_exit_rcu+0xe2/0x100
> [ 297.554504] ---[ end trace 0000000000000000 ]---
Thanks for testing. Hmm...can you do this: Drop patches 12-16, and
instead of 16, apply this:
commit 112f4fa8e92b2bb93051595b2a804b3546b3545a
Author: Dev Jain <dev.jain@arm.com>
Date: Fri Jan 24 10:52:15 2025 +0000
khugepaged: Implement strict policy for mTHP collapse
Signed-off-by: Dev Jain <dev.jain@arm.com>
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 37cfa7beba3d..1caf9eb3bfd9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -417,6 +417,17 @@ static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 static bool thp_enabled(void)
 {
+       bool anon_pmd_enabled = (test_bit(PMD_ORDER, &huge_anon_orders_always) ||
+                                test_bit(PMD_ORDER, &huge_anon_orders_madvise) ||
+                                (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
+                                 hugepage_global_enabled()));
+
+       /*
+        * If PMD_ORDER is ineligible for collapse, check if mTHP collapse
+        * policy is obeyed; see Documentation/admin-guide/transhuge.rst
+        */
+       bool anon_collapse_mthp = (khugepaged_max_ptes_none == 0 ||
+                                  khugepaged_max_ptes_none == HPAGE_PMD_NR - 1);
        /*
         * We cover the anon, shmem and the file-backed case here; file-backed
         * hugepages, when configured in, are determined by the global control.
@@ -427,8 +438,9 @@ static bool thp_enabled(void)
        if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
            hugepage_global_enabled())
                return true;
-       if (huge_anon_orders_always || huge_anon_orders_madvise ||
-           (huge_anon_orders_inherit && hugepage_global_enabled()))
+       if ((huge_anon_orders_always || huge_anon_orders_madvise ||
+            (huge_anon_orders_inherit && hugepage_global_enabled())) &&
+           (anon_pmd_enabled || anon_collapse_mthp))
                return true;
        if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
                return true;
@@ -578,13 +590,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
        pte_t *_pte;
        int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
        bool writable = false;
-       unsigned int max_ptes_shared = khugepaged_max_ptes_shared >> (HPAGE_PMD_ORDER - order);
+       unsigned int max_ptes_shared = khugepaged_max_ptes_shared;
        unsigned int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
        bool all_pfns_present = true;
        bool all_pfns_contig = true;
        bool first_pfn_aligned = true;
        pte_t prev_pteval;
+       if (order != HPAGE_PMD_ORDER)
+               max_ptes_shared = 0;
+
        for (_pte = pte; _pte < pte + (1UL << order);
             _pte++, address += PAGE_SIZE) {
                pte_t pteval = ptep_get(_pte);
@@ -1442,11 +1457,16 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
        if (!cc->is_khugepaged)
                order = HPAGE_PMD_ORDER;
+       max_ptes_none = khugepaged_max_ptes_none;
+       max_ptes_shared = khugepaged_max_ptes_shared;
+       max_ptes_swap = khugepaged_max_ptes_swap;
+
 scan_pte_range:
-       max_ptes_shared = khugepaged_max_ptes_shared >> (HPAGE_PMD_ORDER - order);
+       if (order != HPAGE_PMD_ORDER)
+               max_ptes_shared = max_ptes_swap = 0;
+
        max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
-       max_ptes_swap = khugepaged_max_ptes_swap >> (HPAGE_PMD_ORDER - order);
        referenced = 0, shared = 0, none_or_zero = 0, unmapped = 0;
        all_pfns_present = true, all_pfns_contig = true, first_pfn_aligned = true;
@@ -2636,6 +2656,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        int progress = 0;
+       bool collapse_mthp = true;
+
+       /* Avoid the creep problem; see Documentation/admin-guide/transhuge.rst */
+       if (khugepaged_max_ptes_none && khugepaged_max_ptes_none != HPAGE_PMD_NR - 1)
+               collapse_mthp = false;
        VM_BUG_ON(!pages);
        lockdep_assert_held(&khugepaged_mm_lock);
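
To make the "creep problem" referenced above easier to reason about, here is a
minimal userspace sketch (not part of the series; it assumes 4K base pages,
i.e. HPAGE_PMD_ORDER = 9 and HPAGE_PMD_NR = 512, and a hypothetical tunable
value of 255) that just prints the per-order threshold
max_ptes_none >> (HPAGE_PMD_ORDER - order) used by the scan code:

/*
 * Minimal userspace sketch, not kernel code: print the scaled
 * max_ptes_none threshold applied when scanning at each collapse order.
 * Assumes 4K base pages (HPAGE_PMD_ORDER == 9, HPAGE_PMD_NR == 512) and a
 * hypothetical tunable value of 255. Any value other than 0 or
 * HPAGE_PMD_NR - 1 leaves headroom of "none" PTEs at every order, which is
 * what lets repeated collapses creep upwards towards PMD order.
 */
#include <stdio.h>

#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER)

int main(void)
{
        unsigned int max_ptes_none = 255;       /* hypothetical tunable value */
        int order;

        for (order = 2; order <= HPAGE_PMD_ORDER; order++) {
                unsigned int scaled = max_ptes_none >> (HPAGE_PMD_ORDER - order);

                printf("order %d: collapse allowed with up to %u of %u PTEs none\n",
                       order, scaled, 1u << order);
        }
        return 0;
}

With max_ptes_none = 0 the threshold is 0 at every order (khugepaged never
fills in new memory), and with HPAGE_PMD_NR - 1 the PMD order is already
maximally permissive; those are the only two values the hunks above treat as
obeying the mTHP collapse policy.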
The dropped patches are the variable-sized VMA extension; implementing that
was quite a task and I ran into a lot of problems. Also, David notes that we
may have to take the rmap locks in patch 13 of my v2 after all; in any case,
that implementation can be brute-forced with a function akin to
mm_take_all_locks().
Also, the policy I am implementing for skipping large folios is different
from v1: I no longer necessarily skip when I see a large folio. This may
increase the latency of my method, so it may not be an entirely fair
comparison, although I don't think it should cause a major difference.
>
> On Tue, Feb 11, 2025 at 4:13 AM Dev Jain <dev.jain@arm.com> wrote:
>>
>> This patchset extends khugepaged from collapsing only PMD-sized THPs to
>> collapsing anonymous mTHPs.
>>
>> mTHPs were introduced in the kernel to improve memory management by allocating
>> larger chunks of memory, so as to reduce the number of page faults and TLB
>> misses (via TLB coalescing), shorten LRU lists, etc. However, the mTHP property
>> is often lost due to CoW, swap-in/out, or because the kernel simply cannot find
>> enough physically contiguous memory to allocate at fault time. Hence, there is a
>> need to regain mTHPs in the system asynchronously. This work is an attempt in
>> this direction, starting with anonymous folios.
>>
>> In the fault handler, we select the THP order in a greedy manner; the same
>> approach is used here, along with the same sysfs interface to control the order
>> of collapse. In contrast to PMD collapse, we (hopefully) get rid of the mmap_write_lock().
>>
>> ---------------------------------------------------------
>> Testing
>> ---------------------------------------------------------
>>
>> The set has been build-tested on x86_64.
>> For Aarch64,
>> 1. mm-selftests: No regressions.
>> 2. Analyzed with tools/mm/thpmaps: for various userspace programs that map
>> large, aligned VMAs, fault in base pages/mTHPs (according to sysfs), and then
>> madvise() the VMA, khugepaged is able to collapse the VMAs 100%.
>>
>> This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).
>>
>> v1->v2:
>> - Handle VMAs less than PMD size (patches 12-15)
>> - Do not add mTHP into deferred split queue
>> - Drop lock optimization and collapse mTHP under mmap_write_lock()
>> - Define policy on what to do when we encounter a folio order larger than
>> the order we are scanning for
>> - Prevent the creep problem by enforcing tunable simplification
>> - Update Documentation
>> - Drop patch 12 from v1 updating selftest w.r.t the creep problem
>> - Drop patch 1 from v1
>>
>> v1:
>> https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/
>>
>> Dev Jain (17):
>> khugepaged: Generalize alloc_charge_folio()
>> khugepaged: Generalize hugepage_vma_revalidate()
>> khugepaged: Generalize __collapse_huge_page_swapin()
>> khugepaged: Generalize __collapse_huge_page_isolate()
>> khugepaged: Generalize __collapse_huge_page_copy()
>> khugepaged: Abstract PMD-THP collapse
>> khugepaged: Scan PTEs order-wise
>> khugepaged: Introduce vma_collapse_anon_folio()
>> khugepaged: Define collapse policy if a larger folio is already mapped
>> khugepaged: Exit early on fully-mapped aligned mTHP
>> khugepaged: Enable sysfs to control order of collapse
>> khugepaged: Enable variable-sized VMA collapse
>> khugepaged: Lock all VMAs mapping the PTE table
>> khugepaged: Reset scan address to correct alignment
>> khugepaged: Delay cond_resched()
>> khugepaged: Implement strict policy for mTHP collapse
>> Documentation: transhuge: Define khugepaged mTHP collapse policy
>>
>> Documentation/admin-guide/mm/transhuge.rst | 49 +-
>> include/linux/huge_mm.h | 2 +
>> mm/huge_memory.c | 4 +
>> mm/khugepaged.c | 603 ++++++++++++++++-----
>> 4 files changed, 511 insertions(+), 147 deletions(-)
>>
>> --
>> 2.30.2
>>
>
>