linux-mm.kvack.org archive mirror
* Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
@ 2025-04-24 18:10 Mitchell Augustin
  2025-04-24 18:56 ` Nico Pache
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2025-04-24 18:10 UTC (permalink / raw)
  To: akpm, 20250211152341.3431089327c5e0ec6ba6064d
  Cc: 21cnbao, aneesh.kumar, anshuman.khandual, apopple, baohua,
	catalin.marinas, cl, dave.hansen, david, dev.jain, haowenchao22,
	hughd, ioworker0, jack, jglisse, John Hubbard, kirill.shutemov,
	linux-kernel, linux-mm, mhocko, npache, Peter Xu, ryan.roberts,
	srivatsa, surenb, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, zhengqi.arch, Zi Yan, zokeefe, Jacob Martin,
	Vanda Hendrychová

Hello,

I realize this is an older version of the series, but Vanda
Hendrychová and I began benchmarking it before the most recent
revision was posted, and we wanted to share our results as feedback
for this discussion.

For context, my team and I previously found that some of the
benchmarks in this Phoronix benchmark suite [0] perform worse with
thp=madvise than with thp=always. I therefore suspected that the
thp=defer and khugepaged collapse functionality described in this
article [6] might yield performance between madvise and always for
the following benchmarks from that suite:
- GraphicsMagick (all tests), which were substantially improved when
switching from thp=madvise to thp=always
- 7-Zip Compression rating, which was substantially improved when
switching from thp=madvise to thp=always
- Compilation time tests, which were slightly improved when switching
from thp=madvise to thp=always

There were more benchmarks in this suite, but these three were the
ones we had previously identified as being significantly impacted by
the thp setting, and thus are the primary focus of our results.

To analyze this, we ran the benchmarks from suite [0] on the
upstream 6.14 kernel with the following configurations:
- linux v6.14 thp=defer-v1: Transparent Huge Pages: defer
- linux v6.14 thp=defer-v2: Transparent Huge Pages: defer
- linux v6.14 thp=always: Transparent Huge Pages: always
- linux v6.14 thp=never: Transparent Huge Pages: never
- linux v6.14 thp=madvise: Transparent Huge Pages: madvise

"defer-v1" refers to the thp collapse implementation by Nico Pache
[3], and "defer-v2" refers to the implementation in this thread [4].
Both use defer as implemented by series [5].
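
For anyone reproducing these configurations, the active THP mode is the
bracketed token in /sys/kernel/mm/transparent_hugepage/enabled. The
following is a minimal sketch of reading it, not the tooling used for
these runs. The helper name active_thp_mode is hypothetical, and a
"defer" token can only appear on kernels carrying series [5].

```python
from pathlib import Path

# Standard sysfs location of the global THP mode on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) token from the sysfs file contents.

    Example input: "always [madvise] never"
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")
```

On a live system this would be called as
active_thp_mode(THP_ENABLED.read_text()) before each benchmark run, to
confirm that the intended mode is really in effect.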


Ultimately, some of the GraphicsMagick tests did perform marginally
better with Nico Pache's khugepaged collapse implementation and
thp=defer than with thp=madvise alone, which partly aligns with my
theory. Unfortunately, these improvements did not appear to be
statistically significant, and they closed only a small part of the
performance gap between thp=madvise and thp=always in our workloads
of interest.
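
One way to quantify "ground gained in the gap" is the fraction of the
madvise-to-always difference that a configuration recovers, together
with a Welch t statistic for the defer-vs-madvise comparison. The
sketch below is illustrative only: the function names are hypothetical,
it is not necessarily how the linked results were analyzed, and any
numbers fed to it in examples are placeholders rather than measurements
from [1] or [2].

```python
from math import sqrt
from statistics import mean, variance


def gap_closed(madvise_runs, defer_runs, always_runs):
    """Fraction of the madvise->always gap recovered by defer.

    0.0 means "no better than madvise", 1.0 means "matches always".
    The ratio works for both higher-is-better and lower-is-better
    metrics, because the sign of the gap cancels.
    """
    base, target = mean(madvise_runs), mean(always_runs)
    if target == base:
        raise ValueError("madvise and always means are equal; gap undefined")
    return (mean(defer_runs) - base) / (target - base)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))
```

A small t statistic over only a few runs per configuration is what "not
statistically significant" looks like in practice, even when the mean
moves in the expected direction.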

Results for the other benchmarks in this set also did not show any
conclusive performance gains from thp=defer. I was not expecting
those to change significantly with this series, however, since they
weren't heavily impacted by thp settings in my prior tests.

I can't speak for the impact of this series on other workloads - I
just wanted to share results for the ones we were aware of and
interested in.

Full results from our tests on the DGX A100 [1] and Lenovo SR670v2 [2]
are linked below.

[0]: https://www.phoronix.com/review/linux-os-ampereone/5
[1]: https://pastebin.ubuntu.com/p/SDSSj8cr6k/
[2]: https://pastebin.ubuntu.com/p/nqbWxyC33d/
[3]: https://lwn.net/ml/all/20250211003028.213461-1-npache@redhat.com
[4]: https://lwn.net/ml/all/20250211111326.14295-1-dev.jain@arm.com
[5]: https://lwn.net/ml/all/20250211004054.222931-1-npache@redhat.com
[6]: https://lwn.net/Articles/1009039/
-- 
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering


* [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
@ 2025-02-11 11:13 Dev Jain
  2025-02-11 23:23 ` Andrew Morton
  2025-02-15  1:47 ` Nico Pache
  0 siblings, 2 replies; 10+ messages in thread
From: Dev Jain @ 2025-02-11 11:13 UTC (permalink / raw)
  To: akpm, david, willy, kirill.shutemov
  Cc: npache, ryan.roberts, anshuman.khandual, catalin.marinas, cl,
	vbabka, mhocko, apopple, dave.hansen, will, baohua, jack,
	srivatsa, haowenchao22, hughd, aneesh.kumar, yang, peterx,
	ioworker0, wangkefeng.wang, ziy, jglisse, surenb, vishal.moola,
	zokeefe, zhengqi.arch, jhubbard, 21cnbao, linux-mm, linux-kernel,
	Dev Jain

This patchset extends khugepaged from collapsing only PMD-sized THPs to
collapsing anonymous mTHPs.

mTHPs were introduced in the kernel to improve memory management by allocating
larger chunks of memory, so as to reduce the number of page faults and TLB
misses (due to TLB coalescing), shorten LRU lists, etc. However, the mTHP
property is often lost due to CoW, swap-in/out, and when the kernel just cannot
find enough physically contiguous memory to allocate on fault. Hence, there is
a need to regain mTHPs in the system asynchronously. This work is an attempt in
this direction, starting with anonymous folios.

In the fault handler, we select the THP order in a greedy manner; the same has
been used here, along with the same sysfs interface to control the order of
collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().

---------------------------------------------------------
Testing
---------------------------------------------------------

The set has been build tested on x86_64.
For Aarch64,
1. mm-selftests: No regressions.
2. Analyzing with tools/mm/thpmaps on different userspace programs mapping
   aligned VMAs of a large size, faulting in basepages/mTHPs (according to sysfs),
   and then madvise()'ing the VMA, khugepaged is able to 100% collapse the VMAs.
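
The kind of userspace test program described in item 2 might look
roughly like the sketch below: map an anonymous region, fault in a
PMD-aligned window page by page, then madvise() it. This is not the
author's test program. The function name and the 2 MiB PMD size are
assumptions (4 KiB pages, as is typical on x86_64), and Python 3.8+ on
Linux is assumed for mmap.madvise().

```python
import ctypes
import mmap

PMD_SIZE = 2 * 1024 * 1024  # assumption: 4 KiB pages with 2 MiB PMDs


def map_fault_and_advise(length=PMD_SIZE):
    """Map anonymous memory, fault in an aligned window, then madvise() it.

    Returns (mapping, window offset, whether MADV_HUGEPAGE succeeded).
    """
    # Over-allocate by one PMD so an aligned window of `length` always fits.
    m = mmap.mmap(-1, length + PMD_SIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf = ctypes.c_char.from_buffer(m)
    base = ctypes.addressof(buf)
    del buf  # drop the buffer export so the mapping can be closed later
    offset = -base % PMD_SIZE  # bytes to skip to reach PMD alignment

    # Touch one byte per base page so every page in the window is faulted in.
    for off in range(offset, offset + length, mmap.PAGESIZE):
        m[off] = 1

    advised = False
    if hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            m.madvise(mmap.MADV_HUGEPAGE, offset, length)
            advised = True
        except OSError:
            pass  # e.g. a kernel built without THP support
    return m, offset, advised
```

Whether khugepaged has since collapsed the window can then be inspected
from outside the process, for example with tools/mm/thpmaps as mentioned
above.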

This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).

v1->v2:
 - Handle VMAs less than PMD size (patches 12-15)
 - Do not add mTHP into deferred split queue
 - Drop lock optimization and collapse mTHP under mmap_write_lock()
 - Define policy on what to do when we encounter a folio order larger than
   the order we are scanning for
 - Prevent the creep problem by enforcing tunable simplification
 - Update Documentation
 - Drop patch 12 from v1 updating selftest w.r.t the creep problem
 - Drop patch 1 from v1

 v1:
 https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/

Dev Jain (17):
  khugepaged: Generalize alloc_charge_folio()
  khugepaged: Generalize hugepage_vma_revalidate()
  khugepaged: Generalize __collapse_huge_page_swapin()
  khugepaged: Generalize __collapse_huge_page_isolate()
  khugepaged: Generalize __collapse_huge_page_copy()
  khugepaged: Abstract PMD-THP collapse
  khugepaged: Scan PTEs order-wise
  khugepaged: Introduce vma_collapse_anon_folio()
  khugepaged: Define collapse policy if a larger folio is already mapped
  khugepaged: Exit early on fully-mapped aligned mTHP
  khugepaged: Enable sysfs to control order of collapse
  khugepaged: Enable variable-sized VMA collapse
  khugepaged: Lock all VMAs mapping the PTE table
  khugepaged: Reset scan address to correct alignment
  khugepaged: Delay cond_resched()
  khugepaged: Implement strict policy for mTHP collapse
  Documentation: transhuge: Define khugepaged mTHP collapse policy

 Documentation/admin-guide/mm/transhuge.rst |  49 +-
 include/linux/huge_mm.h                    |   2 +
 mm/huge_memory.c                           |   4 +
 mm/khugepaged.c                            | 603 ++++++++++++++++-----
 4 files changed, 511 insertions(+), 147 deletions(-)

-- 
2.30.2




end of thread, other threads:[~2025-05-02 20:35 UTC | newest]

Thread overview: 10+ messages
2025-04-24 18:10 [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse Mitchell Augustin
2025-04-24 18:56 ` Nico Pache
2025-04-24 19:45   ` Mitchell Augustin
2025-05-02 20:32     ` Mitchell Augustin
2025-05-02 20:34     ` Mitchell Augustin
  -- strict thread matches above, loose matches on Subject: below --
2025-02-11 11:13 Dev Jain
2025-02-11 23:23 ` Andrew Morton
2025-02-12  4:18   ` Dev Jain
2025-02-15  1:47 ` Nico Pache
2025-02-15  7:36   ` Dev Jain
