From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
kirill.shutemov@linux.intel.com
Cc: npache@redhat.com, ryan.roberts@arm.com,
anshuman.khandual@arm.com, catalin.marinas@arm.com,
cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
haowenchao22@gmail.com, hughd@google.com,
aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
peterx@redhat.com, ioworker0@gmail.com,
wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
zhengqi.arch@bytedance.com, jhubbard@nvidia.com,
21cnbao@gmail.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Dev Jain <dev.jain@arm.com>
Subject: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
Date: Tue, 11 Feb 2025 16:43:09 +0530
Message-ID: <20250211111326.14295-1-dev.jain@arm.com>
This patchset extends khugepaged from collapsing only PMD-sized THPs to
collapsing anonymous mTHPs.
mTHPs were introduced in the kernel to improve memory management by allocating
memory in larger chunks, so as to reduce the number of page faults and TLB
misses (due to TLB coalescing), shorten the LRU lists, etc. However, the mTHP
property is often lost due to CoW, swap-in/out, and when the kernel just cannot
find enough physically contiguous memory to allocate on fault. Hence, there is a
need to regain mTHPs in the system asynchronously. This work is an attempt in
this direction, starting with anonymous folios.
In the fault handler, we select the THP order in a greedy manner; the same has
been used here, along with the same sysfs interface to control the order of
collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().
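To illustrate what "greedy" means here, below is a minimal userspace sketch (not code from this series): try the largest enabled order first and fall back to smaller ones. The helper name pick_collapse_order(), the bitmask representation of enabled orders, and the 4 KiB base page / order-9 PMD constants are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB base pages */
#define PMD_ORDER   9   /* assumption: 2 MiB PMD with 4 KiB base pages */

/*
 * Hypothetical helper: return the largest enabled order whose naturally
 * aligned range containing addr lies inside [vma_start, vma_end), or -1
 * if no enabled order fits. Bit N of enabled_orders means order N is
 * enabled (as a stand-in for the sysfs settings).
 */
static int pick_collapse_order(uint64_t addr, uint64_t vma_start,
                               uint64_t vma_end, unsigned long enabled_orders)
{
    assert(vma_start < vma_end);

    /* Greedy: walk from the largest order down to the smallest. */
    for (int order = PMD_ORDER; order > 0; order--) {
        if (!(enabled_orders & (1UL << order)))
            continue;

        uint64_t size = (uint64_t)1 << (order + PAGE_SHIFT);
        uint64_t start = addr & ~(size - 1);   /* align down to the order */

        if (start >= vma_start && start + size <= vma_end)
            return order;
    }
    return -1;
}
```

With orders 9 and 4 enabled, an address inside a PMD-aligned 2 MiB VMA yields order 9, whereas a VMA that starts one page later can only satisfy order 4 (64 KiB), and only for addresses whose 64 KiB-aligned range lies inside the VMA.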
---------------------------------------------------------
Testing
---------------------------------------------------------
The set has been build-tested on x86_64.
On AArch64:
1. mm-selftests: No regressions.
2. Analyzing with tools/mm/thpmaps on different userspace programs that map
aligned VMAs of a large size, fault in base pages/mTHPs (according to sysfs),
and then madvise() the VMA: khugepaged is able to collapse the VMAs fully (100%).
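For readers who want to reproduce a test of this shape, here is a rough sketch of such a userspace program (not one of the programs actually used). The helper names map_aligned() and fault_and_advise() are made up for the example, and using MADV_HUGEPAGE as the madvise() advice is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory aligned to align.
 * align must be a power of two and, like len, a multiple of the page size.
 */
static void *map_aligned(size_t len, size_t align)
{
    assert(align && (align & (align - 1)) == 0);

    /* Over-allocate, then trim the unaligned head and the unused tail. */
    size_t total = len + align;
    uint8_t *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uint8_t *aligned = (uint8_t *)(((uintptr_t)raw + align - 1) &
                                   ~((uintptr_t)align - 1));
    if (aligned > raw)
        munmap(raw, (size_t)(aligned - raw));

    size_t tail = (size_t)((raw + total) - (aligned + len));
    if (tail)
        munmap(aligned + len, tail);

    return aligned;
}

/*
 * Hypothetical helper: touch one byte per base page so the range is
 * faulted in, then mark it as a collapse candidate for khugepaged.
 * Returns the madvise() result (-1 if THP is unavailable, for example).
 */
static int fault_and_advise(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile uint8_t *p = addr;

    for (size_t off = 0; off < len; off += page)
        p[off] = 1;

    return madvise(addr, len, MADV_HUGEPAGE);
}
```

After calling these on, say, a 4 MiB region aligned to 2 MiB, the process can sleep while khugepaged runs, and the mapping can then be inspected with tools/mm/thpmaps.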
This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).
v1->v2:
- Handle VMAs less than PMD size (patches 12-15)
- Do not add mTHP into deferred split queue
- Drop lock optimization and collapse mTHP under mmap_write_lock()
- Define policy on what to do when we encounter a folio order larger than
  the order we are scanning for
- Prevent the creep problem by enforcing tunable simplification
- Update Documentation
- Drop patch 12 from v1, which updated the selftest w.r.t. the creep problem
- Drop patch 1 from v1
v1:
https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/
Dev Jain (17):
  khugepaged: Generalize alloc_charge_folio()
  khugepaged: Generalize hugepage_vma_revalidate()
  khugepaged: Generalize __collapse_huge_page_swapin()
  khugepaged: Generalize __collapse_huge_page_isolate()
  khugepaged: Generalize __collapse_huge_page_copy()
  khugepaged: Abstract PMD-THP collapse
  khugepaged: Scan PTEs order-wise
  khugepaged: Introduce vma_collapse_anon_folio()
  khugepaged: Define collapse policy if a larger folio is already mapped
  khugepaged: Exit early on fully-mapped aligned mTHP
  khugepaged: Enable sysfs to control order of collapse
  khugepaged: Enable variable-sized VMA collapse
  khugepaged: Lock all VMAs mapping the PTE table
  khugepaged: Reset scan address to correct alignment
  khugepaged: Delay cond_resched()
  khugepaged: Implement strict policy for mTHP collapse
  Documentation: transhuge: Define khugepaged mTHP collapse policy
 Documentation/admin-guide/mm/transhuge.rst |  49 +-
 include/linux/huge_mm.h                    |   2 +
 mm/huge_memory.c                           |   4 +
 mm/khugepaged.c                            | 603 ++++++++++++++++-----
 4 files changed, 511 insertions(+), 147 deletions(-)
--
2.30.2