* [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support
@ 2026-04-18 2:44 Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Hi all,
This patchset removes the READ_ONLY_THP_FOR_FS Kconfig option and enables
creating read-only THPs by default for FSes with large folio support (the
supported orders need to include PMD_ORDER).
Before the patchset, the status of creating read-only THPs was as below:
|                           | PF | MADV_COLLAPSE | khugepaged |
|---------------------------|----|---------------|------------|
| large folio FSes only     | ✓  | x             | x          |
| READ_ONLY_THP_FOR_FS only | x  | ✓             | ✓          |
| both                      | ✓  | ✓             | ✓          |
where READ_ONLY_THP_FOR_FS implies FSes without large folio support.
Now without READ_ONLY_THP_FOR_FS:
|                     | PF | MADV_COLLAPSE | khugepaged |
|---------------------|----|---------------|------------|
| large folio FSes    | ✓  | ✓             | ✓          |
| no large folio FSes | x  | x             | x          |
This means FSes without large folio support need to add it (the supported
orders need to include PMD_ORDER) in order to make use of the read-only
THP creation function.
To avoid breaking read-only THP support for large folio FSes,
1. the first 4 patches enable the support, so that read-only THP still
works for large folio FSes without READ_ONLY_THP_FOR_FS,
2. Patch 5 removes the READ_ONLY_THP_FOR_FS Kconfig option,
3. the remaining patches remove code related to READ_ONLY_THP_FOR_FS.
The overview of the changes is:
1. collapse_file() checks to-be-collapsed folios for dirtiness after they
are locked and unmapped, to make sure no new write happens. Before,
mapping->nr_thps and inode->i_writecount were used to truncate read-only
THPs before an fd becomes writable.
2. hugepage_pmd_enabled() returns true for the anon, shmem, and
file-backed cases if the global khugepaged control is on; otherwise,
khugepaged is turned off for the file-backed case, and anon and shmem
depend on their per-size control knobs.
3. collapse_file() from mm/khugepaged.c, instead of checking
CONFIG_READ_ONLY_THP_FOR_FS, makes sure that mapping_max_folio_order()
of the file's struct address_space is at least PMD_ORDER.
4. file_thp_enabled() also checks mapping_max_folio_order() instead and
no longer checks if the input file is opened as read-only (Change 1
handles read-write files).
5. truncate_inode_partial_folio() calls folio_split() directly instead
of the removed try_folio_split_to_order(), since large folios can
only show up on a FS with large folio support.
6. nr_thps is removed from struct address_space, since it is no longer
needed to drop all read-only THPs from a FS without large folio
support when the fd becomes writable. The related filemap_nr_thps*()
helpers are removed too.
7. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
8. Updated comments in various places.
Changelog
===
From V2[3]:
1. removed unnecessary check in collapse_scan_file().
2. removed inode_is_open_for_write() check in file_thp_enabled().
3. changed hugepage_pmd_enabled() to return true if khugepaged global
control is on instead of false. cleaned up anon and shmem code in the
function.
4. moved folio dirtiness check after try_to_unmap() but before
try_to_unmap_flush(), since that is sufficient to prevent new writes.
5. reordered patch 4 and 5, so that khugepaged behavior does not change
after READ_ONLY_THP_FOR_FS is removed.
6. added read-write file test in khugepaged selftest.
7. removed the read-only file restriction from guard-region selftest.
From V1[2]:
1. removed inode_is_open_for_write() check in collapse_file(), since the
added folio dirtiness check after try_to_unmap_flush() should be
sufficient to prevent writes to candidate folios.
2. removed READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled(), please
see Patch 5 and item 2 in the overview for more details.
3. moved the patch removing READ_ONLY_THP_FOR_FS Kconfig after enabling
khugepaged and MADV_COLLAPSE to create read-only THPs.
4. added mapping_pmd_thp_support() helper function.
5. used VM_WARN_ON_ONCE() in collapse_file() for the mapping eligibility
check and address alignment check instead of if + return error code.
Always allow shmem, since MADV_COLLAPSE ignores the shmem huge config.
6. added mapping eligibility check in collapse_scan_file().
7. removed the trailing ; for folio_split() in the
!CONFIG_TRANSPARENT_HUGEPAGE case.
8. simplified code in folio_check_splittable() after removing
READ_ONLY_THP_FOR_FS code.
9. clarified that read-only THP works for FSes with PMD THP support by
default.
From RFC[1]:
1. instead of removing the READ_ONLY_THP_FOR_FS functionality entirely,
turn it on by default for all FSes with large folio support whose
supported orders include PMD_ORDER.
Suggestions and comments are welcome.
Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20260327014255.2058916-1-ziy@nvidia.com/ [2]
Link: https://lore.kernel.org/all/20260413192030.3275825-1-ziy@nvidia.com/ [3]
Zi Yan (12):
mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
mm/khugepaged: add folio dirty check after try_to_unmap()
mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in
hugepage_pmd_enabled()
mm: remove READ_ONLY_THP_FOR_FS Kconfig option
mm: fs: remove filemap_nr_thps*() functions and their users
fs: remove nr_thps from struct address_space
mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
mm/truncate: use folio_split() in truncate_inode_partial_folio()
fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
fs/btrfs/defrag.c | 3 -
fs/inode.c | 3 -
fs/open.c | 27 -----
include/linux/fs.h | 5 -
include/linux/huge_mm.h | 25 +----
include/linux/pagemap.h | 35 ++-----
include/linux/shmem_fs.h | 2 +-
mm/Kconfig | 11 ---
mm/filemap.c | 1 -
mm/huge_memory.c | 39 ++------
mm/khugepaged.c | 86 ++++++++--------
mm/truncate.c | 8 +-
tools/testing/selftests/mm/guard-regions.c | 18 +---
tools/testing/selftests/mm/khugepaged.c | 110 +++++++++++++++------
tools/testing/selftests/mm/run_vmtests.sh | 12 ++-
15 files changed, 156 insertions(+), 229 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
collapse_file() requires the FS to support large folios of at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with a check for
that. MADV_COLLAPSE ignores the shmem huge config, so skip the check for
shmem.
While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
Add a helper function, mapping_pmd_thp_support(), which reports whether a
mapping supports large folios of at least PMD_ORDER.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/pagemap.h | 10 ++++++++++
mm/khugepaged.c | 5 +++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..c3cb1ec982cd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -524,6 +524,16 @@ static inline bool mapping_large_folio_support(const struct address_space *mappi
return mapping_max_folio_order(mapping) > 0;
}
+static inline bool mapping_pmd_thp_support(const struct address_space *mapping)
+{
+ /* AS_FOLIO_ORDER is only reasonable for pagecache folios */
+ VM_WARN_ONCE((unsigned long)mapping & FOLIO_MAPPING_ANON,
+ "Anonymous mapping always supports PMD THP");
+
+ return mapping_max_folio_order(mapping) >= PMD_ORDER;
+}
+
+
/* Return the maximum folio size for this pagecache mapping, in bytes. */
static inline size_t mapping_max_folio_size(const struct address_space *mapping)
{
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..3eb5d982d3d3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
int nr_none = 0;
bool is_shmem = shmem_file(file);
- VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
- VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+ /* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
+ VM_WARN_ON_ONCE(!is_shmem && !mapping_pmd_thp_support(mapping));
+ VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
result = alloc_charge_folio(&new_folio, mm, cc);
if (result != SCAN_SUCCEED)
--
2.43.0
* [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
This check ensures the correctness of collapsing read-only THPs after
READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
PMD THP pagecache.
READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
aforementioned mechanism will go away too. To ensure khugepaged functions
as expected after the changes, skip the collapse if any folio is dirty
after try_to_unmap(): a dirty folio means writes to this read-only folio
via mmap can still happen between try_to_unmap() and try_to_unmap_flush()
through cached TLB entries, and khugepaged does not support collapsing
writable pagecache folios yet.
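For clarity, the window closed by the dirty check can be sketched as a
timeline (illustrative only, not code from the patch; assumes deferred TLB
flushing in khugepaged as described above):

```
CPU 0 (khugepaged)                 CPU 1 (writer via mmap)

try_to_unmap(folio)                # PTE cleared, but CPU 1 may still
                                   # hold a cached writable TLB entry
                                   store to the mapped address
                                   # hits the folio, marks it dirty,
                                   # no page fault is taken
folio_test_dirty(folio)?           # new check: sees dirty, aborts
try_to_unmap_flush()               # TLB flushed; later writes fault
```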
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/khugepaged.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3eb5d982d3d3..1c0fdc81d276 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
}
} else if (folio_test_dirty(folio)) {
/*
- * khugepaged only works on read-only fd,
- * so this page is dirty because it hasn't
+ * This page is dirty because it hasn't
* been flushed since first write. There
* won't be new dirty pages.
*
@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
if (!is_shmem && (folio_test_dirty(folio) ||
folio_test_writeback(folio))) {
/*
- * khugepaged only works on read-only fd, so this
- * folio is dirty because it hasn't been flushed
+ * khugepaged only works on clean file-backed folios,
+ * so this folio is dirty because it hasn't been flushed
* since first write.
*/
result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto out_unlock;
}
+ /*
+ * At this point, the folio is locked, unmapped. Make sure the
+ * folio is clean, so that no one else is able to write to it,
+ * since that would require taking the folio lock first.
+ * Otherwise that means the folio was pointed by a dirty PTE and
+ * some CPU might have a valid TLB entry with dirty bit set
+ * still pointing to this folio and writes can happen without
+ * causing a page table walk and folio lock acquisition before
+ * the try_to_unmap_flush() below is done. After the collapse,
+ * file-backed folio is not set as dirty and can be discarded
+ * before any new write marks the folio dirty, causing data
+ * corruption.
+ */
+ if (!is_shmem && folio_test_dirty(folio)) {
+ result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+ goto out_unlock;
+ }
+
/*
* Accumulate the folios that are being collapsed.
*/
--
2.43.0
* [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Replace it with a check on the max folio order of the file's address space
mapping, making sure PMD THP is supported. Also remove the read-only fd
check, since collapse_file() now makes sure all to-be-collapsed folios are
clean and the created PMD file THP can be handled by FSes properly.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/huge_memory.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..7e9cf8c0985f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
struct inode *inode;
- if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
- return false;
-
if (!vma->vm_file)
return false;
@@ -97,7 +94,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
if (IS_ANON_FILE(inode))
return false;
- return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+ if (!mapping_pmd_thp_support(inode->i_mapping))
+ return false;
+
+ return S_ISREG(inode->i_mode);
}
/* If returns true, we are unable to access the VMA's folios. */
--
2.43.0
* [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (2 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Remove the READ_ONLY_THP_FOR_FS check; khugepaged for file-backed
pmd-sized hugepages is now enabled by the global transparent hugepage
control. khugepaged can still be enabled by the per-size controls for anon
and shmem when the global control is off.
Add a shmem_hpage_pmd_enabled() stub for !CONFIG_SHMEM to remove the
IS_ENABLED(CONFIG_SHMEM) check in hugepage_pmd_enabled().
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/shmem_fs.h | 2 +-
mm/khugepaged.c | 28 ++++++++++++++++------------
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 1a345142af7d..dff8fb6ddac0 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -127,7 +127,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
int shmem_unuse(unsigned int type);
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && defined(CONFIG_SHMEM)
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1c0fdc81d276..718a2d06d1e6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -406,18 +406,8 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
}
-static bool hugepage_pmd_enabled(void)
+static inline bool anon_hpage_pmd_enabled(void)
{
- /*
- * We cover the anon, shmem and the file-backed case here; file-backed
- * hugepages, when configured in, are determined by the global control.
- * Anon pmd-sized hugepages are determined by the pmd-size control.
- * Shmem pmd-sized hugepages are also determined by its pmd-size control,
- * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
- */
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- hugepage_global_enabled())
- return true;
if (test_bit(PMD_ORDER, &huge_anon_orders_always))
return true;
if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
@@ -425,7 +415,21 @@ static bool hugepage_pmd_enabled(void)
if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
hugepage_global_enabled())
return true;
- if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
+ return false;
+}
+
+static bool hugepage_pmd_enabled(void)
+{
+ /*
+ * Anon, shmem and file-backed pmd-size hugepages are all determined by
+ * the global control. If the global control is off, anon and shmem
+ * pmd-sized hugepages are also determined by its per-size control.
+ */
+ if (hugepage_global_enabled())
+ return true;
+ if (anon_hpage_pmd_enabled())
+ return true;
+ if (shmem_hpage_pmd_enabled())
return true;
return false;
}
--
2.43.0
* [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (3 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
After removing the READ_ONLY_THP_FOR_FS check in file_thp_enabled(),
khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache
support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig
option first so that no one can use READ_ONLY_THP_FOR_FS while upcoming
commits remove mapping->nr_thps, which its safeguard mechanism relies on.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/Kconfig | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index bd283958d675..408fc7b82233 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -937,17 +937,6 @@ config THP_SWAP
For selection by architectures with reasonable THP sizes.
-config READ_ONLY_THP_FOR_FS
- bool "Read-only THP for filesystems (EXPERIMENTAL)"
- depends on TRANSPARENT_HUGEPAGE
-
- help
- Allow khugepaged to put read-only file-backed pages in THP.
-
- This is marked experimental because it is a new feature. Write
- support of file THPs will be developed in the next few release
- cycles.
-
config NO_PAGE_MAPCOUNT
bool "No per-page mapcount (EXPERIMENTAL)"
help
--
2.43.0
* [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (4 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space Zi Yan
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
large folio support, so that read-only THPs created in these FSes are not
seen by the FSes when the underlying fd becomes writable. Now read-only
PMD THPs can only appear in a FS with large folio support whose supported
orders include PMD_ORDER.
READ_ONLY_THP_FOR_FS used mapping->nr_thps, inode->i_writecount, and
smp_mb() to prevent writes to a read-only THP and to prevent collapsing
writable folios into a THP: collapse_file() increases mapping->nr_thps,
issues smp_mb(), and stops the collapse if inode->i_writecount > 0, while
do_dentry_open() first increases inode->i_writecount, issues a full memory
fence, and truncates all read-only THPs if mapping->nr_thps > 0.
Now this mechanism can be removed along with the READ_ONLY_THP_FOR_FS
code, since a dirty folio check has been added between try_to_unmap() and
try_to_unmap_flush() in collapse_file() to make sure no writable folio can
be collapsed.
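As a sketch, the removed synchronization described above paired the two
sides like this (illustrative, condensed from the description; not literal
kernel code):

```
collapse_file()                      do_dentry_open()
  mapping->nr_thps++                   inode->i_writecount++
  smp_mb()                             smp_mb()  /* via get_write_access() */
  if (inode->i_writecount > 0)         if (mapping->nr_thps > 0)
          abort the collapse                   truncate all read-only THPs
```

Whichever side runs second observes the other's counter increment, so a
writable open and a collapse can never both proceed.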
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
fs/open.c | 27 ---------------------------
include/linux/pagemap.h | 29 -----------------------------
mm/filemap.c | 1 -
mm/huge_memory.c | 1 -
mm/khugepaged.c | 28 ----------------------------
5 files changed, 86 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 91f1139591ab..cef382d9d8b8 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -970,33 +970,6 @@ static int do_dentry_open(struct file *f,
if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
return -EINVAL;
- /*
- * XXX: Huge page cache doesn't support writing yet. Drop all page
- * cache for this file before processing writes.
- */
- if (f->f_mode & FMODE_WRITE) {
- /*
- * Depends on full fence from get_write_access() to synchronize
- * against collapse_file() regarding i_writecount and nr_thps
- * updates. Ensures subsequent insertion of THPs into the page
- * cache will fail.
- */
- if (filemap_nr_thps(inode->i_mapping)) {
- struct address_space *mapping = inode->i_mapping;
-
- filemap_invalidate_lock(inode->i_mapping);
- /*
- * unmap_mapping_range just need to be called once
- * here, because the private pages is not need to be
- * unmapped mapping (e.g. data segment of dynamic
- * shared libraries here).
- */
- unmap_mapping_range(mapping, 0, 0, 0);
- truncate_inode_pages(mapping, 0);
- filemap_invalidate_unlock(inode->i_mapping);
- }
- }
-
return 0;
cleanup_all:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c3cb1ec982cd..a63f818910dc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -540,35 +540,6 @@ static inline size_t mapping_max_folio_size(const struct address_space *mapping)
return PAGE_SIZE << mapping_max_folio_order(mapping);
}
-static inline int filemap_nr_thps(const struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- return atomic_read(&mapping->nr_thps);
-#else
- return 0;
-#endif
-}
-
-static inline void filemap_nr_thps_inc(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- if (!mapping_large_folio_support(mapping))
- atomic_inc(&mapping->nr_thps);
-#else
- WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
-static inline void filemap_nr_thps_dec(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- if (!mapping_large_folio_support(mapping))
- atomic_dec(&mapping->nr_thps);
-#else
- WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
struct address_space *folio_mapping(const struct folio *folio);
/**
diff --git a/mm/filemap.c b/mm/filemap.c
index 4e636647100c..d3cd4d2f3734 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -189,7 +189,6 @@ static void filemap_unaccount_folio(struct address_space *mapping,
lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
} else if (folio_test_pmd_mappable(folio)) {
lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
- filemap_nr_thps_dec(mapping);
}
if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
mod_node_page_state(folio_pgdat(folio),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7e9cf8c0985f..3a310f1f7177 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3926,7 +3926,6 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
} else {
lruvec_stat_mod_folio(folio,
NR_FILE_THPS, -nr);
- filemap_nr_thps_dec(mapping);
}
}
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 718a2d06d1e6..c23c75703161 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2116,21 +2116,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto xa_unlocked;
}
- if (!is_shmem) {
- filemap_nr_thps_inc(mapping);
- /*
- * Paired with the fence in do_dentry_open() -> get_write_access()
- * to ensure i_writecount is up to date and the update to nr_thps
- * is visible. Ensures the page cache will be truncated if the
- * file is opened writable.
- */
- smp_mb();
- if (inode_is_open_for_write(mapping->host)) {
- result = SCAN_FAIL;
- filemap_nr_thps_dec(mapping);
- }
- }
-
xa_locked:
xas_unlock_irq(&xas);
xa_unlocked:
@@ -2308,19 +2293,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
folio_putback_lru(folio);
folio_put(folio);
}
- /*
- * Undo the updates of filemap_nr_thps_inc for non-SHMEM
- * file only. This undo is not needed unless failure is
- * due to SCAN_COPY_MC.
- */
- if (!is_shmem && result == SCAN_COPY_MC) {
- filemap_nr_thps_dec(mapping);
- /*
- * Paired with the fence in do_dentry_open() -> get_write_access()
- * to ensure the update to nr_thps is visible.
- */
- smp_mb();
- }
new_folio->mapping = NULL;
--
2.43.0
* [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (5 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Now that the filemap_nr_thps*() functions are removed, the related field,
address_space->nr_thps, is no longer needed. Remove it. This shrinks
struct address_space by 8 bytes on 64-bit systems, which may increase the
number of inodes we can cache.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
fs/inode.c | 3 ---
include/linux/fs.h | 5 -----
2 files changed, 8 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..16ab0a345419 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -280,9 +280,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
mapping->flags = 0;
mapping->wb_err = 0;
atomic_set(&mapping->i_mmap_writable, 0);
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- atomic_set(&mapping->nr_thps, 0);
-#endif
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
mapping->i_private_data = NULL;
mapping->writeback_index = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3ca9b841892..824625d8de1a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -455,7 +455,6 @@ extern const struct address_space_operations empty_aops;
* memory mappings.
* @gfp_mask: Memory allocation flags to use for allocating pages.
* @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
- * @nr_thps: Number of THPs in the pagecache (non-shmem only).
* @i_mmap: Tree of private and shared mappings.
* @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
* @nrpages: Number of page entries, protected by the i_pages lock.
@@ -473,10 +472,6 @@ struct address_space {
struct rw_semaphore invalidate_lock;
gfp_t gfp_mask;
atomic_t i_mmap_writable;
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
- /* number of thp, only for non-shmem files */
- atomic_t nr_thps;
-#endif
struct rb_root_cached i_mmap;
unsigned long nrpages;
pgoff_t writeback_index;
--
2.43.0
* [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (6 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Without READ_ONLY_THP_FOR_FS, large file-backed folios can no longer be
created on a FS without large folio support, so the check is no longer needed.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 30 +++---------------------------
1 file changed, 3 insertions(+), 27 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a310f1f7177..a1eebc8ed105 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3821,33 +3821,9 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
if (!folio->mapping && !folio_test_anon(folio))
return -EBUSY;
- if (folio_test_anon(folio)) {
- /* order-1 is not supported for anonymous THP. */
- if (new_order == 1)
- return -EINVAL;
- } else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
- !mapping_large_folio_support(folio->mapping)) {
- /*
- * We can always split a folio down to a single page
- * (new_order == 0) uniformly.
- *
- * For any other scenario
- * a) uniform split targeting a large folio
- * (new_order > 0)
- * b) any non-uniform split
- * we must confirm that the file system supports large
- * folios.
- *
- * Note that we might still have THPs in such
- * mappings, which is created from khugepaged when
- * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
- * case, the mapping does not actually support large
- * folios properly.
- */
- return -EINVAL;
- }
- }
+ /* order-1 is not supported for anonymous THP. */
+ if (folio_test_anon(folio) && new_order == 1)
+ return -EINVAL;
/*
* swapcache folio could only be split to order 0
--
2.43.0
* [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (7 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
After READ_ONLY_THP_FOR_FS is removed, a FS either supports large folios or
it does not. folio_split() can be used on a FS with large folio support
without worrying about encountering a THP on a FS without large folio support.
When READ_ONLY_THP_FOR_FS was present, a PMD-sized pagecache folio could
appear on a FS without large folio support after khugepaged or
madvise(MADV_COLLAPSE) created it. During truncate_inode_partial_folio(),
such a folio had to be split to order-0 folios, since on a FS without large
folio support it could not be split non-uniformly into folios of various
orders. try_folio_split_to_order() was added to handle this situation by
calling folio_check_splittable(..., SPLIT_TYPE_NON_UNIFORM) to detect whether
the large folio was created due to READ_ONLY_THP_FOR_FS on a FS without large
folio support. Now that READ_ONLY_THP_FOR_FS is removed, all large pagecache
folios are created on FSes supporting large folios, so this function is no
longer needed and all large pagecache folios can be split non-uniformly.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
include/linux/huge_mm.h | 25 ++-----------------------
mm/truncate.c | 8 ++++----
2 files changed, 6 insertions(+), 27 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2949e5acff35..164d6edf1b65 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
return split_huge_page_to_list_to_order(page, NULL, new_order);
}
-/**
- * try_folio_split_to_order() - try to split a @folio at @page to @new_order
- * using non uniform split.
- * @folio: folio to be split
- * @page: split to @new_order at the given page
- * @new_order: the target split order
- *
- * Try to split a @folio at @page using non uniform split to @new_order, if
- * non uniform split is not supported, fall back to uniform split. After-split
- * folios are put back to LRU list. Use min_order_for_split() to get the lower
- * bound of @new_order.
- *
- * Return: 0 - split is successful, otherwise split failed.
- */
-static inline int try_folio_split_to_order(struct folio *folio,
- struct page *page, unsigned int new_order)
-{
- if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
- return split_huge_page_to_order(&folio->page, new_order);
- return folio_split(folio, new_order, page, NULL);
-}
static inline int split_huge_page(struct page *page)
{
return split_huge_page_to_list_to_order(page, NULL, 0);
@@ -642,8 +621,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
return -EINVAL;
}
-static inline int try_folio_split_to_order(struct folio *folio,
- struct page *page, unsigned int new_order)
+static inline int folio_split(struct folio *folio, unsigned int new_order,
+ struct page *page, struct list_head *list)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
diff --git a/mm/truncate.c b/mm/truncate.c
index 12cc89f89afc..b58ba940be47 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
return 0;
}
-static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
+static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
unsigned long min_order)
{
enum ttu_flags ttu_flags =
@@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
TTU_IGNORE_MLOCK;
int ret;
- ret = try_folio_split_to_order(folio, split_at, min_order);
+ ret = folio_split(folio, min_order, split_at, NULL);
/*
* If the split fails, unmap the folio, so it will be refaulted
@@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
min_order = mapping_min_folio_order(folio->mapping);
split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
- if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
+ if (!folio_split_or_unmap(folio, split_at, min_order)) {
/*
* try to split at offset + length to make sure folios within
* the range can be dropped, especially to avoid memory waste
@@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
/* make sure folio2 is large and does not change its mapping */
if (folio_test_large(folio2) &&
folio2->mapping == folio->mapping)
- try_folio_split_or_unmap(folio2, split_at2, min_order);
+ folio_split_or_unmap(folio2, split_at2, min_order);
folio_unlock(folio2);
out:
--
2.43.0
* [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (8 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
READ_ONLY_THP_FOR_FS is no longer present, so remove the related comment.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/defrag.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 7e2db5d3a4d4..a8d49d9ca981 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
return folio;
/*
- * Since we can defragment files opened read-only, we can encounter
- * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
- *
* The IO for such large folios is not fully tested, thus return
* an error to reject such folios unless it's an experimental build.
*
--
2.43.0
* [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (9 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-18 9:27 ` [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Lorenzo Stoakes
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Change the requirement to a file system with large folio support whose
supported orders include PMD_ORDER.
Also add tests that open a file with read-write permission and populate
folios with writes. Reuse the XFS image from split_huge_page_test.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
tools/testing/selftests/mm/khugepaged.c | 110 ++++++++++++++++------
tools/testing/selftests/mm/run_vmtests.sh | 12 ++-
2 files changed, 90 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 3fe7ef04ac62..627472cbc910 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -49,7 +49,8 @@ struct mem_ops {
const char *name;
};
-static struct mem_ops *file_ops;
+static struct mem_ops *read_only_file_ops;
+static struct mem_ops *read_write_file_ops;
static struct mem_ops *anon_ops;
static struct mem_ops *shmem_ops;
@@ -112,7 +113,8 @@ static void restore_settings(int sig)
static void save_settings(void)
{
printf("Save THP and khugepaged settings...");
- if (file_ops && finfo.type == VMA_FILE)
+ if ((read_only_file_ops || read_write_file_ops) &&
+ finfo.type == VMA_FILE)
thp_set_read_ahead_path(finfo.dev_queue_read_ahead_path);
thp_save_settings();
@@ -364,11 +366,14 @@ static bool anon_check_huge(void *addr, int nr_hpages)
return check_huge_anon(addr, nr_hpages, hpage_pmd_size);
}
-static void *file_setup_area(int nr_hpages)
+static void *file_setup_area_common(int nr_hpages, bool read_only)
{
int fd;
void *p;
unsigned long size;
+ int open_opt = read_only ? O_RDONLY : O_RDWR;
+ int mmap_prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE);
+ int mmap_opt = read_only ? MAP_PRIVATE : MAP_SHARED;
unlink(finfo.path); /* Cleanup from previous failed tests */
printf("Creating %s for collapse%s...", finfo.path,
@@ -388,14 +393,15 @@ static void *file_setup_area(int nr_hpages)
munmap(p, size);
success("OK");
- printf("Opening %s read only for collapse...", finfo.path);
- finfo.fd = open(finfo.path, O_RDONLY, 777);
+ printf("Opening %s %s for collapse...", finfo.path,
+ read_only ? "read only" : "read-write");
+ finfo.fd = open(finfo.path, open_opt, 777);
if (finfo.fd < 0) {
perror("open()");
exit(EXIT_FAILURE);
}
- p = mmap(BASE_ADDR, size, PROT_READ,
- MAP_PRIVATE, finfo.fd, 0);
+ p = mmap(BASE_ADDR, size, mmap_prot,
+ mmap_opt, finfo.fd, 0);
if (p == MAP_FAILED || p != BASE_ADDR) {
perror("mmap()");
exit(EXIT_FAILURE);
@@ -407,6 +413,15 @@ static void *file_setup_area(int nr_hpages)
return p;
}
+static void *file_setup_read_only_area(int nr_hpages)
+{
+ return file_setup_area_common(nr_hpages, /* read_only= */ true);
+}
+
+static void *file_setup_read_write_area(int nr_hpages)
+{
+ return file_setup_area_common(nr_hpages, /* read_only= */ false);
+}
static void file_cleanup_area(void *p, unsigned long size)
{
munmap(p, size);
@@ -414,14 +429,25 @@ static void file_cleanup_area(void *p, unsigned long size)
unlink(finfo.path);
}
-static void file_fault(void *p, unsigned long start, unsigned long end)
+static void file_fault_common(void *p, unsigned long start, unsigned long end,
+ int madv_ops)
{
- if (madvise(((char *)p) + start, end - start, MADV_POPULATE_READ)) {
+ if (madvise(((char *)p) + start, end - start, madv_ops)) {
perror("madvise(MADV_POPULATE_READ");
exit(EXIT_FAILURE);
}
}
+static void file_fault_read(void *p, unsigned long start, unsigned long end)
+{
+ file_fault_common(p, start, end, MADV_POPULATE_READ);
+}
+
+static void file_fault_write(void *p, unsigned long start, unsigned long end)
+{
+ file_fault_common(p, start, end, MADV_POPULATE_WRITE);
+}
+
static bool file_check_huge(void *addr, int nr_hpages)
{
switch (finfo.type) {
@@ -477,10 +503,18 @@ static struct mem_ops __anon_ops = {
.name = "anon",
};
-static struct mem_ops __file_ops = {
- .setup_area = &file_setup_area,
+static struct mem_ops __read_only_file_ops = {
+ .setup_area = &file_setup_read_only_area,
+ .cleanup_area = &file_cleanup_area,
+ .fault = &file_fault_read,
+ .check_huge = &file_check_huge,
+ .name = "file",
+};
+
+static struct mem_ops __read_write_file_ops = {
+ .setup_area = &file_setup_read_write_area,
.cleanup_area = &file_cleanup_area,
- .fault = &file_fault,
+ .fault = &file_fault_write,
.check_huge = &file_check_huge,
.name = "file",
};
@@ -603,7 +637,9 @@ static struct collapse_context __madvise_context = {
static bool is_tmpfs(struct mem_ops *ops)
{
- return ops == &__file_ops && finfo.type == VMA_SHMEM;
+ return (ops == &__read_only_file_ops ||
+ ops == &__read_write_file_ops) &&
+ finfo.type == VMA_SHMEM;
}
static bool is_anon(struct mem_ops *ops)
@@ -1086,8 +1122,8 @@ static void usage(void)
fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
- fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
- fprintf(stderr, "\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
+ fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
+ fprintf(stderr, "\twith large folio support (order >= PMD order)\n");
fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
fprintf(stderr, "\tmounted with huge=advise option for khugepaged tests to work\n");
fprintf(stderr, "\n\tSupported Options:\n");
@@ -1143,20 +1179,22 @@ static void parse_test_type(int argc, char **argv)
usage();
if (!strcmp(buf, "all")) {
- file_ops = &__file_ops;
+ read_only_file_ops = &__read_only_file_ops;
+ read_write_file_ops = &__read_write_file_ops;
anon_ops = &__anon_ops;
shmem_ops = &__shmem_ops;
} else if (!strcmp(buf, "anon")) {
anon_ops = &__anon_ops;
} else if (!strcmp(buf, "file")) {
- file_ops = &__file_ops;
+ read_only_file_ops = &__read_only_file_ops;
+ read_write_file_ops = &__read_write_file_ops;
} else if (!strcmp(buf, "shmem")) {
shmem_ops = &__shmem_ops;
} else {
usage();
}
- if (!file_ops)
+ if (!read_only_file_ops && !read_write_file_ops)
return;
if (argc != 2)
@@ -1228,37 +1266,47 @@ int main(int argc, char **argv)
} while (0)
TEST(collapse_full, khugepaged_context, anon_ops);
- TEST(collapse_full, khugepaged_context, file_ops);
+ TEST(collapse_full, khugepaged_context, read_only_file_ops);
+ TEST(collapse_full, khugepaged_context, read_write_file_ops);
TEST(collapse_full, khugepaged_context, shmem_ops);
TEST(collapse_full, madvise_context, anon_ops);
- TEST(collapse_full, madvise_context, file_ops);
+ TEST(collapse_full, madvise_context, read_only_file_ops);
+ TEST(collapse_full, madvise_context, read_write_file_ops);
TEST(collapse_full, madvise_context, shmem_ops);
TEST(collapse_empty, khugepaged_context, anon_ops);
TEST(collapse_empty, madvise_context, anon_ops);
TEST(collapse_single_pte_entry, khugepaged_context, anon_ops);
- TEST(collapse_single_pte_entry, khugepaged_context, file_ops);
+ TEST(collapse_single_pte_entry, khugepaged_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry, khugepaged_context, read_write_file_ops);
TEST(collapse_single_pte_entry, khugepaged_context, shmem_ops);
TEST(collapse_single_pte_entry, madvise_context, anon_ops);
- TEST(collapse_single_pte_entry, madvise_context, file_ops);
+ TEST(collapse_single_pte_entry, madvise_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry, madvise_context, read_write_file_ops);
TEST(collapse_single_pte_entry, madvise_context, shmem_ops);
TEST(collapse_max_ptes_none, khugepaged_context, anon_ops);
- TEST(collapse_max_ptes_none, khugepaged_context, file_ops);
+ TEST(collapse_max_ptes_none, khugepaged_context, read_only_file_ops);
+ TEST(collapse_max_ptes_none, khugepaged_context, read_write_file_ops);
TEST(collapse_max_ptes_none, madvise_context, anon_ops);
- TEST(collapse_max_ptes_none, madvise_context, file_ops);
+ TEST(collapse_max_ptes_none, madvise_context, read_only_file_ops);
+ TEST(collapse_max_ptes_none, madvise_context, read_write_file_ops);
TEST(collapse_single_pte_entry_compound, khugepaged_context, anon_ops);
- TEST(collapse_single_pte_entry_compound, khugepaged_context, file_ops);
+ TEST(collapse_single_pte_entry_compound, khugepaged_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry_compound, khugepaged_context, read_write_file_ops);
TEST(collapse_single_pte_entry_compound, madvise_context, anon_ops);
- TEST(collapse_single_pte_entry_compound, madvise_context, file_ops);
+ TEST(collapse_single_pte_entry_compound, madvise_context, read_only_file_ops);
+ TEST(collapse_single_pte_entry_compound, madvise_context, read_write_file_ops);
TEST(collapse_full_of_compound, khugepaged_context, anon_ops);
- TEST(collapse_full_of_compound, khugepaged_context, file_ops);
+ TEST(collapse_full_of_compound, khugepaged_context, read_only_file_ops);
+ TEST(collapse_full_of_compound, khugepaged_context, read_write_file_ops);
TEST(collapse_full_of_compound, khugepaged_context, shmem_ops);
TEST(collapse_full_of_compound, madvise_context, anon_ops);
- TEST(collapse_full_of_compound, madvise_context, file_ops);
+ TEST(collapse_full_of_compound, madvise_context, read_only_file_ops);
+ TEST(collapse_full_of_compound, madvise_context, read_write_file_ops);
TEST(collapse_full_of_compound, madvise_context, shmem_ops);
TEST(collapse_compound_extreme, khugepaged_context, anon_ops);
@@ -1280,10 +1328,12 @@ int main(int argc, char **argv)
TEST(collapse_max_ptes_shared, madvise_context, anon_ops);
TEST(madvise_collapse_existing_thps, madvise_context, anon_ops);
- TEST(madvise_collapse_existing_thps, madvise_context, file_ops);
+ TEST(madvise_collapse_existing_thps, madvise_context, read_only_file_ops);
+ TEST(madvise_collapse_existing_thps, madvise_context, read_write_file_ops);
TEST(madvise_collapse_existing_thps, madvise_context, shmem_ops);
- TEST(madvise_retracted_page_tables, madvise_context, file_ops);
+ TEST(madvise_retracted_page_tables, madvise_context, read_only_file_ops);
+ TEST(madvise_retracted_page_tables, madvise_context, read_write_file_ops);
TEST(madvise_retracted_page_tables, madvise_context, shmem_ops);
restore_settings(0);
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index d8468451b3a3..50dd6b6d0225 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -489,8 +489,6 @@ CATEGORY="thp" run_test ./khugepaged all:shmem
CATEGORY="thp" run_test ./khugepaged -s 4 all:shmem
-CATEGORY="thp" run_test ./transhuge-stress -d 20
-
# Try to create XFS if not provided
if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
if [ "${HAVE_HUGEPAGES}" = "1" ]; then
@@ -507,6 +505,14 @@ if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
fi
fi
+if [ -n "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
+CATEGORY="thp" run_test ./khugepaged all:file ${SPLIT_HUGE_PAGE_TEST_XFS_PATH}
+else
+ count_total=$(( count_total + 1 ))
+ count_skip=$(( count_skip + 1 ))
+ echo "[SKIP] ./khugepaged all:file" | tap_prefix
+fi
+
CATEGORY="thp" run_test ./split_huge_page_test ${SPLIT_HUGE_PAGE_TEST_XFS_PATH}
if [ -n "${MOUNTED_XFS}" ]; then
@@ -515,6 +521,8 @@ if [ -n "${MOUNTED_XFS}" ]; then
rm -f ${XFS_IMG}
fi
+CATEGORY="thp" run_test ./transhuge-stress -d 20
+
CATEGORY="thp" run_test ./folio_split_race_test
CATEGORY="migration" run_test ./migration
--
2.43.0
* [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (10 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-04-18 2:44 ` Zi Yan
2026-04-18 9:27 ` [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Lorenzo Stoakes
12 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2026-04-18 2:44 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Song Liu
Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
Any file system with large folio support whose supported orders include
PMD_ORDER can be used. There is no longer any need to open the file read-only.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
tools/testing/selftests/mm/guard-regions.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index 48e8b1539be3..117639891953 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -2203,17 +2203,6 @@ TEST_F(guard_regions, collapse)
if (variant->backing != ANON_BACKED)
ASSERT_EQ(ftruncate(self->fd, size), 0);
- /*
- * We must close and re-open local-file backed as read-only for
- * CONFIG_READ_ONLY_THP_FOR_FS to work.
- */
- if (variant->backing == LOCAL_FILE_BACKED) {
- ASSERT_EQ(close(self->fd), 0);
-
- self->fd = open(self->path, O_RDONLY);
- ASSERT_GE(self->fd, 0);
- }
-
ptr = mmap_(self, variant, NULL, size, PROT_READ, 0, 0);
ASSERT_NE(ptr, MAP_FAILED);
@@ -2237,9 +2226,10 @@ TEST_F(guard_regions, collapse)
/*
* Now collapse the entire region. This should fail in all cases.
*
- * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
- * not set for the local file case, but we can't differentiate whether
- * this occurred or if the collapse was rightly rejected.
+ * The madvise() call will also fail if the file system does not support
+ * large folio or the supported orders do not include PMD_ORDER for the
+ * local file case, but we can't differentiate whether this occurred or
+ * if the collapse was rightly rejected.
*/
EXPECT_NE(madvise(ptr, size, MADV_COLLAPSE), 0);
--
2.43.0
* Re: [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
` (11 preceding siblings ...)
2026-04-18 2:44 ` [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
@ 2026-04-18 9:27 ` Lorenzo Stoakes
12 siblings, 0 replies; 14+ messages in thread
From: Lorenzo Stoakes @ 2026-04-18 9:27 UTC (permalink / raw)
To: Zi Yan
Cc: Matthew Wilcox (Oracle),
Song Liu, Chris Mason, David Sterba, Alexander Viro,
Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
linux-kernel, linux-fsdevel, linux-mm, linux-kselftest
On Fri, Apr 17, 2026 at 10:44:17PM -0400, Zi Yan wrote:
> Hi all,
>
> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
> read-only THPs for FSes with large folio support (the supported orders
> need to include PMD_ORDER) by default.
>
> Before the patchset, the status of creating read-only THPs is below:
Good to specify the read-only bit up front!
>
> | PF | MADV_COLLAPSE | khugepaged |
> |-----------|---------------|------------|
> large folio FSes only | ✓ | x | x |
> READ_ONLY_THP_FOR_FS only | x | ✓ | ✓ |
> both | ✓ | ✓ | ✓ |
This diagram seems familiar :P but very nice, thanks!
And since we include the cover letter in the series in mm, this should be some nice
documentation in the commit msg also.
>
> where READ_ONLY_THP_FOR_FS implies no large folio FSes.
>
>
> Now without READ_ONLY_THP_FOR_FS:
>
> | PF | MADV_COLLAPSE | khugepaged |
> |-----------|---------------|------------|
> large folio FSes | ✓ | ✓ | ✓ |
> no large folio FSes | x | x | x |
This is really nice and clear thanks!
>
> This means no large folio FSes need to add large folio support (the
> supported orders need to include PMD_ORDER), so that they can leverage
> read-only THP creation function.
>
> To prevent breaking read-only THP support for large folio FSes,
> 1. first 4 patches enables the support, so that without READ_ONLY_THP_FOR_FS,
> read-only THP still works for large folio FSes,
I guess this introduces what was previously supported by
CONFIG_READ_ONLY_THP_FOR_FS to large folios as part of that before removal of
the config option?
> 2. Patch 5 removes READ_ONLY_THP_FOR_FS Kconfig,
> 3. the rest of patches remove code related to READ_ONLY_THP_FOR_FS.
Makes sense thanks!
>
>
> The overview of the changes is:
>
> 1. collapse_file() checks for to-be-collapsed folio dirtiness after they
> are locked, unmapped to make sure no new write happens. Before,
> mapping->nr_thps and inode->i_writecount are used to cause read-only
> THP truncation before a fd becomes writable.
>
> 2. hugepage_pmd_enabled() is true for anon, shmem, and file-backed cases
> if the global khugepaged control is on, otherwise, khugepaged for
> file-backed case is turned off and anon and shmem depend on per-size
> control knobs.
>
> 3. collapse_file() from mm/khugepaged.c, instead of checking
> CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
> of struct address_space of the file is at least PMD_ORDER.
>
> 4. file_thp_enabled() also checks mapping_max_folio_order() instead and
> no longer checks if the input file is opened as read-only (Change 1
> handles read-write files).
>
> 5. truncate_inode_partial_folio() calls folio_split() directly instead
> of the removed try_folio_split_to_order(), since large folios can
> only show up on a FS with large folio support.
>
> 6. nr_thps is removed from struct address_space, since it is no longer
> needed to drop all read-only THPs from a FS without large folio
> support when the fd becomes writable. Its related filemap_nr_thps*()
> are removed too.
>
> 7. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
>
> 8. Updated comments in various places.
>
>
> Changelog
> ===
> From V2[3]:
> 1. removed unnecessary check in collapse_scan_file().
>
> 2. removed inode_is_open_for_write() check in file_thp_enabled().
>
> 3. changed hugepage_pmd_enabled() to return true if khugepaged global
> control is on instead of false. cleaned up anon and shmem code in the
> function.
>
> 4. moved folio dirtiness check after try_to_unmap() but before
> try_to_unmap_flush(), since that is sufficient to prevent new writes.
>
> 5. reordered patch 4 and 5, so that khugepaged behavior does not change
> after READ_ONLY_THP_FOR_FS is removed.
>
> 6. added read-write file test in khugepaged selftest.
>
> 7. removed the read-only file restriction from guard-region selftest.
>
> From V1[2]:
> 1. removed inode_is_open_for_write() check in collapse_file(), since the
> added folio dirtiness check after try_to_unmap_flush() should be
> sufficient to prevent writes to candidate folios.
>
> 2. removed READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled(), please
> see Patch 5 and item 2 in the overview for more details.
>
> 3. moved the patch removing READ_ONLY_THP_FOR_FS Kconfig after enabling
> khugepaged and MADV_COLLAPSE to create read-only THPs.
>
> 4. added mapping_pmd_thp_support() helper function.
>
> 5. used VM_WARN_ON_ONCE() in collapse_file() for mapping eligibility check
> and address alignment check instead of if + return error code. Always
> allow shmem, since MADV_COLLAPSE ignore shmem huge config.
>
> 6. added mapping eligibility check in collapse_scan_file().
>
> 7. removed trailing ; for folio_split() in the !CONFIG_TRANSPARENT_HUGEPAGE.
>
> 8. simplified code in folio_check_splittable() after removing
> READ_ONLY_THP_FOR_FS code.
>
> 9. clarified that read-only THP works for FSes with PMD THP support by
> default.
>
> From RFC[1]:
> 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
> on by default for all FSes with large folio support and the supported
> orders includes PMD_ORDER.
>
> Suggestions and comments are welcome.
>
> Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]
> Link: https://lore.kernel.org/all/20260327014255.2058916-1-ziy@nvidia.com/ [2]
> Link: https://lore.kernel.org/all/20260413192030.3275825-1-ziy@nvidia.com/ [3]
>
> Zi Yan (12):
> mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
> mm/khugepaged: add folio dirty check after try_to_unmap()
> mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
> mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in
> hugepage_pmd_enabled()
> mm: remove READ_ONLY_THP_FOR_FS Kconfig option
> mm: fs: remove filemap_nr_thps*() functions and their users
> fs: remove nr_thps from struct address_space
> mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
> mm/truncate: use folio_split() in truncate_inode_partial_folio()
> fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
> selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
> selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions
>
> fs/btrfs/defrag.c | 3 -
> fs/inode.c | 3 -
> fs/open.c | 27 -----
> include/linux/fs.h | 5 -
> include/linux/huge_mm.h | 25 +----
> include/linux/pagemap.h | 35 ++-----
> include/linux/shmem_fs.h | 2 +-
> mm/Kconfig | 11 ---
> mm/filemap.c | 1 -
> mm/huge_memory.c | 39 ++------
> mm/khugepaged.c | 86 ++++++++--------
> mm/truncate.c | 8 +-
> tools/testing/selftests/mm/guard-regions.c | 18 +---
> tools/testing/selftests/mm/khugepaged.c | 110 +++++++++++++++------
> tools/testing/selftests/mm/run_vmtests.sh | 12 ++-
> 15 files changed, 156 insertions(+), 229 deletions(-)
>
> --
> 2.43.0
>
Thread overview: 14+ messages
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-18 9:27 ` [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Lorenzo Stoakes