[PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support
@ 2026-04-13 19:20 Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
                   ` (11 more replies)
  0 siblings, 12 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Hi all,

This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
read-only THPs for FSes with large folio support (the supported orders
need to include PMD_ORDER) by default.

Before the patchset, the status of creating read-only THPs is below:

                            |    PF     | MADV_COLLAPSE | khugepaged |
                            |-----------|---------------|------------|
 large folio FSes only      |     ✓     |       x       |      x     |
 READ_ONLY_THP_FOR_FS only  |     x     |       ✓       |      ✓     |
 both                       |     ✓     |       ✓       |      ✓     |

where READ_ONLY_THP_FOR_FS implies no large folio FSes.

Now without READ_ONLY_THP_FOR_FS:

                           |    PF     | MADV_COLLAPSE | khugepaged |
                           |-----------|---------------|------------|
 large folio FSes          |     ✓     |       ✓       |      ✓     |
 no large folio FSes       |     x     |       x       |      x     |

This means no large folio FSes need to add large folio support (the
supported orders need to include PMD_ORDER), so that they can leverage
read-only THP creation function.

To prevent breaking read-only THP support for large folio FSes,
1. first 3 patches enables the support, so that without READ_ONLY_THP_FOR_FS,
   read-only THP still works for large folio FSes,
2. Patch 4 removes READ_ONLY_THP_FOR_FS Kconfig,
3. the rest of patches remove code related to READ_ONLY_THP_FOR_FS.

The overview of the changes is:

1. collapse_file() checks for to-be-collapsed folio dirtiness after they
   are locked, unmapped, and corresponding mappings are flushed from
   TLBs to prevent writes to candidate folios. Before, mapping->nr_thps
   and inode->i_writecount are used to cause read-only THP truncation
   before a fd becomes writable.

2. hugepage_pmd_enabled() no longer always returned true if
   CONFIG_READ_ONLY_THP_FOR_FS was enabaled. This affects whether
   khugepaged will be on or not. It now depends on anon and shmem
   configurations. Namely, if a user who has set
   /sys/kernel/mm/transparent_hugepage/enabled to always or madvise but
   explicitly disabled anon PMD THP and shmem THP via the per-order
   sysfs controls, read-only THP support will be disabled.

3. collapse_file() from mm/khugepaged.c, instead of checking
   CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
   of struct address_space of the file is at least PMD_ORDER.

4. file_thp_enabled() also checks mapping_max_folio_order() instead.

5. truncate_inode_partial_folio() calls folio_split() directly instead
   of the removed try_folio_split_to_order(), since large folios can
   only show up on a FS with large folio support.

6. nr_thps is removed from struct address_space, since it is no longer
   needed to drop all read-only THPs from a FS without large folio
   support when the fd becomes writable. Its related filemap_nr_thps*()
   are removed too.

7. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.

8. Updated comments in various places.

Changelog
===
From V1[2]:
1. removed inode_is_open_for_write() check in collapse_file(), since the
   added folio dirtiness check after try_to_unmap_flush() should be
   sufficient to prevent writes to candidate folios.

2. removed READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled(), please
   see Patch 5 and item 2 in the overview for more details.

3. moved the patch removing READ_ONLY_THP_FOR_FS Kconfig after enabling
   khugepaged and MADV_COLLAPSE to create read-only THPs.

4. added mapping_pmd_thp_support() helper function.

5. used VM_WARN_ON_ONCE() in collapse_file() for mapping eligibility check
   and address alignment check instead of if + return error code. Always
   allow shmem, since MADV_COLLAPSE ignore shmem huge config.

6. added mapping eligibility check in collapse_scan_file().

7. removed trailing ; for folio_split() in the !CONFIG_TRANSPARENT_HUGEPAGE.

8. simplified code in folio_check_splittable() after removing
   READ_ONLY_THP_FOR_FS code.

9. clarified that read-only THP works for FSes with PMD THP support by
   default.

From RFC[1]:
1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
   on by default for all FSes with large folio support and the supported
   orders includes PMD_ORDER.

Suggestions and comments are welcome.

Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20260327014255.2058916-1-ziy@nvidia.com/ [2]

Zi Yan (12):
  mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in
    hugepage_pmd_enabled()
  mm: fs: remove filemap_nr_thps*() functions and their users
  fs: remove nr_thps from struct address_space
  mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  mm/truncate: use folio_split() in truncate_inode_partial_folio()
  fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in
    guard-regions

 fs/btrfs/defrag.c                          |  3 -
 fs/inode.c                                 |  3 -
 fs/open.c                                  | 27 ---------
 include/linux/fs.h                         |  5 --
 include/linux/huge_mm.h                    | 25 +--------
 include/linux/pagemap.h                    | 29 ----------
 mm/Kconfig                                 | 11 ----
 mm/filemap.c                               |  1 -
 mm/huge_memory.c                           | 37 ++----------
 mm/khugepaged.c                            | 65 ++++++++++------------
 mm/truncate.c                              |  8 +--
 tools/testing/selftests/mm/guard-regions.c |  9 +--
 tools/testing/selftests/mm/khugepaged.c    |  4 +-
 13 files changed, 49 insertions(+), 178 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:20   ` Matthew Wilcox
  2026-04-14 10:29   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

collapse_file() requires FSes supporting large folio with at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.

While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.

In collapse_scan_file(), add FS eligibility check to avoid redundant scans.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..d2f0acd2dac2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	int nr_none = 0;
 	bool is_shmem = shmem_file(file);
 
-	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
-	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+	/* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
+	VM_WARN_ON_ONCE(!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER);
+	VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
 
 	result = alloc_charge_folio(&new_folio, mm, cc);
 	if (result != SCAN_SUCCEED)
@@ -2321,6 +2322,13 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
 	int node = NUMA_NO_NODE;
 	enum scan_result result = SCAN_SUCCEED;
 
+	/*
+	 * skip files without PMD-order folio support
+	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
+	 */
+	if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
+		return SCAN_FAIL;
+
 	present = 0;
 	swap = 0;
 	memset(cc->node_load, 0, sizeof(cc->node_load));
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-04-13 20:20   ` Matthew Wilcox
  2026-04-13 20:34     ` Zi Yan
  2026-04-14 10:29   ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:20 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
> 
> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.

Why?  These are bugs.  I don't think we gain anything from continuing.

> +	/*
> +	 * skip files without PMD-order folio support
> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
> +	 */
> +	if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
> +		return SCAN_FAIL;

I wonder if it should.  If the commit message to 5a90c155defa is
to be believed,

    Since 'deny' is for emergencies and 'force' is for testing, performance
    issues should not be a problem in real production environments, so don't
    call mapping_set_large_folios() in __shmem_get_inode() when large folio is
    disabled with mount huge=never option (default policy).

so maybe MADV_COLLAPSE should honour huge=never?
Documentation/filesystems/tmpfs.rst implies that we do!

huge=never       Do not allocate huge pages.  This is the default.
huge=always      Attempt to allocate huge page every time a new page is needed.
huge=within_size Only allocate huge page if it will be fully within i_size.
                 Also respect madvise(2) hints.
huge=advise      Only allocate huge page if requested with madvise(2).

so what's the difference between huge=never and huge=madvise?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 20:20   ` Matthew Wilcox
@ 2026-04-13 20:34     ` Zi Yan
  2026-04-14 10:19       ` David Hildenbrand (Arm)
  2026-04-14 10:20       ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 20:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 13 Apr 2026, at 16:20, Matthew Wilcox wrote:

> On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>
>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>
> Why?  These are bugs.  I don't think we gain anything from continuing.

The goal is to catch these issues during development. VM_BUG_ON crashes
the system and that is too much for such issues in collapse_file().

>
>> +	/*
>> +	 * skip files without PMD-order folio support
>> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
>> +	 */
>> +	if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
>> +		return SCAN_FAIL;
>
> I wonder if it should.  If the commit message to 5a90c155defa is
> to be believed,
>
>     Since 'deny' is for emergencies and 'force' is for testing, performance
>     issues should not be a problem in real production environments, so don't
>     call mapping_set_large_folios() in __shmem_get_inode() when large folio is
>     disabled with mount huge=never option (default policy).
>
> so maybe MADV_COLLAPSE should honour huge=never?
> Documentation/filesystems/tmpfs.rst implies that we do!
>
> huge=never       Do not allocate huge pages.  This is the default.
> huge=always      Attempt to allocate huge page every time a new page is needed.
> huge=within_size Only allocate huge page if it will be fully within i_size.
>                  Also respect madvise(2) hints.
> huge=advise      Only allocate huge page if requested with madvise(2).
>
> so what's the difference between huge=never and huge=madvise?

I think madvise means MADV_HUGEPAGE for the region, not MADV_COLLAPSE.

In v1, I did the check for shmem, but that regressed MADV_COLLAPSE, which
always can collapse THPs on shmem. I know it sounds unreasonable, but
that ship has sailed.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 20:34     ` Zi Yan
@ 2026-04-14 10:19       ` David Hildenbrand (Arm)
  2026-04-14 10:20       ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:19 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 22:34, Zi Yan wrote:
> On 13 Apr 2026, at 16:20, Matthew Wilcox wrote:
> 
>> On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
>>> collapse_file() requires FSes supporting large folio with at least
>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>>
>>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>>
>> Why?  These are bugs.  I don't think we gain anything from continuing.
> 
> The goal is to catch these issues during development. VM_BUG_ON crashes
> the system and that is too much for such issues in collapse_file().

VM_BUG_ON should have never been added to the kernel.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 20:34     ` Zi Yan
  2026-04-14 10:19       ` David Hildenbrand (Arm)
@ 2026-04-14 10:20       ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:20 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

>> huge=never       Do not allocate huge pages.  This is the default.
>> huge=always      Attempt to allocate huge page every time a new page is needed.
>> huge=within_size Only allocate huge page if it will be fully within i_size.
>>                  Also respect madvise(2) hints.
>> huge=advise      Only allocate huge page if requested with madvise(2).
>>
>> so what's the difference between huge=never and huge=madvise?
> 
> I think madvise means MADV_HUGEPAGE for the region, not MADV_COLLAPSE.

Yes, most likely. The doc is quite old.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
  2026-04-13 20:20   ` Matthew Wilcox
@ 2026-04-14 10:29   ` David Hildenbrand (Arm)
  2026-04-14 15:37     ` Lance Yang
  1 sibling, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:29 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
> 
> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
> 
> In collapse_scan_file(), add FS eligibility check to avoid redundant scans.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/khugepaged.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b8452dbdb043..d2f0acd2dac2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	int nr_none = 0;
>  	bool is_shmem = shmem_file(file);
>  
> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> +	/* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
> +	VM_WARN_ON_ONCE(!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER);
> +	VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>  
>  	result = alloc_charge_folio(&new_folio, mm, cc);
>  	if (result != SCAN_SUCCEED)
> @@ -2321,6 +2322,13 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>  	int node = NUMA_NO_NODE;
>  	enum scan_result result = SCAN_SUCCEED;
>  
> +	/*
> +	 * skip files without PMD-order folio support
> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
> +	 */

How is the !collapse path handled? Through thp_vma_allowable_order() in
collapse_scan_mm_slot()?

Wouldn't it be better to have that check exactly there?

-- 
Cheers,

David



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-14 10:29   ` David Hildenbrand (Arm)
@ 2026-04-14 15:37     ` Lance Yang
  2026-04-14 15:43       ` Lance Yang
  0 siblings, 1 reply; 50+ messages in thread
From: Lance Yang @ 2026-04-14 15:37 UTC (permalink / raw)
  To: david, ziy
  Cc: willy, songliubraving, clm, dsterba, viro, brauner, jack, akpm,
	ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, lance.yang, vbabka, rppt, surenb, mhocko, shuah,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest


On Tue, Apr 14, 2026 at 12:29:04PM +0200, David Hildenbrand (Arm) wrote:
>On 4/13/26 21:20, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>> 
>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>> 
>> In collapse_scan_file(), add FS eligibility check to avoid redundant scans.
>> 
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/khugepaged.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b8452dbdb043..d2f0acd2dac2 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  	int nr_none = 0;
>>  	bool is_shmem = shmem_file(file);
>>  
>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>> +	/* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
>> +	VM_WARN_ON_ONCE(!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER);
>> +	VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>>  
>>  	result = alloc_charge_folio(&new_folio, mm, cc);
>>  	if (result != SCAN_SUCCEED)
>> @@ -2321,6 +2322,13 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>>  	int node = NUMA_NO_NODE;
>>  	enum scan_result result = SCAN_SUCCEED;
>>  
>> +	/*
>> +	 * skip files without PMD-order folio support
>> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
>> +	 */
>
>How is the !collapse path handled? Through thp_vma_allowable_order() in
>collapse_scan_mm_slot()?
>
>Wouldn't it be better to have that check exactly there?

Right! Looks like patch #03[1] already does that, as David also pointed
out there :)

With that in place, regular files should end up in file_thp_enabled(),
which checks that mapping_max_folio_order() >= PMD_ORDER.

For khugepaged, collapse_scan_mm_slot() calls thp_vma_allowable_order()
before entering the per-PMD scan loop, so ineligible regular file VMAs
should already get filtered there.

madvise_collapse() also calls thp_vma_allowable_order() early, so it
should get the same filtering before reaching collapse_scan_file().

So the extra check here looks redundant :)

[1] https://lore.kernel.org/linux-mm/20260413192030.3275825-4-ziy@nvidia.com/

Cheers,
Lance


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-14 15:37     ` Lance Yang
@ 2026-04-14 15:43       ` Lance Yang
  2026-04-14 15:59         ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: Lance Yang @ 2026-04-14 15:43 UTC (permalink / raw)
  To: ziy, david
  Cc: willy, songliubraving, clm, dsterba, viro, brauner, jack, akpm,
	ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest,
	Lance Yang


On Tue, Apr 14, 2026 at 11:37:24PM +0800, Lance Yang wrote:
>
>On Tue, Apr 14, 2026 at 12:29:04PM +0200, David Hildenbrand (Arm) wrote:
>>On 4/13/26 21:20, Zi Yan wrote:
>>> collapse_file() requires FSes supporting large folio with at least
>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>> 
>>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>>> 
>>> In collapse_scan_file(), add FS eligibility check to avoid redundant scans.
>>> 
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>  mm/khugepaged.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index b8452dbdb043..d2f0acd2dac2 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>  	int nr_none = 0;
>>>  	bool is_shmem = shmem_file(file);
>>>  
>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>> +	/* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
>>> +	VM_WARN_ON_ONCE(!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER);
>>> +	VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>>>  
>>>  	result = alloc_charge_folio(&new_folio, mm, cc);
>>>  	if (result != SCAN_SUCCEED)
>>> @@ -2321,6 +2322,13 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>>>  	int node = NUMA_NO_NODE;
>>>  	enum scan_result result = SCAN_SUCCEED;
>>>  
>>> +	/*
>>> +	 * skip files without PMD-order folio support
>>> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
>>> +	 */
>>
>>How is the !collapse path handled? Through thp_vma_allowable_order() in
>>collapse_scan_mm_slot()?
>>
>>Wouldn't it be better to have that check exactly there?
>
>Right! Looks like patch #03[1] already does that, as David also pointed
>out there :)
>
>With that in place, regular files should end up in file_thp_enabled(),
>which checks that mapping_max_folio_order() >= PMD_ORDER.

Forgot to add:

thp_vma_allowable_order()
	-> file_thp_enabled()
	-> mapping_max_folio_order() check

>For khugepaged, collapse_scan_mm_slot() calls thp_vma_allowable_order()
>before entering the per-PMD scan loop, so ineligible regular file VMAs
>should already get filtered there.
>
>madvise_collapse() also calls thp_vma_allowable_order() early, so it
>should get the same filtering before reaching collapse_scan_file().
>
>So the extra check here looks redundant :)
>
>[1] https://lore.kernel.org/linux-mm/20260413192030.3275825-4-ziy@nvidia.com/
>
>Cheers,
>Lance
>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
  2026-04-14 15:43       ` Lance Yang
@ 2026-04-14 15:59         ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-14 15:59 UTC (permalink / raw)
  To: Lance Yang, david
  Cc: willy, songliubraving, clm, dsterba, viro, brauner, jack, akpm,
	ljs, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, vbabka, rppt, surenb, mhocko, shuah, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 11:43, Lance Yang wrote:

> On Tue, Apr 14, 2026 at 11:37:24PM +0800, Lance Yang wrote:
>>
>> On Tue, Apr 14, 2026 at 12:29:04PM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/13/26 21:20, Zi Yan wrote:
>>>> collapse_file() requires FSes supporting large folio with at least
>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>>>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>>>
>>>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>>>>
>>>> In collapse_scan_file(), add FS eligibility check to avoid redundant scans.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>  mm/khugepaged.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index b8452dbdb043..d2f0acd2dac2 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1892,8 +1892,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>  	int nr_none = 0;
>>>>  	bool is_shmem = shmem_file(file);
>>>>
>>>> -	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>> -	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>> +	/* MADV_COLLAPSE ignores shmem huge config, so do not check shmem */
>>>> +	VM_WARN_ON_ONCE(!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER);
>>>> +	VM_WARN_ON_ONCE(start & (HPAGE_PMD_NR - 1));
>>>>
>>>>  	result = alloc_charge_folio(&new_folio, mm, cc);
>>>>  	if (result != SCAN_SUCCEED)
>>>> @@ -2321,6 +2322,13 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>>>>  	int node = NUMA_NO_NODE;
>>>>  	enum scan_result result = SCAN_SUCCEED;
>>>>
>>>> +	/*
>>>> +	 * skip files without PMD-order folio support
>>>> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
>>>> +	 */
>>>
>>> How is the !collapse path handled? Through thp_vma_allowable_order() in
>>> collapse_scan_mm_slot()?
>>>
>>> Wouldn't it be better to have that check exactly there?
>>
>> Right! Looks like patch #03[1] already does that, as David also pointed
>> out there :)
>>
>> With that in place, regular files should end up in file_thp_enabled(),
>> which checks that mapping_max_folio_order() >= PMD_ORDER.
>
> Forgot to add:
>
> thp_vma_allowable_order()
> 	-> file_thp_enabled()
> 	-> mapping_max_folio_order() check
>
>> For khugepaged, collapse_scan_mm_slot() calls thp_vma_allowable_order()
>> before entering the per-PMD scan loop, so ineligible regular file VMAs
>> should already get filtered there.
>>
>> madvise_collapse() also calls thp_vma_allowable_order() early, so it
>> should get the same filtering before reaching collapse_scan_file().
>>
>> So the extra check here looks redundant :)
>>
>> [1] https://lore.kernel.org/linux-mm/20260413192030.3275825-4-ziy@nvidia.com/
>>

Right. Will remove the extra check from collapse_scan_file().

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:23   ` Matthew Wilcox
  2026-04-14 10:38   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

This check ensures the correctness of collapse read-only THPs for FSes
after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
PMD THP pagecache.

READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
aforementioned mechanism will go away too. To ensure khugepaged functions
as expected after the changes, rollback if any folio is dirty after
try_to_unmap_flush() to , since a dirty folio means this read-only folio
got some writes via mmap can happen between try_to_unmap() and
try_to_unmap_flush() via cached TLB entries and khugepaged does not support
collapse writable pagecache folios.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d2f0acd2dac2..ec609e53082e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	 */
 	try_to_unmap_flush();
 
+	/*
+	 * At this point, all folios are locked, unmapped, and all cached
+	 * mappings in TLBs are flushed. No one else is able to write to these
+	 * folios, since
+	 * 1. writes via FS ops require folio locks (see write_begin_get_folio());
+	 * 2. writes via mmap require taking a fault and locking folio locks.
+	 *
+	 * khugepaged only works for read-only fd, make sure all folios are
+	 * clean, since writes via mmap can happen between try_to_unmap() and
+	 * try_to_unmap_flush() via cached TLB entries.
+	 */
+	list_for_each_entry(folio, &pagelist, lru) {
+		if (!is_shmem && (folio_test_dirty(folio))) {
+			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+			goto rollback;
+		}
+	}
+
 	if (result == SCAN_SUCCEED && nr_none &&
 	    !shmem_charge(mapping->host, nr_none))
 		result = SCAN_FAIL;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
@ 2026-04-13 20:23   ` Matthew Wilcox
  2026-04-13 20:28     ` Zi Yan
  2026-04-14 10:38   ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:23 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:20PM -0400, Zi Yan wrote:
> +		if (!is_shmem && (folio_test_dirty(folio))) {

seems like a spurious pair of brackets?

		if (!is_shmem && folio_test_dirty(folio)) {

should be fine



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  2026-04-13 20:23   ` Matthew Wilcox
@ 2026-04-13 20:28     ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 20:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 13 Apr 2026, at 16:23, Matthew Wilcox wrote:

> On Mon, Apr 13, 2026 at 03:20:20PM -0400, Zi Yan wrote:
>> +		if (!is_shmem && (folio_test_dirty(folio))) {
>
> seems like a spurious pair of brackets?
>
> 		if (!is_shmem && folio_test_dirty(folio)) {
>
> should be fine

Will remove them.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
  2026-04-13 20:23   ` Matthew Wilcox
@ 2026-04-14 10:38   ` David Hildenbrand (Arm)
  2026-04-14 15:55     ` Zi Yan
  1 sibling, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:38 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> This check ensures the correctness of collapse read-only THPs for FSes
> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
> PMD THP pagecache.
> 
> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
> and inode->i_writecount to prevent any write to read-only to-be-collapsed
> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
> aforementioned mechanism will go away too. To ensure khugepaged functions
> as expected after the changes, rollback if any folio is dirty after
> try_to_unmap_flush() to , since a dirty folio means this read-only folio
> got some writes via mmap can happen between try_to_unmap() and
> try_to_unmap_flush() via cached TLB entries and khugepaged does not support
> collapse writable pagecache folios.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/khugepaged.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d2f0acd2dac2..ec609e53082e 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	 */
>  	try_to_unmap_flush();
>  
> +	/*
> +	 * At this point, all folios are locked, unmapped, and all cached
> +	 * mappings in TLBs are flushed. No one else is able to write to these
> +	 * folios, since
> +	 * 1. writes via FS ops require folio locks (see write_begin_get_folio());
> +	 * 2. writes via mmap require taking a fault and locking folio locks.
> +	 *

maybe simplify to "folios, since that would require taking the folio lock first."

> +	 * khugepaged only works for read-only fd, make sure all folios are
> +	 * clean, since writes via mmap can happen between try_to_unmap() and
> +	 * try_to_unmap_flush() via cached TLB entries.


IIRC, after successful try_to_unmap() the PTE dirty bit would be synced to the 
folio. That's what you care about, not about any stale TLB entries.

The important part is that the 

So can't we simply test for dirty folios after the refcount check (where
we made sure the folio is no longer mapped)?



diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b2ac28ddd480..920e16067134 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2089,6 +2089,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
                        goto out_unlock;
                }
 
+               /* ... */
+               if (!is_shmem && folio_test_dirty(folio)) {
+                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+                       xas_unlock_irq(&xas);
+                       folio_putback_lru(folio);
+                       goto out_unlock;
+               }
+
                /*
                 * Accumulate the folios that are being collapsed.


I guess we don't have to recheck folio_test_writeback() ?

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
  2026-04-14 10:38   ` David Hildenbrand (Arm)
@ 2026-04-14 15:55     ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-14 15:55 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox (Oracle),
	Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 6:38, David Hildenbrand (Arm) wrote:

> On 4/13/26 21:20, Zi Yan wrote:
>> This check ensures the correctness of collapse read-only THPs for FSes
>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>> PMD THP pagecache.
>>
>> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>> aforementioned mechanism will go away too. To ensure khugepaged functions
>> as expected after the changes, rollback if any folio is dirty after
>> try_to_unmap_flush() to , since a dirty folio means this read-only folio
>> got some writes via mmap can happen between try_to_unmap() and
>> try_to_unmap_flush() via cached TLB entries and khugepaged does not support
>> collapse writable pagecache folios.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/khugepaged.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d2f0acd2dac2..ec609e53082e 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  	 */
>>  	try_to_unmap_flush();
>>
>> +	/*
>> +	 * At this point, all folios are locked, unmapped, and all cached
>> +	 * mappings in TLBs are flushed. No one else is able to write to these
>> +	 * folios, since
>> +	 * 1. writes via FS ops require folio locks (see write_begin_get_folio());
>> +	 * 2. writes via mmap require taking a fault and locking folio locks.
>> +	 *
>
> maybe simplify to "folios, since that would require taking the folio lock first."

Sure.

>
>> +	 * khugepaged only works for read-only fd, make sure all folios are
>> +	 * clean, since writes via mmap can happen between try_to_unmap() and
>> +	 * try_to_unmap_flush() via cached TLB entries.
>
>
> IIRC, after successful try_to_unmap() the PTE dirty bit would be synced to the
> folio. That's what you care about, not about any stale TLB entries.

Right. I missed the PTE dirty bit to folio dirtiness part.

>
> The important part is that the
>
> So can't we simply test for dirty folios after the refcount check (where
> we made sure the folio is no longer mapped)?
>

I think so.

>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b2ac28ddd480..920e16067134 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2089,6 +2089,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                         goto out_unlock;
>                 }
>
> +               /* ... */
> +               if (!is_shmem && folio_test_dirty(folio)) {
> +                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +                       xas_unlock_irq(&xas);
> +                       folio_putback_lru(folio);
> +                       goto out_unlock;
> +               }
> +
>                 /*
>                  * Accumulate the folios that are being collapsed.
>
>
> I guess we don't have to recheck folio_test_writeback() ?

Right, writeback needs to take folio lock and we are holding it, so
others can only dirty the folio but are not able to initiate write backs.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-14 10:40   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Replace it with a check on the max folio order of the file's address space
mapping, making sure PMD_ORDER is supported.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..a22bb2364bdc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 {
 	struct inode *inode;
 
-	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
-		return false;
-
 	if (!vma->vm_file)
 		return false;
 
@@ -97,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	if (IS_ANON_FILE(inode))
 		return false;
 
+	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-04-13 19:20 ` [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-04-14 10:40   ` David Hildenbrand (Arm)
  2026-04-14 15:59     ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:40 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD_ORDER is supported.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/huge_memory.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 970e077019b7..a22bb2364bdc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  {
>  	struct inode *inode;
>  
> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> -		return false;
> -
>  	if (!vma->vm_file)
>  		return false;
>  
> @@ -97,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  	if (IS_ANON_FILE(inode))
>  		return false;
>  
> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> +		return false;
> +
>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>  

I assume this change itself should be sufficient and the SCAN_FAIL check
in patch #1 is not required?

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
  2026-04-14 10:40   ` David Hildenbrand (Arm)
@ 2026-04-14 15:59     ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-14 15:59 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox (Oracle),
	Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 6:40, David Hildenbrand (Arm) wrote:

> On 4/13/26 21:20, Zi Yan wrote:
>> Replace it with a check on the max folio order of the file's address space
>> mapping, making sure PMD_ORDER is supported.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/huge_memory.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 970e077019b7..a22bb2364bdc 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -86,9 +86,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  {
>>  	struct inode *inode;
>>
>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>> -		return false;
>> -
>>  	if (!vma->vm_file)
>>  		return false;
>>
>> @@ -97,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  	if (IS_ANON_FILE(inode))
>>  		return false;
>>
>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>> +		return false;
>> +
>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>  }
>>
>
> I assume this change itself should be sufficient and the SCAN_FAIL check
> in patch #1 is not required?
>

Sure, I will remove that one.


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (2 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-14 10:40   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

After removing READ_ONLY_THP_FOR_FS check in file_thp_enabled(),
khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache
support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig first
so that no one can use READ_ONLY_THP_FOR_FS as upcoming commits remove
mapping->nr_thps, which its safe guard mechanism relies on.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/Kconfig | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index bd283958d675..408fc7b82233 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -937,17 +937,6 @@ config THP_SWAP
 
 	  For selection by architectures with reasonable THP sizes.
 
-config READ_ONLY_THP_FOR_FS
-	bool "Read-only THP for filesystems (EXPERIMENTAL)"
-	depends on TRANSPARENT_HUGEPAGE
-
-	help
-	  Allow khugepaged to put read-only file-backed pages in THP.
-
-	  This is marked experimental because it is a new feature. Write
-	  support of file THPs will be developed in the next few release
-	  cycles.
-
 config NO_PAGE_MAPCOUNT
 	bool "No per-page mapcount (EXPERIMENTAL)"
 	help
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option
  2026-04-13 19:20 ` [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-04-14 10:40   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 10:40 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> After removing READ_ONLY_THP_FOR_FS check in file_thp_enabled(),
> khugepaged and MADV_COLLAPSE can run on FSes with PMD THP pagecache
> support even without READ_ONLY_THP_FOR_FS enabled. Remove the Kconfig first
> so that no one can use READ_ONLY_THP_FOR_FS as upcoming commits remove
> mapping->nr_thps, which its safe guard mechanism relies on.
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (3 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:33   ` Matthew Wilcox
  2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

After READ_ONLY_THP_FOR_FS Kconfig is removed, this check becomes dead
code.

This changes hugepage_pmd_enabled() semantics. Previously, with
READ_ONLY_THP_FOR_FS enabled, hugepage_pmd_enabled() returned true whenever
/sys/kernel/mm/transparent_hugepage/enabled was set to "always" or
"madvise".

After this change, hugepage_pmd_enabled() is governed only by the anon and
shmem PMD THP controls. As a result, khugepaged collapse for file-backed
folios no longer runs unconditionally under the top-level THP setting, and
now depends on the anon/shmem PMD configuration.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ec609e53082e..79c985d7fa03 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -409,15 +409,12 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 static bool hugepage_pmd_enabled(void)
 {
 	/*
-	 * We cover the anon, shmem and the file-backed case here; file-backed
-	 * hugepages, when configured in, are determined by the global control.
+	 * We cover the anon and shmem cases here.
 	 * Anon pmd-sized hugepages are determined by the pmd-size control.
 	 * Shmem pmd-sized hugepages are also determined by its pmd-size control,
 	 * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
+	 * The file-backed case is determined by the anon and shmem cases.
 	 */
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-	    hugepage_global_enabled())
-		return true;
 	if (test_bit(PMD_ORDER, &huge_anon_orders_always))
 		return true;
 	if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-13 19:20 ` [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
@ 2026-04-13 20:33   ` Matthew Wilcox
  2026-04-13 20:42     ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:33 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:23PM -0400, Zi Yan wrote:
> After READ_ONLY_THP_FOR_FS Kconfig is removed, this check becomes dead
> code.
> 
> This changes hugepage_pmd_enabled() semantics. Previously, with
> READ_ONLY_THP_FOR_FS enabled, hugepage_pmd_enabled() returned true whenever
> /sys/kernel/mm/transparent_hugepage/enabled was set to "always" or
> "madvise".
> 
> After this change, hugepage_pmd_enabled() is governed only by the anon and
> shmem PMD THP controls. As a result, khugepaged collapse for file-backed
> folios no longer runs unconditionally under the top-level THP setting, and
> now depends on the anon/shmem PMD configuration.

This seems like it'll turn off khugepaged too easily.  I would have
thought we'd want:

-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-	    hugepage_global_enabled())
+	if (hugepage_global_enabled())
 		return true;

... or maybe this whole thing could be simplified?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-13 20:33   ` Matthew Wilcox
@ 2026-04-13 20:42     ` Zi Yan
  2026-04-14 11:02       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 20:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 13 Apr 2026, at 16:33, Matthew Wilcox wrote:

> On Mon, Apr 13, 2026 at 03:20:23PM -0400, Zi Yan wrote:
>> After READ_ONLY_THP_FOR_FS Kconfig is removed, this check becomes dead
>> code.
>>
>> This changes hugepage_pmd_enabled() semantics. Previously, with
>> READ_ONLY_THP_FOR_FS enabled, hugepage_pmd_enabled() returned true whenever
>> /sys/kernel/mm/transparent_hugepage/enabled was set to "always" or
>> "madvise".
>>
>> After this change, hugepage_pmd_enabled() is governed only by the anon and
>> shmem PMD THP controls. As a result, khugepaged collapse for file-backed
>> folios no longer runs unconditionally under the top-level THP setting, and
>> now depends on the anon/shmem PMD configuration.
>
> This seems like it'll turn off khugepaged too easily.  I would have
> thought we'd want:
>
> -	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> -	    hugepage_global_enabled())
> +	if (hugepage_global_enabled())
>  		return true;

I thought about this, but it means khugepaged is turned on regardless of
anon and shmem configs. I tend to think the original code was a bug,
since enabling CONFIG_READ_ONLY_THP_FOR_FS would enable khugepaged all
the time.

>
> ... or maybe this whole thing could be simplified?

Alternatives could be:
1. to add a file-backed khhugepaged config, but another sysfs?
2. to replace hugepage_pmd_enabled() with hugepage_global_enabled()
   and let thp_vma_allowable_order() in collapse_scan_mm_slot()
   skip not qualified VMAs, but that would waste extra CPU cycles
   for scanning. Maybe not too much waste.



Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-13 20:42     ` Zi Yan
@ 2026-04-14 11:02       ` David Hildenbrand (Arm)
  2026-04-14 16:30         ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:02 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 22:42, Zi Yan wrote:
> On 13 Apr 2026, at 16:33, Matthew Wilcox wrote:
> 
>> On Mon, Apr 13, 2026 at 03:20:23PM -0400, Zi Yan wrote:
>>> After READ_ONLY_THP_FOR_FS Kconfig is removed, this check becomes dead
>>> code.
>>>
>>> This changes hugepage_pmd_enabled() semantics. Previously, with
>>> READ_ONLY_THP_FOR_FS enabled, hugepage_pmd_enabled() returned true whenever
>>> /sys/kernel/mm/transparent_hugepage/enabled was set to "always" or
>>> "madvise".
>>>
>>> After this change, hugepage_pmd_enabled() is governed only by the anon and
>>> shmem PMD THP controls. As a result, khugepaged collapse for file-backed
>>> folios no longer runs unconditionally under the top-level THP setting, and
>>> now depends on the anon/shmem PMD configuration.
>>
>> This seems like it'll turn off khugepaged too easily.  I would have
>> thought we'd want:
>>
>> -	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>> -	    hugepage_global_enabled())
>> +	if (hugepage_global_enabled())
>>  		return true;
>

I assume such a change should come before patch #4, as it seems to affect
the functionality that depended on CONFIG_READ_ONLY_THP_FOR_FS.
 
> I thought about this, but it means khugepaged is turned on regardless of
> anon and shmem configs. I tend to think the original code was a bug,
> since enabling CONFIG_READ_ONLY_THP_FOR_FS would enable khugepaged all
> the time.

There might be some FS mapping to collapse? So that makes sense to
some degree.

I really don't like the side-effects of "/sys/kernel/mm/transparent_hugepage/enabled".
Like, enabling khugepaged+PMD for files.

> 
>>
>> ... or maybe this whole thing could be simplified?
> 
> Alternatives could be:
> 1. to add a file-backed khhugepaged config, but another sysfs?

Maybe that would be the time to decouple file THP logic from
hugepage_global_enabled()/hugepage_global_always().

In particular, as pagecache folio allocation doesn't really care about __thp_vma_allowable_orders() IIRC.

I'm thinking about something like the following:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b2a6060b3c20..fb3a4fd84fe0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -184,15 +184,6 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
                                                   forced_collapse);
 
        if (!vma_is_anonymous(vma)) {
-               /*
-                * Enforce THP collapse requirements as necessary. Anonymous vmas
-                * were already handled in thp_vma_allowable_orders().
-                */
-               if (!forced_collapse &&
-                   (!hugepage_global_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
-                                                   !hugepage_global_always())))
-                       return 0;
-
                /*
                 * Trust that ->huge_fault() handlers know what they are doing
                 * in fault path.

Then, we might indeed just want a khugepaged toggle whether to enable it at
all in files. (or just a toggle to disable khugeapged entirely?)

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-14 11:02       ` David Hildenbrand (Arm)
@ 2026-04-14 16:30         ` Zi Yan
  2026-04-14 18:14           ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-14 16:30 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Matthew Wilcox, Nico Pache
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 7:02, David Hildenbrand (Arm) wrote:

> On 4/13/26 22:42, Zi Yan wrote:
>> On 13 Apr 2026, at 16:33, Matthew Wilcox wrote:
>>
>>> On Mon, Apr 13, 2026 at 03:20:23PM -0400, Zi Yan wrote:
>>>> After READ_ONLY_THP_FOR_FS Kconfig is removed, this check becomes dead
>>>> code.
>>>>
>>>> This changes hugepage_pmd_enabled() semantics. Previously, with
>>>> READ_ONLY_THP_FOR_FS enabled, hugepage_pmd_enabled() returned true whenever
>>>> /sys/kernel/mm/transparent_hugepage/enabled was set to "always" or
>>>> "madvise".
>>>>
>>>> After this change, hugepage_pmd_enabled() is governed only by the anon and
>>>> shmem PMD THP controls. As a result, khugepaged collapse for file-backed
>>>> folios no longer runs unconditionally under the top-level THP setting, and
>>>> now depends on the anon/shmem PMD configuration.
>>>
>>> This seems like it'll turn off khugepaged too easily.  I would have
>>> thought we'd want:
>>>
>>> -	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>> -	    hugepage_global_enabled())
>>> +	if (hugepage_global_enabled())
>>>  		return true;
>>
>
> I assume such a change should come before patch #4, as it seems to affect
> the functionality that depended on CONFIG_READ_ONLY_THP_FOR_FS.

If the goal is to have a knob of khugepaged for all files, yes I will move
the change before Patch 4.

>
>> I thought about this, but it means khugepaged is turned on regardless of
>> anon and shmem configs. I tend to think the original code was a bug,
>> since enabling CONFIG_READ_ONLY_THP_FOR_FS would enable khugepaged all
>> the time.
>
> There might be some FS mapping to collapse? So that makes sense to
> some degree.
>
> I really don't like the side-effects of "/sys/kernel/mm/transparent_hugepage/enabled".
> Like, enabling khugepaged+PMD for files.
>

I am not a fan either, but I was not sure about another sysfs knob.

>>
>>>
>>> ... or maybe this whole thing could be simplified?
>>
>> Alternatives could be:
>> 1. to add a file-backed khhugepaged config, but another sysfs?
>
> Maybe that would be the time to decouple file THP logic from
> hugepage_global_enabled()/hugepage_global_always().
>
> In particular, as pagecache folio allocation doesn't really care about __thp_vma_allowable_orders() IIRC.
>
> I'm thinking about something like the following:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b2a6060b3c20..fb3a4fd84fe0 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -184,15 +184,6 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>                                                    forced_collapse);
>
>         if (!vma_is_anonymous(vma)) {
> -               /*
> -                * Enforce THP collapse requirements as necessary. Anonymous vmas
> -                * were already handled in thp_vma_allowable_orders().
> -                */
> -               if (!forced_collapse &&
> -                   (!hugepage_global_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
> -                                                   !hugepage_global_always())))
> -                       return 0;
> -
>                 /*
>                  * Trust that ->huge_fault() handlers know what they are doing
>                  * in fault path.

Looks reasonable.

>
> Then, we might indeed just want a khugepaged toggle whether to enable it at
> all in files. (or just a toggle to disable khugeapged entirely?)
>

I think hugepage_global_enabled() should be enough to decide whether khugepaged
should run or not.

Currently, we have thp_vma_allowable_orders() to filter each VMAs and I do not
see a reason to use hugepage_pmd_enabled() to guard khugepaged daemon. I am
going to just remove hugepage_pmd_enabled() and replace it with
hugepage_global_enabled(). Let me know your thoughts.

BTW, this conflicts with Patch 12 from Nico’s khugepaged for mTHP patchset.


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-14 16:30         ` Zi Yan
@ 2026-04-14 18:14           ` David Hildenbrand (Arm)
  2026-04-14 18:25             ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 18:14 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox, Nico Pache
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/14/26 18:30, Zi Yan wrote:
> On 14 Apr 2026, at 7:02, David Hildenbrand (Arm) wrote:
> 
>> On 4/13/26 22:42, Zi Yan wrote:
>>>
>>>
>>
>> I assume such a change should come before patch #4, as it seems to affect
>> the functionality that depended on CONFIG_READ_ONLY_THP_FOR_FS.
> 
> If the goal is to have a knob of khugepaged for all files, yes I will move
> the change before Patch 4.
> 
>>
>>> I thought about this, but it means khugepaged is turned on regardless of
>>> anon and shmem configs. I tend to think the original code was a bug,
>>> since enabling CONFIG_READ_ONLY_THP_FOR_FS would enable khugepaged all
>>> the time.
>>
>> There might be some FS mapping to collapse? So that makes sense to
>> some degree.
>>
>> I really don't like the side-effects of "/sys/kernel/mm/transparent_hugepage/enabled".
>> Like, enabling khugepaged+PMD for files.
>>
> 
> I am not a fan either, but I was not sure about another sysfs knob.
> 

Yeah, it would be better if we could avoid it. But the dependency on the
global toggle as it is today is a bit weird.

>>>
>>>
>>> Alternatives could be:
>>> 1. to add a file-backed khhugepaged config, but another sysfs?
>>
>> Maybe that would be the time to decouple file THP logic from
>> hugepage_global_enabled()/hugepage_global_always().
>>
>> In particular, as pagecache folio allocation doesn't really care about __thp_vma_allowable_orders() IIRC.
>>
>> I'm thinking about something like the following:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b2a6060b3c20..fb3a4fd84fe0 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -184,15 +184,6 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>                                                    forced_collapse);
>>
>>         if (!vma_is_anonymous(vma)) {
>> -               /*
>> -                * Enforce THP collapse requirements as necessary. Anonymous vmas
>> -                * were already handled in thp_vma_allowable_orders().
>> -                */
>> -               if (!forced_collapse &&
>> -                   (!hugepage_global_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
>> -                                                   !hugepage_global_always())))
>> -                       return 0;
>> -
>>                 /*
>>                  * Trust that ->huge_fault() handlers know what they are doing
>>                  * in fault path.
> 
> Looks reasonable.

I don't think there is other interaction with FS and the global toggle
besides this and the one you are adjusting, right?

> 
>>
>> Then, we might indeed just want a khugepaged toggle whether to enable it at
>> all in files. (or just a toggle to disable khugeapged entirely?)
>>
> 
> I think hugepage_global_enabled() should be enough to decide whether khugepaged
> should run or not.

That would also be an option and would likely avoid other toggles.

So __thp_vma_allowable_orders() would allows THPs in any case for FS,
but hugepage_global_enabled() would control whether khugepaged runs (for
fs).

It gives less flexibility, but likely that's ok.

> 
> Currently, we have thp_vma_allowable_orders() to filter each VMAs and I do not
> see a reason to use hugepage_pmd_enabled() to guard khugepaged daemon. I am
> going to just remove hugepage_pmd_enabled() and replace it with
> hugepage_global_enabled(). Let me know your thoughts.

Can you send a quick draft of what you have in mind?

> 
> BTW, this conflicts with Patch 12 from Nico’s khugepaged for mTHP patchset.

Right. I guess it can be handled.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled()
  2026-04-14 18:14           ` David Hildenbrand (Arm)
@ 2026-04-14 18:25             ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-14 18:25 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Matthew Wilcox, Nico Pache, Song Liu, Chris Mason, David Sterba,
	Alexander Viro, Christian Brauner, Jan Kara, Andrew Morton,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 14:14, David Hildenbrand (Arm) wrote:

> On 4/14/26 18:30, Zi Yan wrote:
>> On 14 Apr 2026, at 7:02, David Hildenbrand (Arm) wrote:
>>
>>> On 4/13/26 22:42, Zi Yan wrote:
>>>>
>>>>
>>>
>>> I assume such a change should come before patch #4, as it seems to affect
>>> the functionality that depended on CONFIG_READ_ONLY_THP_FOR_FS.
>>
>> If the goal is to have a knob of khugepaged for all files, yes I will move
>> the change before Patch 4.
>>
>>>
>>>> I thought about this, but it means khugepaged is turned on regardless of
>>>> anon and shmem configs. I tend to think the original code was a bug,
>>>> since enabling CONFIG_READ_ONLY_THP_FOR_FS would enable khugepaged all
>>>> the time.
>>>
>>> There might be some FS mapping to collapse? So that makes sense to
>>> some degree.
>>>
>>> I really don't like the side-effects of "/sys/kernel/mm/transparent_hugepage/enabled".
>>> Like, enabling khugepaged+PMD for files.
>>>
>>
>> I am not a fan either, but I was not sure about another sysfs knob.
>>
>
> Yeah, it would be better if we could avoid it. But the dependency on the
> global toggle as it is today is a bit weird.
>
>>>>
>>>>
>>>> Alternatives could be:
>>>> 1. to add a file-backed khhugepaged config, but another sysfs?
>>>
>>> Maybe that would be the time to decouple file THP logic from
>>> hugepage_global_enabled()/hugepage_global_always().
>>>
>>> In particular, as pagecache folio allocation doesn't really care about __thp_vma_allowable_orders() IIRC.
>>>
>>> I'm thinking about something like the following:
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index b2a6060b3c20..fb3a4fd84fe0 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -184,15 +184,6 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>                                                    forced_collapse);
>>>
>>>         if (!vma_is_anonymous(vma)) {
>>> -               /*
>>> -                * Enforce THP collapse requirements as necessary. Anonymous vmas
>>> -                * were already handled in thp_vma_allowable_orders().
>>> -                */
>>> -               if (!forced_collapse &&
>>> -                   (!hugepage_global_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
>>> -                                                   !hugepage_global_always())))
>>> -                       return 0;
>>> -
>>>                 /*
>>>                  * Trust that ->huge_fault() handlers know what they are doing
>>>                  * in fault path.
>>
>> Looks reasonable.
>
> I don't think there is other interaction with FS and the global toggle
> besides this and the one you are adjusting, right?
>
>>
>>>
>>> Then, we might indeed just want a khugepaged toggle whether to enable it at
>>> all in files. (or just a toggle to disable khugeapged entirely?)
>>>
>>
>> I think hugepage_global_enabled() should be enough to decide whether khugepaged
>> should run or not.
>
> That would also be an option and would likely avoid other toggles.
>
> So __thp_vma_allowable_orders() would allows THPs in any case for FS,
> but hugepage_global_enabled() would control whether khugepaged runs (for
> fs).
>
> It gives less flexibility, but likely that's ok.
>
>>
>> Currently, we have thp_vma_allowable_orders() to filter each VMAs and I do not
>> see a reason to use hugepage_pmd_enabled() to guard khugepaged daemon. I am
>> going to just remove hugepage_pmd_enabled() and replace it with
>> hugepage_global_enabled(). Let me know your thoughts.
>
> Can you send a quick draft of what you have in mind?

From ee9e1c18b41111db7248db7fb64693b91e32255d Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Tue, 14 Apr 2026 14:17:31 -0400
Subject: [PATCH] mm/khugepaged: replace hugepage_pmd_enabled with
 hugepage_global_enabled

thp_vma_allowable_orders() is used to guard khugepaged scanning logic in
collapse_scan_mm_slot() based on enabled THP/mTHP orders by only allowing
PMD_ORDER. hugepage_pmd_enabled() is a duplication of it for khugepaged
start/stop control. Simplify the control by checking
hugepage_global_enabled() instead and let thp_vma_allowable_orders() filter
khugepaged scanning.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 36 ++++++------------------------------
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b8452dbdb043..459c486a5a75 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -406,30 +406,6 @@ static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }

-static bool hugepage_pmd_enabled(void)
-{
-	/*
-	 * We cover the anon, shmem and the file-backed case here; file-backed
-	 * hugepages, when configured in, are determined by the global control.
-	 * Anon pmd-sized hugepages are determined by the pmd-size control.
-	 * Shmem pmd-sized hugepages are also determined by its pmd-size control,
-	 * except when the global shmem_huge is set to SHMEM_HUGE_DENY.
-	 */
-	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-	    hugepage_global_enabled())
-		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_always))
-		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
-		return true;
-	if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
-	    hugepage_global_enabled())
-		return true;
-	if (IS_ENABLED(CONFIG_SHMEM) && shmem_hpage_pmd_enabled())
-		return true;
-	return false;
-}
-
 void __khugepaged_enter(struct mm_struct *mm)
 {
 	struct mm_slot *slot;
@@ -463,7 +439,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 			  vm_flags_t vm_flags)
 {
 	if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
-	    hugepage_pmd_enabled()) {
+	    hugepage_global_enabled()) {
 		if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
 			__khugepaged_enter(vma->vm_mm);
 	}
@@ -2599,7 +2575,7 @@ static void collapse_scan_mm_slot(unsigned int progress_max,

 static int khugepaged_has_work(void)
 {
-	return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled();
+	return !list_empty(&khugepaged_scan.mm_head) && hugepage_global_enabled();
 }

 static int khugepaged_wait_event(void)
@@ -2672,7 +2648,7 @@ static void khugepaged_wait_work(void)
 		return;
 	}

-	if (hugepage_pmd_enabled())
+	if (hugepage_global_enabled())
 		wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
 }

@@ -2703,7 +2679,7 @@ void set_recommended_min_free_kbytes(void)
 	int nr_zones = 0;
 	unsigned long recommended_min;

-	if (!hugepage_pmd_enabled()) {
+	if (!hugepage_global_enabled()) {
 		calculate_min_free_kbytes();
 		goto update_wmarks;
 	}
@@ -2753,7 +2729,7 @@ int start_stop_khugepaged(void)
 	int err = 0;

 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_pmd_enabled()) {
+	if (hugepage_global_enabled()) {
 		if (!khugepaged_thread)
 			khugepaged_thread = kthread_run(khugepaged, NULL,
 							"khugepaged");
@@ -2779,7 +2755,7 @@ int start_stop_khugepaged(void)
 void khugepaged_min_free_kbytes_update(void)
 {
 	mutex_lock(&khugepaged_mutex);
-	if (hugepage_pmd_enabled() && khugepaged_thread)
+	if (hugepage_global_enabled() && khugepaged_thread)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }
-- 
2.43.0



>
>>
>> BTW, this conflicts with Patch 12 from Nico’s khugepaged for mTHP patchset.
>
> Right. I guess it can be handled.
>
> -- 
> Cheers,
>
> David


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (4 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:35   ` Matthew Wilcox
  2026-04-14 11:02   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space Zi Yan
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
large folio support, so that read-only THPs created in these FSes are not
seen by the FSes when the underlying fd becomes writable. Now read-only PMD
THPs only appear in a FS with large folio support and the supported orders
include PMD_ORDRE.

READ_ONLY_THP_FOR_FS was using mapping->nr_thps, inode->i_writecount, and
smp_mb() to prevent writes to a read-only THP and collapsing writable
folios into a THP. In collapse_file(), mapping->nr_thps is increased, then
smp_mb(), and if inode->i_writecount > 0, collapse is stopped, while
do_dentry_open() first increases inode->i_writecount, then a full memory
fence, and if mapping->nr_thps > 0, all read-only THPs are truncated.

Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code,
since a dirty folio check has been added after try_to_unmap() and
try_to_unmap_flush() in collapse_file() to make sure no writable folio can
be collapsed.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 fs/open.c               | 27 ---------------------------
 include/linux/pagemap.h | 29 -----------------------------
 mm/filemap.c            |  1 -
 mm/huge_memory.c        |  1 -
 mm/khugepaged.c         | 28 ----------------------------
 5 files changed, 86 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 91f1139591ab..cef382d9d8b8 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -970,33 +970,6 @@ static int do_dentry_open(struct file *f,
 	if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
 		return -EINVAL;
 
-	/*
-	 * XXX: Huge page cache doesn't support writing yet. Drop all page
-	 * cache for this file before processing writes.
-	 */
-	if (f->f_mode & FMODE_WRITE) {
-		/*
-		 * Depends on full fence from get_write_access() to synchronize
-		 * against collapse_file() regarding i_writecount and nr_thps
-		 * updates. Ensures subsequent insertion of THPs into the page
-		 * cache will fail.
-		 */
-		if (filemap_nr_thps(inode->i_mapping)) {
-			struct address_space *mapping = inode->i_mapping;
-
-			filemap_invalidate_lock(inode->i_mapping);
-			/*
-			 * unmap_mapping_range just need to be called once
-			 * here, because the private pages is not need to be
-			 * unmapped mapping (e.g. data segment of dynamic
-			 * shared libraries here).
-			 */
-			unmap_mapping_range(mapping, 0, 0, 0);
-			truncate_inode_pages(mapping, 0);
-			filemap_invalidate_unlock(inode->i_mapping);
-		}
-	}
-
 	return 0;
 
 cleanup_all:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..dad3f8846cdc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -530,35 +530,6 @@ static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }
 
-static inline int filemap_nr_thps(const struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	return atomic_read(&mapping->nr_thps);
-#else
-	return 0;
-#endif
-}
-
-static inline void filemap_nr_thps_inc(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_large_folio_support(mapping))
-		atomic_inc(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
-static inline void filemap_nr_thps_dec(struct address_space *mapping)
-{
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	if (!mapping_large_folio_support(mapping))
-		atomic_dec(&mapping->nr_thps);
-#else
-	WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0);
-#endif
-}
-
 struct address_space *folio_mapping(const struct folio *folio);
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index c568d9058ff8..e7da925ae310 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -189,7 +189,6 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 			lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
 	} else if (folio_test_pmd_mappable(folio)) {
 		lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
-		filemap_nr_thps_dec(mapping);
 	}
 	if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
 		mod_node_page_state(folio_pgdat(folio),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a22bb2364bdc..5c9ee900ed90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3926,7 +3926,6 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 				} else {
 					lruvec_stat_mod_folio(folio,
 							NR_FILE_THPS, -nr);
-					filemap_nr_thps_dec(mapping);
 				}
 			}
 		}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 79c985d7fa03..afd52e4c7ccd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2092,21 +2092,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		goto xa_unlocked;
 	}
 
-	if (!is_shmem) {
-		filemap_nr_thps_inc(mapping);
-		/*
-		 * Paired with the fence in do_dentry_open() -> get_write_access()
-		 * to ensure i_writecount is up to date and the update to nr_thps
-		 * is visible. Ensures the page cache will be truncated if the
-		 * file is opened writable.
-		 */
-		smp_mb();
-		if (inode_is_open_for_write(mapping->host)) {
-			result = SCAN_FAIL;
-			filemap_nr_thps_dec(mapping);
-		}
-	}
-
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2302,19 +2287,6 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		folio_putback_lru(folio);
 		folio_put(folio);
 	}
-	/*
-	 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
-	 * file only. This undo is not needed unless failure is
-	 * due to SCAN_COPY_MC.
-	 */
-	if (!is_shmem && result == SCAN_COPY_MC) {
-		filemap_nr_thps_dec(mapping);
-		/*
-		 * Paired with the fence in do_dentry_open() -> get_write_access()
-		 * to ensure the update to nr_thps is visible.
-		 */
-		smp_mb();
-	}
 
 	new_folio->mapping = NULL;
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-04-13 20:35   ` Matthew Wilcox
  2026-04-14 11:02   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:35 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:24PM -0400, Zi Yan wrote:
> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
> large folio support, so that read-only THPs created in these FSes are not
> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
> THPs only appear in a FS with large folio support and the supported orders
> include PMD_ORDRE.
> 
> READ_ONLY_THP_FOR_FS was using mapping->nr_thps, inode->i_writecount, and
> smp_mb() to prevent writes to a read-only THP and collapsing writable
> folios into a THP. In collapse_file(), mapping->nr_thps is increased, then
> smp_mb(), and if inode->i_writecount > 0, collapse is stopped, while
> do_dentry_open() first increases inode->i_writecount, then a full memory
> fence, and if mapping->nr_thps > 0, all read-only THPs are truncated.
> 
> Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code,
> since a dirty folio check has been added after try_to_unmap() and
> try_to_unmap_flush() in collapse_file() to make sure no writable folio can
> be collapsed.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  fs/open.c               | 27 ---------------------------
>  include/linux/pagemap.h | 29 -----------------------------
>  mm/filemap.c            |  1 -
>  mm/huge_memory.c        |  1 -
>  mm/khugepaged.c         | 28 ----------------------------
>  5 files changed, 86 deletions(-)

This is great.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users
  2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
  2026-04-13 20:35   ` Matthew Wilcox
@ 2026-04-14 11:02   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:02 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without
> large folio support, so that read-only THPs created in these FSes are not
> seen by the FSes when the underlying fd becomes writable. Now read-only PMD
> THPs only appear in a FS with large folio support and the supported orders
> include PMD_ORDRE.
> 
> READ_ONLY_THP_FOR_FS was using mapping->nr_thps, inode->i_writecount, and
> smp_mb() to prevent writes to a read-only THP and collapsing writable
> folios into a THP. In collapse_file(), mapping->nr_thps is increased, then
> smp_mb(), and if inode->i_writecount > 0, collapse is stopped, while
> do_dentry_open() first increases inode->i_writecount, then a full memory
> fence, and if mapping->nr_thps > 0, all read-only THPs are truncated.
> 
> Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code,
> since a dirty folio check has been added after try_to_unmap() and
> try_to_unmap_flush() in collapse_file() to make sure no writable folio can
> be collapsed.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (5 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:38   ` Matthew Wilcox
  2026-04-13 19:20 ` [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

filemap_nr_thps*() are removed, the related field, address_space->nr_thps,
is no longer needed. Remove it.

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 fs/inode.c         | 3 ---
 include/linux/fs.h | 5 -----
 2 files changed, 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..16ab0a345419 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -280,9 +280,6 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
 	mapping->flags = 0;
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	atomic_set(&mapping->nr_thps, 0);
-#endif
 	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
 	mapping->i_private_data = NULL;
 	mapping->writeback_index = 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0bdccfa70b44..35875696fb4c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -455,7 +455,6 @@ extern const struct address_space_operations empty_aops;
  *   memory mappings.
  * @gfp_mask: Memory allocation flags to use for allocating pages.
  * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
- * @nr_thps: Number of THPs in the pagecache (non-shmem only).
  * @i_mmap: Tree of private and shared mappings.
  * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
  * @nrpages: Number of page entries, protected by the i_pages lock.
@@ -473,10 +472,6 @@ struct address_space {
 	struct rw_semaphore	invalidate_lock;
 	gfp_t			gfp_mask;
 	atomic_t		i_mmap_writable;
-#ifdef CONFIG_READ_ONLY_THP_FOR_FS
-	/* number of thp, only for non-shmem files */
-	atomic_t		nr_thps;
-#endif
 	struct rb_root_cached	i_mmap;
 	unsigned long		nrpages;
 	pgoff_t			writeback_index;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space
  2026-04-13 19:20 ` [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-04-13 20:38   ` Matthew Wilcox
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:38 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:25PM -0400, Zi Yan wrote:
> filemap_nr_thps*() are removed, the related field, address_space->nr_thps,
> is no longer needed. Remove it.

... this shrinks struct address_space by 8 bytes on 64-bit systems
which may increase the number of inodes we can cache.

> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (6 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:41   ` Matthew Wilcox
  2026-04-13 19:20 ` [PATCH 7.2 v2 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Without READ_ONLY_THP_FOR_FS, large file-backed folios cannot be created by
a FS without large folio support. The check is no longer needed.

Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 30 +++---------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5c9ee900ed90..4de38c6c6d06 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3821,33 +3821,9 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
 	if (!folio->mapping && !folio_test_anon(folio))
 		return -EBUSY;
 
-	if (folio_test_anon(folio)) {
-		/* order-1 is not supported for anonymous THP. */
-		if (new_order == 1)
-			return -EINVAL;
-	} else if (split_type == SPLIT_TYPE_NON_UNIFORM || new_order) {
-		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-		    !mapping_large_folio_support(folio->mapping)) {
-			/*
-			 * We can always split a folio down to a single page
-			 * (new_order == 0) uniformly.
-			 *
-			 * For any other scenario
-			 *   a) uniform split targeting a large folio
-			 *      (new_order > 0)
-			 *   b) any non-uniform split
-			 * we must confirm that the file system supports large
-			 * folios.
-			 *
-			 * Note that we might still have THPs in such
-			 * mappings, which is created from khugepaged when
-			 * CONFIG_READ_ONLY_THP_FOR_FS is enabled. But in that
-			 * case, the mapping does not actually support large
-			 * folios properly.
-			 */
-			return -EINVAL;
-		}
-	}
+	/* order-1 is not supported for anonymous THP. */
+	if (folio_test_anon(folio) && new_order == 1)
+		return -EINVAL;
 
 	/*
 	 * swapcache folio could only be split to order 0
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-04-13 19:20 ` [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-13 20:41   ` Matthew Wilcox
  2026-04-13 20:46     ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:41 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:26PM -0400, Zi Yan wrote:
> +	/* order-1 is not supported for anonymous THP. */

It's also not supported for file folios, right?  Or do we just skip
adding order-1 file folios to the deferred split list?  Either way,
we need to correct _something_, whether it's the code or the comment.

> +	if (folio_test_anon(folio) && new_order == 1)
> +		return -EINVAL;


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-04-13 20:41   ` Matthew Wilcox
@ 2026-04-13 20:46     ` Zi Yan
  2026-04-14 11:03       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 20:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 13 Apr 2026, at 16:41, Matthew Wilcox wrote:

> On Mon, Apr 13, 2026 at 03:20:26PM -0400, Zi Yan wrote:
>> +	/* order-1 is not supported for anonymous THP. */
>
> It's also not supported for file folios, right?  Or do we just skip
> adding order-1 file folios to the deferred split list?  Either way,

order <= 1 folios are not added to deferred split list. See
deferred_split_folio().

IIUC, we have order-1 file folios but not order-1 anon folios.

> we need to correct _something_, whether it's the code or the comment.
>
>> +	if (folio_test_anon(folio) && new_order == 1)
>> +		return -EINVAL;


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
  2026-04-13 20:46     ` Zi Yan
@ 2026-04-14 11:03       ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:03 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 22:46, Zi Yan wrote:
> On 13 Apr 2026, at 16:41, Matthew Wilcox wrote:
> 
>> On Mon, Apr 13, 2026 at 03:20:26PM -0400, Zi Yan wrote:
>>> +	/* order-1 is not supported for anonymous THP. */
>>
>> It's also not supported for file folios, right?  Or do we just skip
>> adding order-1 file folios to the deferred split list?  Either way,
> 
> order <= 1 folios are not added to deferred split list. See
> deferred_split_folio().
> 
> IIUC, we have order-1 file folios but not order-1 anon folios.

And only anon folios are ever added to the deferred split list.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio()
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (7 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

After READ_ONLY_THP_FOR_FS is removed, FS either supports large folio or
not. folio_split() can be used on a FS with large folio support without
worrying about getting a THP on a FS without large folio support.

When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio can
appear in a FS without large folio support after khugepaged or
madvise(MADV_COLLAPSE) creates it. During truncate_inode_partial_folio(),
such a PMD large pagecache folio is split and if the FS does not support
large folio, it needs to be split to order-0 ones and could not be split
non uniformly to ones with various orders. try_folio_split_to_order() was
added to handle this situation by checking folio_check_splittable(...,
SPLIT_TYPE_NON_UNIFORM) to detect if the large folio is created due to
READ_ONLY_THP_FOR_FS and the FS does not support large folio. Now
READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created
with FSes supporting large folio, this function is no longer needed and all
large pagecache folios can be split non uniformly.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/huge_mm.h | 25 ++-----------------------
 mm/truncate.c           |  8 ++++----
 2 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2949e5acff35..164d6edf1b65 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -389,27 +389,6 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
 	return split_huge_page_to_list_to_order(page, NULL, new_order);
 }
 
-/**
- * try_folio_split_to_order() - try to split a @folio at @page to @new_order
- * using non uniform split.
- * @folio: folio to be split
- * @page: split to @new_order at the given page
- * @new_order: the target split order
- *
- * Try to split a @folio at @page using non uniform split to @new_order, if
- * non uniform split is not supported, fall back to uniform split. After-split
- * folios are put back to LRU list. Use min_order_for_split() to get the lower
- * bound of @new_order.
- *
- * Return: 0 - split is successful, otherwise split failed.
- */
-static inline int try_folio_split_to_order(struct folio *folio,
-		struct page *page, unsigned int new_order)
-{
-	if (folio_check_splittable(folio, new_order, SPLIT_TYPE_NON_UNIFORM))
-		return split_huge_page_to_order(&folio->page, new_order);
-	return folio_split(folio, new_order, page, NULL);
-}
 static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
@@ -642,8 +621,8 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
 	return -EINVAL;
 }
 
-static inline int try_folio_split_to_order(struct folio *folio,
-		struct page *page, unsigned int new_order)
+static inline int folio_split(struct folio *folio, unsigned int new_order,
+		struct page *page, struct list_head *list)
 {
 	VM_WARN_ON_ONCE_FOLIO(1, folio);
 	return -EINVAL;
diff --git a/mm/truncate.c b/mm/truncate.c
index 2931d66c16d0..6973b05ec4b8 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -177,7 +177,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	return 0;
 }
 
-static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
+static int folio_split_or_unmap(struct folio *folio, struct page *split_at,
 				    unsigned long min_order)
 {
 	enum ttu_flags ttu_flags =
@@ -186,7 +186,7 @@ static int try_folio_split_or_unmap(struct folio *folio, struct page *split_at,
 		TTU_IGNORE_MLOCK;
 	int ret;
 
-	ret = try_folio_split_to_order(folio, split_at, min_order);
+	ret = folio_split(folio, min_order, split_at, NULL);
 
 	/*
 	 * If the split fails, unmap the folio, so it will be refaulted
@@ -252,7 +252,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 
 	min_order = mapping_min_folio_order(folio->mapping);
 	split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
-	if (!try_folio_split_or_unmap(folio, split_at, min_order)) {
+	if (!folio_split_or_unmap(folio, split_at, min_order)) {
 		/*
 		 * try to split at offset + length to make sure folios within
 		 * the range can be dropped, especially to avoid memory waste
@@ -279,7 +279,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 		/* make sure folio2 is large and does not change its mapping */
 		if (folio_test_large(folio2) &&
 		    folio2->mapping == folio->mapping)
-			try_folio_split_or_unmap(folio2, split_at2, min_order);
+			folio_split_or_unmap(folio2, split_at2, min_order);
 
 		folio_unlock(folio2);
 out:
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (8 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-14 11:06   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
  2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

READ_ONLY_THP_FOR_FS is no longer present, remove related comment.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/defrag.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index 7e2db5d3a4d4..a8d49d9ca981 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
 		return folio;
 
 	/*
-	 * Since we can defragment files opened read-only, we can encounter
-	 * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
-	 *
 	 * The IO for such large folios is not fully tested, thus return
 	 * an error to reject such folios unless it's an experimental build.
 	 *
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
  2026-04-13 19:20 ` [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-14 11:06   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:06 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> READ_ONLY_THP_FOR_FS is no longer present, remove related comment.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/defrag.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
> index 7e2db5d3a4d4..a8d49d9ca981 100644
> --- a/fs/btrfs/defrag.c
> +++ b/fs/btrfs/defrag.c
> @@ -860,9 +860,6 @@ static struct folio *defrag_prepare_one_folio(struct btrfs_inode *inode, pgoff_t
>  		return folio;
>  
>  	/*
> -	 * Since we can defragment files opened read-only, we can encounter
> -	 * transparent huge pages here (see CONFIG_READ_ONLY_THP_FOR_FS).
> -	 *
>  	 * The IO for such large folios is not fully tested, thus return
>  	 * an error to reject such folios unless it's an experimental build.
>  	 *

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (9 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-14 11:06   ` David Hildenbrand (Arm)
  2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
  11 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Change the requirement to a file system with large folio support and the
supported order needs to include PMD_ORDER.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 tools/testing/selftests/mm/khugepaged.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 3fe7ef04ac62..bdcdd31beb1e 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -1086,8 +1086,8 @@ static void usage(void)
 	fprintf(stderr, "\t<context>\t: [all|khugepaged|madvise]\n");
 	fprintf(stderr, "\t<mem_type>\t: [all|anon|file|shmem]\n");
 	fprintf(stderr, "\n\t\"file,all\" mem_type requires [dir] argument\n");
-	fprintf(stderr, "\n\t\"file,all\" mem_type requires kernel built with\n");
-	fprintf(stderr,	"\tCONFIG_READ_ONLY_THP_FOR_FS=y\n");
+	fprintf(stderr, "\n\t\"file,all\" mem_type requires a file system\n");
+	fprintf(stderr,	"\twith large folio support (order >= PMD order)\n");
 	fprintf(stderr, "\n\tif [dir] is a (sub)directory of a tmpfs mount, tmpfs must be\n");
 	fprintf(stderr,	"\tmounted with huge=advise option for khugepaged tests to work\n");
 	fprintf(stderr,	"\n\tSupported Options:\n");
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
  2026-04-13 19:20 ` [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-04-14 11:06   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:06 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> Change the requirement to a file system with large folio support and the
> supported order needs to include PMD_ORDER.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
                   ` (10 preceding siblings ...)
  2026-04-13 19:20 ` [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
@ 2026-04-13 19:20 ` Zi Yan
  2026-04-13 20:47   ` Matthew Wilcox
  2026-04-14 11:07   ` David Hildenbrand (Arm)
  11 siblings, 2 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-13 19:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Zi Yan, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

Any file system with large folio support and the supported orders include
PMD_ORDER can be used.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 tools/testing/selftests/mm/guard-regions.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
index 48e8b1539be3..13e77e48b6ef 100644
--- a/tools/testing/selftests/mm/guard-regions.c
+++ b/tools/testing/selftests/mm/guard-regions.c
@@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
 
 	/*
 	 * We must close and re-open local-file backed as read-only for
-	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
+	 * MADV_COLLAPSE to work.
 	 */
 	if (variant->backing == LOCAL_FILE_BACKED) {
 		ASSERT_EQ(close(self->fd), 0);
@@ -2237,9 +2237,10 @@ TEST_F(guard_regions, collapse)
 	/*
 	 * Now collapse the entire region. This should fail in all cases.
 	 *
-	 * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
-	 * not set for the local file case, but we can't differentiate whether
-	 * this occurred or if the collapse was rightly rejected.
+	 * The madvise() call will also fail if the file system does not support
+	 * large folio or the supported orders do not include PMD_ORDER for the
+	 * local file case, but we can't differentiate whether this occurred or
+	 * if the collapse was rightly rejected.
 	 */
 	EXPECT_NE(madvise(ptr, size, MADV_COLLAPSE), 0);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
@ 2026-04-13 20:47   ` Matthew Wilcox
  2026-04-13 20:51     ` Zi Yan
  2026-04-14 11:07   ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 20:47 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 03:20:30PM -0400, Zi Yan wrote:
> +++ b/tools/testing/selftests/mm/guard-regions.c
> @@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
>  
>  	/*
>  	 * We must close and re-open local-file backed as read-only for
> -	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
> +	 * MADV_COLLAPSE to work.

Is this true?  Does MADV_COLLAPSE refuse to work on writable files?
Should we delete some code here as well as fix the comment?  ;-)



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 20:47   ` Matthew Wilcox
@ 2026-04-13 20:51     ` Zi Yan
  2026-04-13 22:28       ` Matthew Wilcox
  0 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-13 20:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On 13 Apr 2026, at 16:47, Matthew Wilcox wrote:

> On Mon, Apr 13, 2026 at 03:20:30PM -0400, Zi Yan wrote:
>> +++ b/tools/testing/selftests/mm/guard-regions.c
>> @@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
>>
>>  	/*
>>  	 * We must close and re-open local-file backed as read-only for
>> -	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
>> +	 * MADV_COLLAPSE to work.
>
> Is this true?  Does MADV_COLLAPSE refuse to work on writable files?
> Should we delete some code here as well as fix the comment?  ;-)

file_thp_enabled() used by __thp_vma_allowable_orders() refuses
writable files with inode_is_open_for_write(). That should prevent
MADV_COLLAPSE from working on writable files.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 20:51     ` Zi Yan
@ 2026-04-13 22:28       ` Matthew Wilcox
  2026-04-14 11:09         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-13 22:28 UTC (permalink / raw)
  To: Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	linux-btrfs, linux-kernel, linux-fsdevel, linux-mm,
	linux-kselftest

On Mon, Apr 13, 2026 at 04:51:28PM -0400, Zi Yan wrote:
> On 13 Apr 2026, at 16:47, Matthew Wilcox wrote:
> 
> > On Mon, Apr 13, 2026 at 03:20:30PM -0400, Zi Yan wrote:
> >> +++ b/tools/testing/selftests/mm/guard-regions.c
> >> @@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
> >>
> >>  	/*
> >>  	 * We must close and re-open local-file backed as read-only for
> >> -	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
> >> +	 * MADV_COLLAPSE to work.
> >
> > Is this true?  Does MADV_COLLAPSE refuse to work on writable files?
> > Should we delete some code here as well as fix the comment?  ;-)
> 
> file_thp_enabled() used by __thp_vma_allowable_orders() refuses
> writable files with inode_is_open_for_write(). That should prevent
> MADV_COLLAPSE from working on writable files.

That sounds like more code that was added for RO_THP and should be
deleted?  See commit e6be37b2e7bd


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 22:28       ` Matthew Wilcox
@ 2026-04-14 11:09         ` David Hildenbrand (Arm)
  2026-04-14 16:45           ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:09 UTC (permalink / raw)
  To: Matthew Wilcox, Zi Yan
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 4/14/26 00:28, Matthew Wilcox wrote:
> On Mon, Apr 13, 2026 at 04:51:28PM -0400, Zi Yan wrote:
>> On 13 Apr 2026, at 16:47, Matthew Wilcox wrote:
>>
>>>
>>> Is this true?  Does MADV_COLLAPSE refuse to work on writable files?
>>> Should we delete some code here as well as fix the comment?  ;-)
>>
>> file_thp_enabled() used by __thp_vma_allowable_orders() refuses
>> writable files with inode_is_open_for_write(). That should prevent
>> MADV_COLLAPSE from working on writable files.
> 
> That sounds like more code that was added for RO_THP and should be
> deleted?  See commit e6be37b2e7bd

Sounds like something to implement on top of this patch set?

But with the added dirty checks in patch #2, maybe it's trivial to do it
in this patchset already.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-14 11:09         ` David Hildenbrand (Arm)
@ 2026-04-14 16:45           ` Zi Yan
  2026-04-14 17:40             ` Matthew Wilcox
  0 siblings, 1 reply; 50+ messages in thread
From: Zi Yan @ 2026-04-14 16:45 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Matthew Wilcox
  Cc: Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 7:09, David Hildenbrand (Arm) wrote:

> On 4/14/26 00:28, Matthew Wilcox wrote:
>> On Mon, Apr 13, 2026 at 04:51:28PM -0400, Zi Yan wrote:
>>> On 13 Apr 2026, at 16:47, Matthew Wilcox wrote:
>>>
>>>>
>>>> Is this true?  Does MADV_COLLAPSE refuse to work on writable files?
>>>> Should we delete some code here as well as fix the comment?  ;-)
>>>
>>> file_thp_enabled() used by __thp_vma_allowable_orders() refuses
>>> writable files with inode_is_open_for_write(). That should prevent
>>> MADV_COLLAPSE from working on writable files.
>>
>> That sounds like more code that was added for RO_THP and should be
>> deleted?  See commit e6be37b2e7bd
>
> Sounds like something to implement on top of this patch set?
>
> But with the added dirty checks in patch #2, maybe it's trivial to do it
> in this patchset already.

Can you two elaborate? The code from commit e6be37b2e7bd is moved around
and replaced by functions like __thp_vma_allowable_orders().

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-14 16:45           ` Zi Yan
@ 2026-04-14 17:40             ` Matthew Wilcox
  2026-04-14 17:53               ` Zi Yan
  0 siblings, 1 reply; 50+ messages in thread
From: Matthew Wilcox @ 2026-04-14 17:40 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand (Arm),
	Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On Tue, Apr 14, 2026 at 12:45:18PM -0400, Zi Yan wrote:
> On 14 Apr 2026, at 7:09, David Hildenbrand (Arm) wrote:
> > On 4/14/26 00:28, Matthew Wilcox wrote:
> >> That sounds like more code that was added for RO_THP and should be
> >> deleted?  See commit e6be37b2e7bd
> >
> > Sounds like something to implement on top of this patch set?
> >
> > But with the added dirty checks in patch #2, maybe it's trivial to do it
> > in this patchset already.
> 
> Can you two elaborate? The code from commit e6be37b2e7bd is moved around
> and replaced by functions like __thp_vma_allowable_orders().

What I was trying to say is that this restriction was added because
of ROTHP.  Since we're getting rid of ROTHP, we can remove this
restriction.  No matter where the code has now migrated to.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-14 17:40             ` Matthew Wilcox
@ 2026-04-14 17:53               ` Zi Yan
  0 siblings, 0 replies; 50+ messages in thread
From: Zi Yan @ 2026-04-14 17:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Hildenbrand (Arm),
	Song Liu, Chris Mason, David Sterba, Alexander Viro,
	Christian Brauner, Jan Kara, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, linux-btrfs,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest

On 14 Apr 2026, at 13:40, Matthew Wilcox wrote:

> On Tue, Apr 14, 2026 at 12:45:18PM -0400, Zi Yan wrote:
>> On 14 Apr 2026, at 7:09, David Hildenbrand (Arm) wrote:
>>> On 4/14/26 00:28, Matthew Wilcox wrote:
>>>> That sounds like more code that was added for RO_THP and should be
>>>> deleted?  See commit e6be37b2e7bd
>>>
>>> Sounds like something to implement on top of this patch set?
>>>
>>> But with the added dirty checks in patch #2, maybe it's trivial to do it
>>> in this patchset already.
>>
>> Can you two elaborate? The code from commit e6be37b2e7bd is moved around
>> and replaced by functions like __thp_vma_allowable_orders().
>
> What I was trying to say is that this restriction was added because
> of ROTHP.  Since we're getting rid of ROTHP, we can remove this
> restriction.  No matter where the code has now migrated to.

Got it. Thanks. Will clean this up.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions
  2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
  2026-04-13 20:47   ` Matthew Wilcox
@ 2026-04-14 11:07   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 50+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-14 11:07 UTC (permalink / raw)
  To: Zi Yan, Matthew Wilcox (Oracle), Song Liu
  Cc: Chris Mason, David Sterba, Alexander Viro, Christian Brauner,
	Jan Kara, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Shuah Khan, linux-btrfs, linux-kernel,
	linux-fsdevel, linux-mm, linux-kselftest

On 4/13/26 21:20, Zi Yan wrote:
> Any file system with large folio support and the supported orders include
> PMD_ORDER can be used.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  tools/testing/selftests/mm/guard-regions.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c
> index 48e8b1539be3..13e77e48b6ef 100644
> --- a/tools/testing/selftests/mm/guard-regions.c
> +++ b/tools/testing/selftests/mm/guard-regions.c
> @@ -2205,7 +2205,7 @@ TEST_F(guard_regions, collapse)
>  
>  	/*
>  	 * We must close and re-open local-file backed as read-only for
> -	 * CONFIG_READ_ONLY_THP_FOR_FS to work.
> +	 * MADV_COLLAPSE to work.
>  	 */
>  	if (variant->backing == LOCAL_FILE_BACKED) {
>  		ASSERT_EQ(close(self->fd), 0);
> @@ -2237,9 +2237,10 @@ TEST_F(guard_regions, collapse)
>  	/*
>  	 * Now collapse the entire region. This should fail in all cases.
>  	 *
> -	 * The madvise() call will also fail if CONFIG_READ_ONLY_THP_FOR_FS is
> -	 * not set for the local file case, but we can't differentiate whether
> -	 * this occurred or if the collapse was rightly rejected.
> +	 * The madvise() call will also fail if the file system does not support
> +	 * large folio or the supported orders do not include PMD_ORDER for the

"folios"

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2026-04-14 18:25 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-13 20:20   ` Matthew Wilcox
2026-04-13 20:34     ` Zi Yan
2026-04-14 10:19       ` David Hildenbrand (Arm)
2026-04-14 10:20       ` David Hildenbrand (Arm)
2026-04-14 10:29   ` David Hildenbrand (Arm)
2026-04-14 15:37     ` Lance Yang
2026-04-14 15:43       ` Lance Yang
2026-04-14 15:59         ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
2026-04-13 20:23   ` Matthew Wilcox
2026-04-13 20:28     ` Zi Yan
2026-04-14 10:38   ` David Hildenbrand (Arm)
2026-04-14 15:55     ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-14 10:40   ` David Hildenbrand (Arm)
2026-04-14 15:59     ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-04-14 10:40   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
2026-04-13 20:33   ` Matthew Wilcox
2026-04-13 20:42     ` Zi Yan
2026-04-14 11:02       ` David Hildenbrand (Arm)
2026-04-14 16:30         ` Zi Yan
2026-04-14 18:14           ` David Hildenbrand (Arm)
2026-04-14 18:25             ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-13 20:35   ` Matthew Wilcox
2026-04-14 11:02   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space Zi Yan
2026-04-13 20:38   ` Matthew Wilcox
2026-04-13 19:20 ` [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-13 20:41   ` Matthew Wilcox
2026-04-13 20:46     ` Zi Yan
2026-04-14 11:03       ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-14 11:06   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-14 11:06   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
2026-04-13 20:47   ` Matthew Wilcox
2026-04-13 20:51     ` Zi Yan
2026-04-13 22:28       ` Matthew Wilcox
2026-04-14 11:09         ` David Hildenbrand (Arm)
2026-04-14 16:45           ` Zi Yan
2026-04-14 17:40             ` Matthew Wilcox
2026-04-14 17:53               ` Zi Yan
2026-04-14 11:07   ` David Hildenbrand (Arm)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox