linux-mm.kvack.org archive mirror
* [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Deepanshu Kartikey @ 2026-02-14  0:15 UTC
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, i, shy828301,
	ackerleytng
  Cc: linux-mm, linux-kernel, Deepanshu Kartikey,
	syzbot+33a04338019ac7e43a44, stable, Deepanshu Kartikey

file_thp_enabled() incorrectly allows THP for files on anonymous inodes
(e.g. guest_memfd and secretmem). These files are created via
alloc_file_pseudo(), which does not call get_write_access() and leaves
inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
true, they appear as read-only regular files when
CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
collapse.

For anonymous inodes, inode_is_open_for_write() always returns false,
because their i_writecount is never incremented through the normal VFS
open path, so that check can never reject them. The right thing to do is
to exclude them from THP eligibility altogether, since
CONFIG_READ_ONLY_THP_FOR_FS was designed for real filesystem files
(e.g. shared libraries), not for pseudo-filesystem inodes.

For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
large folios in the page cache via the collapse path, but the
guest_memfd fault handler does not support large folios. This triggers
WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().

For secretmem, collapse_file() tries to copy page contents through the
direct map, but secretmem pages are removed from the direct map. This
can result in a kernel crash:

    BUG: unable to handle page fault for address: ffff88810284d000
    RIP: 0010:memcpy_orig+0x16/0x130
    Call Trace:
     collapse_file
     hpage_collapse_scan_file
     madvise_collapse

Upstream kernels do not crash here, because memory failure recovery
handles the failed copy gracefully, but the collapse attempt still
triggers confusing, false memory failure reports:

    Memory failure: 0x106d96f: recovery action for clean unevictable
    LRU page: Recovered

Check IS_ANON_FILE(inode) in file_thp_enabled() to deny THP for all
anonymous inode files.
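
As a rough illustration only, the eligibility logic before and after this
change can be modelled in plain C. The struct and function names below are
hypothetical stand-ins, not kernel APIs: i_writecount > 0 mirrors what
inode_is_open_for_write() tests, and the two booleans stand in for
S_ISREG(inode->i_mode) and IS_ANON_FILE(inode).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode fields the check reads. */
struct model_inode {
	int i_writecount;	/* > 0 once opened for write via the VFS */
	bool is_regular;	/* models S_ISREG(inode->i_mode) */
	bool is_anon_file;	/* models IS_ANON_FILE(inode) */
};

/* Pre-fix check: !inode_is_open_for_write(inode) && S_ISREG(mode). */
static bool model_thp_enabled_before(const struct model_inode *inode)
{
	return !(inode->i_writecount > 0) && inode->is_regular;
}

/* Patched check: files on anonymous inodes are rejected up front. */
static bool model_thp_enabled_after(const struct model_inode *inode)
{
	if (inode->is_anon_file)
		return false;
	return !(inode->i_writecount > 0) && inode->is_regular;
}
```

A guest_memfd-like inode (i_writecount == 0, regular, anonymous) passes the
old check but fails the new one, while a read-only regular file on a real
filesystem stays eligible under both.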

Link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
Link: https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com
Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
---
v2:
  - Use IS_ANON_FILE(inode) to deny THP for all anonymous inode files
    instead of checking for specific subsystems (David Hildenbrand)
  - Updated Fixes tag to 7fbb5e188248 which removed the VM_EXEC
    requirement that accidentally protected secretmem
  - Expanded commit message with implications for both guest_memfd
    and secretmem
---
 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..d3beddd8cc30 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 
 	inode = file_inode(vma->vm_file);
 
+	if (IS_ANON_FILE(inode))
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
-- 
2.43.0




* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Lance Yang @ 2026-02-14 11:27 UTC
  To: Deepanshu Kartikey
  Cc: baolin.wang, lorenzo.stoakes, ackerleytng, linux-mm, npache,
	linux-kernel, Liam.Howlett, syzbot+33a04338019ac7e43a44,
	ryan.roberts, stable, ziy, dev.jain, i, baohua, shy828301, akpm,
	david



On 2026/2/14 08:15, Deepanshu Kartikey wrote:
> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
> (e.g. guest_memfd and secretmem). These files are created via
> alloc_file_pseudo(), which does not call get_write_access() and leaves
> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
> true, they appear as read-only regular files when
> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
> collapse.
> 
> Anonymous inodes can never pass the inode_is_open_for_write() check
> since their i_writecount is never incremented through the normal VFS
> open path. The right thing to do is to exclude them from THP eligibility
> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
> filesystem files (e.g. shared libraries), not for pseudo-filesystem
> inodes.
> 
> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
> large folios in the page cache via the collapse path, but the
> guest_memfd fault handler does not support large folios. This triggers
> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
> 
> For secretmem, collapse_file() tries to copy page contents through the
> direct map, but secretmem pages are removed from the direct map. This
> can result in a kernel crash:

Good catch, thanks!

For secretmem, file_thp_enabled() can incorrectly return true
(i_writecount=0, S_ISREG=1), so the mapping becomes eligible for file
THP collapse ...

However, if any folio is dirty, collapse bails out early with
SCAN_PAGE_DIRTY_OR_WRITEBACK, as secretmem doesn't support normal
writeback, IIUC.
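
A heavily simplified model of the early bail-out described above: if any
folio in the range is dirty or under writeback, the scan gives up before
copying anything. The types and names are hypothetical and only loosely
mirror the kernel's scan result codes; the real collapse_file() handles
many more cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of scan results, named after the kernel's codes. */
enum model_scan_result {
	MODEL_SCAN_SUCCEED,
	MODEL_SCAN_PAGE_DIRTY_OR_WRITEBACK,
};

struct model_folio {
	bool dirty;
	bool writeback;
};

/* A single dirty or writeback folio aborts the whole collapse attempt. */
static enum model_scan_result
model_scan_range(const struct model_folio *folios, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (folios[i].dirty || folios[i].writeback)
			return MODEL_SCAN_PAGE_DIRTY_OR_WRITEBACK;
	return MODEL_SCAN_SUCCEED;
}
```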

> 
>      BUG: unable to handle page fault for address: ffff88810284d000
>      RIP: 0010:memcpy_orig+0x16/0x130
>      Call Trace:
>       collapse_file
>       hpage_collapse_scan_file
>       madvise_collapse
> 
> Secretmem is not affected by the crash on upstream as the memory failure
> recovery handles the failed copy gracefully, but it still triggers
> confusing false memory failure reports:
> 
>      Memory failure: 0x106d96f: recovery action for clean unevictable
>      LRU page: Recovered

Right. On my setup, that would hit SCAN_COPY_MC in
hpage_collapse_scan_file() rather than a hard crash.

> [...]

Confirmed that file_thp_enabled() is working as expected now with this fix.

Tested-by: Lance Yang <lance.yang@linux.dev>


Cheers,
Lance



* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: David Hildenbrand (Arm) @ 2026-02-15 12:41 UTC
  To: Deepanshu Kartikey, akpm, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	i, shy828301, ackerleytng
  Cc: linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, stable

On 2/14/26 01:15, Deepanshu Kartikey wrote:
> [...]

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Barry Song @ 2026-02-15 20:29 UTC
  To: Deepanshu Kartikey
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, lance.yang, i, shy828301,
	ackerleytng, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44,
	stable

On Sat, Feb 14, 2026 at 8:15 AM Deepanshu Kartikey
<kartikey406@gmail.com> wrote:
> [...]

LGTM,

Reviewed-by: Barry Song <baohua@kernel.org>



* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Ackerley Tng @ 2026-02-15 22:48 UTC
  To: Lance Yang, Deepanshu Kartikey
  Cc: baolin.wang, lorenzo.stoakes, linux-mm, npache, linux-kernel,
	Liam.Howlett, syzbot+33a04338019ac7e43a44, ryan.roberts, stable,
	ziy, dev.jain, i, baohua, shy828301, akpm, david

Lance Yang <lance.yang@linux.dev> writes:

> On 2026/2/14 08:15, Deepanshu Kartikey wrote:
>> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
>> (e.g. guest_memfd and secretmem). These files are created via
>> alloc_file_pseudo(), which does not call get_write_access() and leaves
>> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
>> true, they appear as read-only regular files when
>> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
>> collapse.
>>
>> Anonymous inodes can never pass the inode_is_open_for_write() check
>> since their i_writecount is never incremented through the normal VFS
>> open path. The right thing to do is to exclude them from THP eligibility
>> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
>> filesystem files (e.g. shared libraries), not for pseudo-filesystem
>> inodes.
>>
>> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
>> large folios in the page cache via the collapse path, but the
>> guest_memfd fault handler does not support large folios. This triggers
>> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
>>
>> For secretmem, collapse_file() tries to copy page contents through the
>> direct map, but secretmem pages are removed from the direct map. This
>> can result in a kernel crash:
>
> Good catch, thanks!
>
> For secretmem, file_thp_enabled() can incorrectly return true
> (i_writecount=0, S_ISREG=1), so the mapping becomes eligible for file
> THP collapse ...
>
> However, if any folio is dirty, collapse bails out early with
> SCAN_PAGE_DIRTY_OR_WRITEBACK, as secretmem doesn't support normal
> writeback, IIUC.
>

Yup! In the reproducers [1] I had to try to avoid setting the dirty flag
on the pages.
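
The read-only population trick can be sketched in userspace as below. This
is an illustration under assumptions: populate_by_reads() is a hypothetical
helper that works on any mapping, not just guest_memfd or secretmem. Loads
fault the pages in without the dirty PTE bit that a write fault would set.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Touch one byte per page in [mem, mem + len) using loads only. The
 * volatile qualifier keeps the compiler from dropping the reads, and the
 * returned sum gives the caller something to consume.
 */
static unsigned long populate_by_reads(const volatile char *mem, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned long sum = 0;

	for (size_t off = 0; off < len; off += page)
		sum += (unsigned char)mem[off];
	return sum;
}
```

With a guest_memfd mapping, this would be called on the whole PMD-sized
range before issuing MADV_COLLAPSE, mirroring the READ_ONCE() loop in the
selftest.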

[1] https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com

>>
>>      BUG: unable to handle page fault for address: ffff88810284d000
>>      RIP: 0010:memcpy_orig+0x16/0x130
>>      Call Trace:
>>       collapse_file
>>       hpage_collapse_scan_file
>>       madvise_collapse
>>
>> Secretmem is not affected by the crash on upstream as the memory failure
>> recovery handles the failed copy gracefully, but it still triggers
>> confusing false memory failure reports:
>>
>>      Memory failure: 0x106d96f: recovery action for clean unevictable
>>      LRU page: Recovered
>
> Right. On my setup, that would hit SCAN_COPY_MC in
> hpage_collapse_scan_file() rather than a hard crash.
>

Deepanshu, were you able to trigger a hard crash on some earlier kernel?
I only saw this false memory failure log.


* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Ackerley Tng @ 2026-02-16  6:47 UTC
  To: Deepanshu Kartikey, akpm, david, lorenzo.stoakes, ziy,
	baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, lance.yang, i, shy828301
  Cc: linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, stable

Deepanshu Kartikey <kartikey406@gmail.com> writes:

> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
> (e.g. guest_memfd and secretmem). These files are created via
> alloc_file_pseudo(), which does not call get_write_access() and leaves
> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
> true, they appear as read-only regular files when
> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
> collapse.
>
> Anonymous inodes can never pass the inode_is_open_for_write() check
> since their i_writecount is never incremented through the normal VFS
> open path. The right thing to do is to exclude them from THP eligibility
> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
> filesystem files (e.g. shared libraries), not for pseudo-filesystem
> inodes.
>
> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
> large folios in the page cache via the collapse path, but the
> guest_memfd fault handler does not support large folios. This triggers
> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
>
> For secretmem, collapse_file() tries to copy page contents through the
> direct map, but secretmem pages are removed from the direct map. This
> can result in a kernel crash:
>
>     BUG: unable to handle page fault for address: ffff88810284d000
>     RIP: 0010:memcpy_orig+0x16/0x130
>     Call Trace:
>      collapse_file
>      hpage_collapse_scan_file
>      madvise_collapse
>

I couldn't reproduce this crash; I could only reproduce the false memory
failure report below.

> Secretmem is not affected by the crash on upstream as the memory failure
> recovery handles the failed copy gracefully, but it still triggers
> confusing false memory failure reports:
>
>     Memory failure: 0x106d96f: recovery action for clean unevictable
>     LRU page: Recovered
>
> Check IS_ANON_FILE(inode) in file_thp_enabled() to deny THP for all
> anonymous inode files.
>
> Link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Link: https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com
> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
> ---
> v2:
>   - Use IS_ANON_FILE(inode) to deny THP for all anonymous inode files
>     instead of checking for specific subsystems (David Hildenbrand)
>   - Updated Fixes tag to 7fbb5e188248 which removed the VM_EXEC
>     requirement that accidentally protected secretmem
>   - Expanded commit message with implications for both guest_memfd
>     and secretmem
> ---
>  mm/huge_memory.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 40cf59301c21..d3beddd8cc30 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>  	inode = file_inode(vma->vm_file);
>
> +	if (IS_ANON_FILE(inode))
> +		return false;
> +

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>

>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
> --
> 2.43.0



* Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
From: Lorenzo Stoakes @ 2026-02-16 15:01 UTC
  To: Deepanshu Kartikey
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, i, shy828301,
	ackerleytng, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44,
	stable

On Sat, Feb 14, 2026 at 05:45:35AM +0530, Deepanshu Kartikey wrote:
> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
> (e.g. guest_memfd and secretmem). These files are created via
> alloc_file_pseudo(), which does not call get_write_access() and leaves
> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
> true, they appear as read-only regular files when
> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
> collapse.
>
> Anonymous inodes can never pass the inode_is_open_for_write() check
> since their i_writecount is never incremented through the normal VFS
> open path. The right thing to do is to exclude them from THP eligibility
> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
> filesystem files (e.g. shared libraries), not for pseudo-filesystem
> inodes.
>
> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
> large folios in the page cache via the collapse path, but the
> guest_memfd fault handler does not support large folios. This triggers
> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
>
> For secretmem, collapse_file() tries to copy page contents through the
> direct map, but secretmem pages are removed from the direct map. This
> can result in a kernel crash:
>
>     BUG: unable to handle page fault for address: ffff88810284d000
>     RIP: 0010:memcpy_orig+0x16/0x130
>     Call Trace:
>      collapse_file
>      hpage_collapse_scan_file
>      madvise_collapse
>
> Secretmem is not affected by the crash on upstream as the memory failure
> recovery handles the failed copy gracefully, but it still triggers
> confusing false memory failure reports:
>
>     Memory failure: 0x106d96f: recovery action for clean unevictable
>     LRU page: Recovered
>
> Check IS_ANON_FILE(inode) in file_thp_enabled() to deny THP for all
> anonymous inode files.

Great commit msg!

>
> Link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Link: https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com
> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>


* [PATCH] KVM: selftests: Test MADV_COLLAPSE on GUEST_MEMFD
From: Ackerley Tng @ 2026-02-17  1:44 UTC
  To: kartikey406, seanjc, pbonzini, shuah, kvm, linux-kselftest
  Cc: vannapurve, Liam.Howlett, ackerleytng, akpm, baohua, baolin.wang,
	david, dev.jain, i, lance.yang, linux-kernel, linux-mm,
	lorenzo.stoakes, npache, ryan.roberts, shy828301, stable,
	syzbot+33a04338019ac7e43a44, ziy

guest_memfd only supports PAGE_SIZE pages; if khugepaged or MADV_COLLAPSE
collapses them, private memory regions may end up mapped into host page
tables.

Add a test to verify that MADV_COLLAPSE fails on guest_memfd folios and
that any subsequent access to guest_memfd memory faults in PAGE_SIZE
folios. Running this test should not produce any memory failure logs or
kernel WARNings.

This selftest was added as a result of a syzbot-reported issue where
khugepaged operating on guest_memfd memory with MADV_HUGEPAGE caused the
collapse of folios, which then subsequently resulted in a WARNing.

Link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 72 +++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 618c937f3c90f..d16341a4a315d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -171,6 +171,77 @@ static void test_numa_allocation(int fd, size_t total_size)
 	kvm_munmap(mem, total_size);
 }
 
+static size_t getpmdsize(void)
+{
+	const char *path = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
+	static size_t pmd_size = -1;
+	FILE *fp;
+
+	if (pmd_size != -1)
+		return pmd_size;
+
+	fp = fopen(path, "r");
+	TEST_ASSERT(fp, "Couldn't open %s to read PMD size.", path);
+
+	TEST_ASSERT_EQ(fscanf(fp, "%lu", &pmd_size), 1);
+
+	TEST_ASSERT_EQ(fclose(fp), 0);
+
+	return pmd_size;
+}
+
+static void test_collapse(struct kvm_vm *vm, uint64_t flags)
+{
+	const size_t pmd_size = getpmdsize();
+	char *mem;
+	off_t i;
+	int fd;
+
+	fd = vm_create_guest_memfd(vm, pmd_size * 2,
+				   GUEST_MEMFD_FLAG_MMAP |
+				   GUEST_MEMFD_FLAG_INIT_SHARED);
+
+	/*
+	 * Use aligned address so that MADV_COLLAPSE will not be
+	 * filtered out early in the collapsing routine.
+	 */
+#define ALIGNED_ADDRESS ((void *)0x4000000000UL)
+	mem = mmap(ALIGNED_ADDRESS, pmd_size, PROT_READ | PROT_WRITE,
+		   MAP_FIXED | MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, ALIGNED_ADDRESS);
+
+	/*
+	 * Use reads to populate page table to avoid setting dirty
+	 * flag on page.
+	 */
+	for (i = 0; i < pmd_size; i += getpagesize())
+		READ_ONCE(mem[i]);
+
+	/*
+	 * Advising the use of huge pages in guest_memfd should be
+	 * fine...
+	 */
+	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_HUGEPAGE), 0);
+
+	/*
+	 * ... but collapsing folios must not be supported to avoid
+	 * mapping beyond shared ranges into host userspace page
+	 * tables.
+	 */
+	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	/*
+	 * Removing from host page tables and re-faulting should be
+	 * fine; should not end up faulting in a collapsed/huge folio.
+	 */
+	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_DONTNEED), 0);
+	READ_ONCE(mem[0]);
+
+	kvm_munmap(mem, pmd_size);
+	kvm_close(fd);
+}
+
 static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
 {
 	const char val = 0xaa;
@@ -370,6 +441,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 			gmem_test(mmap_supported, vm, flags);
 			gmem_test(fault_overflow, vm, flags);
 			gmem_test(numa_allocation, vm, flags);
+			test_collapse(vm, flags);
 		} else {
 			gmem_test(fault_private, vm, flags);
 		}
-- 
2.53.0.273.g2a3d683680-goog




* Re: [PATCH] KVM: selftests: Test MADV_COLLAPSE on GUEST_MEMFD
From: Sean Christopherson @ 2026-02-17 15:15 UTC
  To: Ackerley Tng
  Cc: kartikey406, pbonzini, shuah, kvm, linux-kselftest, vannapurve,
	Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain, i,
	lance.yang, linux-kernel, linux-mm, lorenzo.stoakes, npache,
	ryan.roberts, shy828301, stable, syzbot+33a04338019ac7e43a44,
	ziy

On Tue, Feb 17, 2026, Ackerley Tng wrote:
> diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
> index 618c937f3c90f..d16341a4a315d 100644
> --- a/tools/testing/selftests/kvm/guest_memfd_test.c
> +++ b/tools/testing/selftests/kvm/guest_memfd_test.c
> @@ -171,6 +171,77 @@ static void test_numa_allocation(int fd, size_t total_size)
>  	kvm_munmap(mem, total_size);
>  }
>  
> +static size_t getpmdsize(void)

This absolutely belongs in library/utility code.

> +{
> +	const char *path = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
> +	static size_t pmd_size = -1;
> +	FILE *fp;
> +
> +	if (pmd_size != -1)
> +		return pmd_size;
> +
> +	fp = fopen(path, "r");
> +	TEST_ASSERT(fp, "Couldn't open %s to read PMD size.", path);

This will likely assert on a kernel without THP support.

> +	TEST_ASSERT_EQ(fscanf(fp, "%lu", &pmd_size), 1);
> +
> +	TEST_ASSERT_EQ(fclose(fp), 0);

Please try to extend tools/testing/selftests/kvm/include/kvm_syscalls.h.
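A minimal sketch of what such a shared helper might look like. It assumes the helper returns 0 when THP is unavailable, instead of asserting; that is the contract the gmem_test_huge_pmd() macro later in this reply relies on. The name kvm_get_thp_pmd_size() is taken from that diff, but its body here is an assumption. get_thp_pmd_size_from() is a hypothetical split added only so the parsing can be exercised against an arbitrary path. The sketch also uses %zu, which is the correct fscanf() conversion for a size_t.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse a PMD size from @path. Returns 0 if the file
 * is missing (e.g. CONFIG_TRANSPARENT_HUGEPAGE=n) or unparsable, so that
 * callers can skip THP-dependent tests instead of asserting.
 */
static size_t get_thp_pmd_size_from(const char *path)
{
	size_t size = 0;
	FILE *fp = fopen(path, "r");

	if (!fp)
		return 0;

	if (fscanf(fp, "%zu", &size) != 1)
		size = 0;

	fclose(fp);
	return size;
}

/* Sketch of the query used by gmem_test_huge_pmd(); result is cached. */
static size_t kvm_get_thp_pmd_size(void)
{
	static size_t pmd_size = (size_t)-1;	/* -1: not read yet */

	if (pmd_size == (size_t)-1)
		pmd_size = get_thp_pmd_size_from(
			"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size");

	return pmd_size;
}
```

Returning 0 rather than asserting lets the caller decide whether a missing THP is a skip or a failure.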

> +
> +	return pmd_size;
> +}
> +
> +static void test_collapse(struct kvm_vm *vm, uint64_t flags)
> +{
> +	const size_t pmd_size = getpmdsize();
> +	char *mem;
> +	off_t i;
> +	int fd;
> +
> +	fd = vm_create_guest_memfd(vm, pmd_size * 2,
> +				   GUEST_MEMFD_FLAG_MMAP |
> +				   GUEST_MEMFD_FLAG_INIT_SHARED);
> +
> +	/*
> +	 * Use aligned address so that MADV_COLLAPSE will not be
> +	 * filtered out early in the collapsing routine.

Please elaborate, the value below is way more magical than just being aligned.

> +	 */
> +#define ALIGNED_ADDRESS ((void *)0x4000000000UL)

Use a "const void *" instead of #define inside a function.  And use one of the
appropriate size macros, e.g.

	const void *ALIGNED_ADDRESS = (void *)(SZ_1G * <some magic value>);
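For reference, here is the arithmetic behind the hardcoded hint. 0x4000000000 is exactly 256 * SZ_1G, which is 256 GiB or 2^38 bytes. Its natural alignment is therefore far larger than a 2 MiB PMD. The sketch below checks this at compile time and at run time. SZ_2M and SZ_1G are defined locally as stand-ins for the kernel's size macros. The 2 MiB PMD size is an assumption that holds on x86-64 with 4 KiB pages.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's SZ_* macros (64-bit values). */
#define SZ_2M (2ULL << 20)	/* assumed PMD size: x86-64, 4 KiB pages */
#define SZ_1G (1ULL << 30)

/* The hardcoded hint is exactly 256 GiB (2^38 bytes)... */
_Static_assert(0x4000000000ULL == 256 * SZ_1G, "hint == 256 GiB");

/* ...so it is trivially PMD aligned, with alignment to spare. */
_Static_assert(0x4000000000ULL % SZ_2M == 0, "hint is PMD aligned");

/* Natural alignment of an address: the value of its lowest set bit. */
static inline uint64_t natural_alignment(uint64_t addr)
{
	return addr & (~addr + 1);
}
```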

But why hardcode a virtual address in the first place?  If you need a specific
alignment, just allocate enough virtual memory to be able to meet those alignment
requirements.
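A minimal sketch of the over-allocation approach. It assumes that the size is a multiple of the page size and that the alignment is a power of two no smaller than the page size. The steps are:

1. Reserve size + align bytes of PROT_NONE address space.
2. Round up to the first aligned address inside the reservation.
3. Unmap the unused head and tail.

mmap_aligned_reserve() is a hypothetical name, not an existing selftests helper.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve a PROT_NONE region of @size bytes whose
 * start is aligned to @align. Assumes @align is a power of two and a
 * multiple of the page size, and @size is a multiple of the page size.
 * Returns MAP_FAILED on error.
 */
static void *mmap_aligned_reserve(size_t size, size_t align)
{
	size_t len = size + align;
	uint8_t *raw, *aligned, *end;

	/* Over-reserve so that an aligned start is guaranteed to fit. */
	raw = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	/* Round up to the next @align boundary. */
	aligned = (uint8_t *)(((uintptr_t)raw + align - 1) &
			      ~((uintptr_t)align - 1));
	end = raw + len;

	/* Give back the unaligned head and the unused tail. */
	if (aligned > raw)
		munmap(raw, (size_t)(aligned - raw));
	if (end > aligned + size)
		munmap(aligned + size, (size_t)(end - (aligned + size)));

	return aligned;
}
```

In the test, the returned address would take the place of ALIGNED_ADDRESS in the mmap(..., MAP_FIXED | MAP_SHARED, fd, 0) call. With MAP_FIXED, the overlapping part of the PROT_NONE reservation is discarded and replaced by the guest_memfd mapping.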

> +	mem = mmap(ALIGNED_ADDRESS, pmd_size, PROT_READ | PROT_WRITE,
> +		   MAP_FIXED | MAP_SHARED, fd, 0);
> +	TEST_ASSERT_EQ(mem, ALIGNED_ADDRESS);
> +
> +	/*
> +	 * Use reads to populate page table to avoid setting dirty
> +	 * flag on page.
> +	 */
> +	for (i = 0; i < pmd_size; i += getpagesize())
> +		READ_ONCE(mem[i]);
> +
> +	/*
> +	 * Advising the use of huge pages in guest_memfd should be
> +	 * fine...
> +	 */
> +	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_HUGEPAGE), 0);
> +
> +	/*
> +	 * ... but collapsing folios must not be supported to avoid
> +	 * mapping beyond shared ranges into host userspace page
> +	 * tables.
> +	 */
> +	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
> +	TEST_ASSERT_EQ(errno, EINVAL);
> +
> +	/*
> +	 * Removing from host page tables and re-faulting should be
> +	 * fine; should not end up faulting in a collapsed/huge folio.
> +	 */
> +	TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_DONTNEED), 0);
> +	READ_ONCE(mem[0]);
> +
> +	kvm_munmap(mem, pmd_size);
> +	kvm_close(fd);
> +}
> +
>  static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
>  {
>  	const char val = 0xaa;
> @@ -370,6 +441,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>  			gmem_test(mmap_supported, vm, flags);
>  			gmem_test(fault_overflow, vm, flags);
>  			gmem_test(numa_allocation, vm, flags);
> +			test_collapse(vm, flags);

Why diverge from everything else?  Yeah, the size is different, but that's easy
enough to handle.  And presumably the THP query needs to be able to fail gracefully,
so something like this?

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 618c937f3c90..e942adae1f59 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -350,14 +350,28 @@ static void test_guest_memfd_flags(struct kvm_vm *vm)
        }
 }
 
-#define gmem_test(__test, __vm, __flags)                               \
+#define __gmem_test(__test, __vm, __flags, __size)                     \
 do {                                                                   \
-       int fd = vm_create_guest_memfd(__vm, page_size * 4, __flags);   \
+       int fd = vm_create_guest_memfd(__vm, __size, __flags);          \
                                                                        \
-       test_##__test(fd, page_size * 4);                               \
+       test_##__test(fd, __size);                                      \
        close(fd);                                                      \
 } while (0)
 
+#define gmem_test(__test, __vm, __flags)                               \
+       __gmem_test(__test, __vm, __flags, page_size * 4)
+
+#define gmem_test_huge_pmd(__test, __vm, __flags)                      \
+do {                                                                   \
+       size_t pmd_size = kvm_get_thp_pmd_size();                       \
+                                                                       \
+       if (!pmd_size)                                                  \
+               break;                                                  \
+                                                                       \
+       __gmem_test(__test, __vm, __flags, pmd_size * 2);               \
+} while (0)
+
+
 static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
 {
        test_create_guest_memfd_multiple(vm);
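This sketch illustrates why the break in gmem_test_huge_pmd() skips the test gracefully. Inside a do { ... } while (0) macro body, break exits only that pseudo-loop. The test is therefore skipped even when the macro is invoked from inside a loop in the caller. The standalone sketch below models this pattern with a hypothetical fake size query and a dummy test in place of the guest_memfd plumbing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the THP size query and a guest_memfd test. */
static size_t fake_pmd_size;		/* 0 models "THP not available" */
static size_t last_size_tested;
static int tests_run;

static size_t get_pmd_size(void)
{
	return fake_pmd_size;
}

static void test_dummy(size_t size)
{
	tests_run++;
	last_size_tested = size;
}

/* Same shape as __gmem_test(), minus the guest_memfd create/close. */
#define run_test_sized(name, size)			\
do {							\
	test_##name(size);				\
} while (0)

#define run_test_huge_pmd(name)				\
do {							\
	size_t pmd_size = get_pmd_size();		\
							\
	/* Exits this do/while(0) only: test skipped. */	\
	if (!pmd_size)					\
		break;					\
							\
	run_test_sized(name, pmd_size * 2);		\
} while (0)
```

Because the skip is expressed as break rather than return, the macro remains usable in any context and never escapes the caller's own loop.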




* Re: [PATCH] KVM: selftests: Test MADV_COLLAPSE on GUEST_MEMFD
  2026-02-17 15:15   ` Sean Christopherson
@ 2026-02-20 23:59     ` Ackerley Tng
  0 siblings, 0 replies; 10+ messages in thread
From: Ackerley Tng @ 2026-02-20 23:59 UTC
  To: Sean Christopherson
  Cc: kartikey406, pbonzini, shuah, kvm, linux-kselftest, vannapurve,
	Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain, i,
	lance.yang, linux-kernel, linux-mm, lorenzo.stoakes, npache,
	ryan.roberts, shy828301, stable, syzbot+33a04338019ac7e43a44,
	ziy

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Feb 17, 2026, Ackerley Tng wrote:
>>
>> [...snip...]
>>
>> +
>> +	/*
>> +	 * Use aligned address so that MADV_COLLAPSE will not be
>> +	 * filtered out early in the collapsing routine.
>
> Please elaborate, the value below is way more magical than just being aligned.
>
>> +	 */
>> +#define ALIGNED_ADDRESS ((void *)0x4000000000UL)
>
> Use a "const void *" instead of #define inside a function.  And use one of the
> appropriate size macros, e.g.
>
> 	const void *ALIGNED_ADDRESS = (void *)(SZ_1G * <some magic value>);
>
> But why hardcode a virtual address in the first place?  If you need a specific
> alignment, just allocate enough virtual memory to be able to meet those alignment
> requirements.
>
>> +	mem = mmap(ALIGNED_ADDRESS, pmd_size, PROT_READ | PROT_WRITE,
>> +		   MAP_FIXED | MAP_SHARED, fd, 0);
>>
>> [...snip...]
>>
>> @@ -370,6 +441,7 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
>>  			gmem_test(mmap_supported, vm, flags);
>>  			gmem_test(fault_overflow, vm, flags);
>>  			gmem_test(numa_allocation, vm, flags);
>> +			test_collapse(vm, flags);
>
> Why diverge from everything else?  Yeah, the size is different, but that's easy
> enough to handle.  And presumably the THP query needs to be able to fail gracefully,
> so something like this?
>
>
> [...snip...]
>

Addressed your comments in a v2 [*], thanks for reviewing!

[*] https://lore.kernel.org/all/cover.1771630983.git.ackerleytng@google.com/T/



Thread overview: 10+ messages
2026-02-14  0:15 [PATCH v2] mm: thp: deny THP for files on anonymous inodes Deepanshu Kartikey
2026-02-14 11:27 ` Lance Yang
2026-02-15 22:48   ` Ackerley Tng
2026-02-15 12:41 ` David Hildenbrand (Arm)
2026-02-15 20:29 ` Barry Song
2026-02-16  6:47 ` Ackerley Tng
2026-02-16 15:01 ` Lorenzo Stoakes
2026-02-17  1:44 ` [PATCH] KVM: selftests: Test MADV_COLLAPSE on GUEST_MEMFD Ackerley Tng
2026-02-17 15:15   ` Sean Christopherson
2026-02-20 23:59     ` Ackerley Tng
