* [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
@ 2026-02-09 3:35 Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-09 3:35 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, Deepanshu Kartikey,
syzbot+33a04338019ac7e43a44, Deepanshu Kartikey
file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
inodes because they appear as regular read-only files when
CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
MADV_COLLAPSE to create large folios in the page cache, but their fault
handlers do not support large folios.
Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
these filesystems early in file_thp_enabled().
Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
---
mm/huge_memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..4f57c78b57dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 		return false;

 	inode = file_inode(vma->vm_file);
+	if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
+	    inode->i_sb->s_magic == SECRETMEM_MAGIC)
+		return false;

 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
--
2.43.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey
@ 2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 10:41 ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot
2 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 10:24 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
ackerleytng, seanjc, pbonzini, michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44
On 2/9/26 04:35, Deepanshu Kartikey wrote:
> file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
> inodes because they appear as regular read-only files when
> CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
> MADV_COLLAPSE to create large folios in the page cache, but their fault
> handlers do not support large folios.
>
> Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
> these filesystems early in file_thp_enabled().
>
> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
So we were able to reproduce this with secretmem, right?
We want to add "Fixes:" for the introducing commits, which would be the
commits that enable secretmem and mapping of guest_memfd pages to user
space. Can you identify them?
And also
Cc: stable@vger.kernel.org
> ---
> mm/huge_memory.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 40cf59301c21..4f57c78b57dd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> return false;
>
> inode = file_inode(vma->vm_file);
> + if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
> + inode->i_sb->s_magic == SECRETMEM_MAGIC)
> + return false;
That's nasty. We want some way to identify that through the mapping.
Unfortunately CONFIG_READ_ONLY_THP_FOR_FS ignores any
mapping_set_large_folios() configs by design.
And CONFIG_READ_ONLY_THP_FOR_FS might go away soon, but we need a fix
until then.
While we can identify secretmem through vma_is_secretmem(), we can't do
the same for guest_memfd as it's built as a module.
Unfortunately AS_NO_DIRECT_MAP[1] won't work.
Maybe introduce an AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
[1] https://lore.kernel.org/r/20260126164445.11867-6-kalyazin@amazon.com
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 10:24 ` David Hildenbrand (Arm)
@ 2026-02-09 10:41 ` David Hildenbrand (Arm)
2026-02-09 13:06 ` Deepanshu Kartikey
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 10:41 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
ackerleytng, seanjc, pbonzini, michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44
On 2/9/26 11:24, David Hildenbrand (Arm) wrote:
> On 2/9/26 04:35, Deepanshu Kartikey wrote:
>> file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
>> inodes because they appear as regular read-only files when
>> CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
>> MADV_COLLAPSE to create large folios in the page cache, but their fault
>> handlers do not support large folios.
>>
>> Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
>> these filesystems early in file_thp_enabled().
>>
>> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
>> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
>
> So we were able to reproduce this with secretmem, right?
>
> We want to add "Fixes:" for the introducing commits, which would be the
> commits that enable secretmem and mapping of guest_memfd pages to user
> space. Can you identify them?
>
> And also
>
> Cc: stable@vger.kernel.org
>
>> ---
>> mm/huge_memory.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 40cf59301c21..4f57c78b57dd 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>> return false;
>> inode = file_inode(vma->vm_file);
>> + if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
>> + inode->i_sb->s_magic == SECRETMEM_MAGIC)
>> + return false;
>
> That's nasty. We want some way to identify that through the mapping.
>
> Unfortunately CONFIG_READ_ONLY_THP_FOR_FS ignores any
> mapping_set_large_folios() configs by design.
>
> And CONFIG_READ_ONLY_THP_FOR_FS might go away soon, but we need a fix
> until then.
>
> While we can identify secretmem through vma_is_secretmem(), we can't do
> the same for guest_memfd as it's built as a module.
>
> Unfortunately AS_NO_DIRECT_MAP[1] won't work.
>
> Maybe introduce an AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
> rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
On second thought, why do we pass the
!inode_is_open_for_write(inode)
in file_thp_enabled()?
Isn't that the main problem for these memfd things?
Maybe a get_write_access() is missing somewhere?
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 10:41 ` David Hildenbrand (Arm)
@ 2026-02-09 13:06 ` Deepanshu Kartikey
2026-02-09 18:22 ` Ackerley Tng
0 siblings, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-09 13:06 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> > Maybe introduce an AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
> > rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
>
> On second thought, why do we pass the
>
> !inode_is_open_for_write(inode)
>
> in file_thp_enabled()?
>
> Isn't that the main problem for these memfd things?
>
> Maybe a get_write_access() is missing somewhere?
>
Hi David,
Thanks for the suggestion. I looked into the get_write_access() path.
Both guest_memfd and secretmem use alloc_file_pseudo() which skips
calling get_write_access(), so i_writecount stays 0. That's why
file_thp_enabled() sees them as read-only files.
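To make the i_writecount observation concrete, here is a minimal userspace sketch, not kernel code. The struct model_inode type and the model_* helpers are hypothetical stand-ins for struct inode, get_write_access(), inode_is_open_for_write() and the final return statement of file_thp_enabled(); the real get_write_access() can also fail, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the few inode fields involved. */
struct model_inode {
	int  i_writecount;	/* > 0 while someone holds write access */
	bool is_regular;	/* stands in for S_ISREG(inode->i_mode) */
};

/* Models the effect of get_write_access(): record one more writer. */
static void model_get_write_access(struct model_inode *inode)
{
	assert(inode);
	inode->i_writecount++;
}

/* Models inode_is_open_for_write(): is there any writer at all? */
static bool model_open_for_write(const struct model_inode *inode)
{
	return inode->i_writecount > 0;
}

/* Models the last line of file_thp_enabled() as quoted in the patch. */
static bool model_thp_enabled_by_writecount(const struct model_inode *inode)
{
	return !model_open_for_write(inode) && inode->is_regular;
}
```

An inode whose i_writecount was never bumped passes the check and so looks like a read-only regular file; after one model_get_write_access() call the same inode is rejected.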
We could add get_write_access() after alloc_file_pseudo() in both, but
I think that would be a hack rather than a proper fix:
- i_writecount has a specific semantic: tracking how many fds have the
file open for writing. We'd be bumping it just to influence
file_thp_enabled() behavior.
- It doesn't express the actual intent. The real issue is that
CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
backed files.
I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
the cleaner approach. It is explicit, has no side effects, and is easy
to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
Here is the diff:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..23f559fc1a4c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
account usage to user cgroups */
AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
+ AS_NO_READ_ONLY_THP_FOR_FS = 12,
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..4bdda92ce01e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
inode = file_inode(vma->vm_file);
+ if (test_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags))
+ return false;
+
return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}
diff --git a/mm/secretmem.c b/mm/secretmem.c
index edf111e0a1bb..56d93a74f5fc 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -205,7 +205,8 @@ static struct file *secretmem_file_create(unsigned long flags)
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
mapping_set_unevictable(inode->i_mapping);
+ set_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags);
inode->i_op = &secretmem_iops;
inode->i_mapping->a_ops = &secretmem_aops;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..b93a324c81bd 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -597,6 +597,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
inode->i_size = size;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
mapping_set_inaccessible(inode->i_mapping);
+ set_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags);
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
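The opt-out flag in the diff above can be modeled in plain C as follows. This is a userspace sketch under assumptions: only the bit number 12 comes from the diff, while struct model_mapping and the model_* helpers are hypothetical, and the bit helpers here are not atomic the way the kernel's set_bit() is.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit number taken from the diff above; the rest is a userspace model. */
enum { MODEL_AS_NO_READ_ONLY_THP_FOR_FS = 12 };

/* Hypothetical stand-in for the flags word of struct address_space. */
struct model_mapping {
	unsigned long flags;
};

/* Non-atomic models of set_bit() and test_bit(). */
static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Models file_thp_enabled() with the opt-out flag checked first. */
static bool model_thp_enabled_with_flag(const struct model_mapping *mapping,
					bool open_for_write, bool is_regular)
{
	assert(mapping);
	if (model_test_bit(MODEL_AS_NO_READ_ONLY_THP_FOR_FS, &mapping->flags))
		return false;
	return !open_for_write && is_regular;
}
```

A mapping that sets the bit at creation time is rejected regardless of its write count, which is the property the proposal relies on.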
Please let me know if this looks good and I will send a formal v2.
Thanks,
Deepanshu
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 13:06 ` Deepanshu Kartikey
@ 2026-02-09 18:22 ` Ackerley Tng
2026-02-09 19:45 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-09 18:22 UTC (permalink / raw)
To: Deepanshu Kartikey, David Hildenbrand (Arm)
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
Deepanshu Kartikey <kartikey406@gmail.com> writes:
> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> > Maybe introduce an AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
>> > rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
>>
>> On second thought, why do we pass the
>>
>> !inode_is_open_for_write(inode)
>>
>> in file_thp_enabled()?
>>
>> Isn't that the main problem for these memfd things?
>>
>> Maybe a get_write_access() is missing somewhere?
>>
>
> Hi David,
>
> Thanks for the suggestion. I looked into the get_write_access() path.
>
> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
> calling get_write_access(), so i_writecount stays 0. That's why
> file_thp_enabled() sees them as read-only files.
>
> We could add get_write_access() after alloc_file_pseudo() in both, but
> I think that would be a hack rather than a proper fix:
>
> - i_writecount has a specific semantic: tracking how many fds have the
> file open for writing. We'd be bumping it just to influence
> file_thp_enabled() behavior.
>
I agree re-using i_writecount feels odd since it is abusing the idea of
being written to. I might have misunderstood the full context of
i_writecount though.
> - It doesn't express the actual intent. The real issue is that
> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
> backed files.
>
> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
> the cleaner approach. It is explicit, has no side effects, and is easy
> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>
I was considering other address space flags and I think the best might
be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
__vma_thp_allowable_orders() check the maximum allowed order for the
address space.
khugepaged is about consolidating memory to huge pages, so if the
address space doesn't allow a larger folio order, then khugepaged should
not operate on that memory.
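As a rough illustration of that suggestion, the sketch below models a per-mapping maximum folio order check in userspace C. AS_FOLIO_ORDER_BITS = 5 and AS_FOLIO_ORDER_MIN = 16 come from the pagemap.h context quoted earlier in the thread; placing the maximum order field at bit 21 is an assumption derived from the "Bits 16-25" comment, and struct model_mapping and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	MODEL_FOLIO_ORDER_BITS = 5,	/* from the quoted pagemap.h context */
	MODEL_FOLIO_ORDER_MIN  = 16,	/* from the quoted pagemap.h context */
	/* Assumed: the max-order field directly follows the min-order field. */
	MODEL_FOLIO_ORDER_MAX  = MODEL_FOLIO_ORDER_MIN + MODEL_FOLIO_ORDER_BITS,
};

#define MODEL_ORDER_FIELD_MASK ((1UL << MODEL_FOLIO_ORDER_BITS) - 1)

/* Hypothetical stand-in for the flags word of struct address_space. */
struct model_mapping {
	unsigned long flags;
};

/* Store the supported [min, max] folio order range in the flags word. */
static void model_set_folio_order_range(struct model_mapping *m,
					unsigned int min, unsigned int max)
{
	assert(m && min <= max && max <= MODEL_ORDER_FIELD_MASK);
	m->flags &= ~((MODEL_ORDER_FIELD_MASK << MODEL_FOLIO_ORDER_MIN) |
		      (MODEL_ORDER_FIELD_MASK << MODEL_FOLIO_ORDER_MAX));
	m->flags |= ((unsigned long)min << MODEL_FOLIO_ORDER_MIN) |
		    ((unsigned long)max << MODEL_FOLIO_ORDER_MAX);
}

/* Read back the largest folio order the mapping claims to support. */
static unsigned int model_max_folio_order(const struct model_mapping *m)
{
	return (m->flags >> MODEL_FOLIO_ORDER_MAX) & MODEL_ORDER_FIELD_MASK;
}

/* A collapse to the given order is allowed only within that maximum. */
static bool model_collapse_allowed(const struct model_mapping *m,
				   unsigned int order)
{
	return order <= model_max_folio_order(m);
}
```

A mapping that never sets a range reports a maximum order of 0, so any collapse to a higher order would be refused under this model.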
The other options are
+ AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
but IIUC evictability is more closely related to swapping and
khugepaged might operate on swappable memory? Both guest_memfd and
secretmem set AS_UNEVICTABLE.
+ AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
to block migration. khugepaged kind of migrates the memory contents
too, but someday we want guest_memfd to support migration, and at that
time we would still want to block khugepaged, so I don't think we want
to reuse a flag that couples khugepaged to migration.
>
> [...snip...]
>
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 18:22 ` Ackerley Tng
@ 2026-02-09 19:45 ` David Hildenbrand (Arm)
2026-02-09 20:13 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 19:45 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/9/26 19:22, Ackerley Tng wrote:
> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>
>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>>>
>>>
>>> On second thought, why do we pass the
>>>
>>> !inode_is_open_for_write(inode)
>>>
>>> in file_thp_enabled()?
>>>
>>> Isn't that the main problem for these memfd things?
>>>
>>> Maybe a get_write_access() is missing somewhere?
>>>
>>
>> Hi David,
>>
>> Thanks for the suggestion. I looked into the get_write_access() path.
>>
>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>> calling get_write_access(), so i_writecount stays 0. That's why
>> file_thp_enabled() sees them as read-only files.
>>
>> We could add get_write_access() after alloc_file_pseudo() in both, but
>> I think that would be a hack rather than a proper fix:
>>
>> - i_writecount has a specific semantic: tracking how many fds have the
>> file open for writing. We'd be bumping it just to influence
>> file_thp_enabled() behavior.
>>
>
> I agree re-using i_writecount feels odd since it is abusing the idea of
> being written to. I might have misunderstood the full context of
> i_writecount though.
i_writecount means "the file is open with write access" IIUC. So one can
mmap(PROT_WRITE) it etc.
And that's kind of the thing: the virtual file is open with write
access. That's why I am still wondering whether mimicking that is
actually the right fix.
>
>> - It doesn't express the actual intent. The real issue is that
>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>> backed files.
>>
>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>> the cleaner approach. It is explicit, has no side effects, and is easy
>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>
>
> I was considering other address space flags and I think the best might
> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
> __vma_thp_allowable_orders() check the maximum allowed order for the
> address space.
The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
folio order checks. Changing it would degrade filesystems that do not
support large folios yet. IOW, it would be similar to ripping out
CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>
> khugepaged is about consolidating memory to huge pages, so if the
> address space doesn't allow a larger folio order, then khugepaged should
> not operate on that memory.
>
> The other options are
>
> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
> but IIUC evictability is more closely related to swapping and
> khugepaged might operate on swappable memory?
Right, it does not really make sense
> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
> to block migration. khugepaged kind of migrates the memory contents
> too, but someday we want guest_memfd to support migration, and at that
> time we would still want to block khugepaged, so I don't think we want
> to reuse a flag that couples khugepaged to migration.
It could be used at least for the time being and to fix the issue.
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 19:45 ` David Hildenbrand (Arm)
@ 2026-02-09 20:13 ` David Hildenbrand (Arm)
2026-02-09 21:31 ` Ackerley Tng
2026-02-10 1:51 ` Deepanshu Kartikey
0 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 20:13 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
> On 2/9/26 19:22, Ackerley Tng wrote:
>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>
>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> Hi David,
>>>
>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>
>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>> calling get_write_access(), so i_writecount stays 0. That's why
>>> file_thp_enabled() sees them as read-only files.
>>>
>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>> I think that would be a hack rather than a proper fix:
>>>
>>> - i_writecount has a specific semantic: tracking how many fds have the
>>> file open for writing. We'd be bumping it just to influence
>>> file_thp_enabled() behavior.
>>>
>>
>> I agree re-using i_writecount feels odd since it is abusing the idea of
>> being written to. I might have misunderstood the full context of
>> i_writecount though.
>
> i_writecount means "the file is open with write access" IIUC. So one can
> mmap(PROT_WRITE) it etc.
>
> And that's kind of the thing: the virtual file is open with write
> access. That's why I am still wondering whether mimicking that is
> actually the right fix.
>
>>
>>> - It doesn't express the actual intent. The real issue is that
>>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>> backed files.
>>>
>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>
>>
>> I was considering other address space flags and I think the best might
>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>> __vma_thp_allowable_orders() check the maximum allowed order for the
>> address space.
>
> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
> folio order checks. Changing it would degrade filesystems that do not
> support large folios yet. IOW, it would be similar to ripping out
> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>
>>
>> khugepaged is about consolidating memory to huge pages, so if the
>> address space doesn't allow a larger folio order, then khugepaged should
>> not operate on that memory.
>>
>> The other options are
>>
>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>> but IIUC evictability is more closely related to swapping and
>> khugepaged might operate on swappable memory?
> Right, it does not really make sense
>
>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>> to block migration. khugepaged kind of migrates the memory contents
>> too, but someday we want guest_memfd to support migration, and at that
>> time we would still want to block khugepaged, so I don't think we want
>> to reuse a flag that couples khugepaged to migration.
>
> It could be used at least for the time being and to fix the issue.
mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
shmem "somehow" works, lol.
BUT, something just occurred to me.
We added the mc-handling in
commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
Author: Jiaqi Yan <jiaqiyan@google.com>
Date: Wed Mar 29 08:11:19 2023 -0700
mm/khugepaged: recover from poisoned anonymous memory
..
So I assume kernels before that would crash when collapsing?
Looking at 5.15.199, it does not contain 98c76c9f1e [1].
So I suspect we need a fix+stable backport.
Who volunteers to try a secretmem reproducer on a stable kernel? :)
The following is a bit nasty as well but should do the trick until we rip
out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 03886d4ccecc..4ac1cb36b861 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -40,6 +40,7 @@
#include <linux/pgalloc.h>
#include <linux/pgalloc_tag.h>
#include <linux/pagewalk.h>
+#include <linux/secretmem.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
inode = file_inode(vma->vm_file);
+ if (mapping_inaccessible(inode->i_mapping) ||
+ secretmem_mapping(inode->i_mapping))
+ return false;
+
return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}
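The two predicates in this diff identify a mapping in different ways, which the following userspace sketch tries to show. It is a model under assumptions, not the kernel implementation: it assumes mapping_inaccessible() tests a flag bit and secretmem_mapping() compares the mapping's a_ops pointer against secretmem's operations table; the bit number, struct layouts and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space_operations. */
struct model_aops {
	int unused;
};

static const struct model_aops model_secretmem_aops = { 0 };
static const struct model_aops model_other_aops = { 0 };

/* Hypothetical bit number for the "inaccessible" mapping flag. */
enum { MODEL_AS_INACCESSIBLE = 8 };

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	unsigned long flags;
	const struct model_aops *a_ops;
};

/* Flag-based identification, as assumed for mapping_inaccessible(). */
static bool model_mapping_inaccessible(const struct model_mapping *m)
{
	return (m->flags >> MODEL_AS_INACCESSIBLE) & 1UL;
}

/* Identity-based identification, as assumed for secretmem_mapping(). */
static bool model_secretmem_mapping(const struct model_mapping *m)
{
	return m->a_ops == &model_secretmem_aops;
}

/* Models the early "return false" added by the diff. */
static bool model_thp_denied(const struct model_mapping *m)
{
	assert(m != NULL);
	return model_mapping_inaccessible(m) || model_secretmem_mapping(m);
}
```

Either predicate alone is enough to deny THP in this model, so the two file types do not have to share a flag.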
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 20:13 ` David Hildenbrand (Arm)
@ 2026-02-09 21:31 ` Ackerley Tng
2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-10 1:51 ` Deepanshu Kartikey
1 sibling, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-09 21:31 UTC (permalink / raw)
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
"David Hildenbrand (Arm)" <david@kernel.org> writes:
> On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
>> On 2/9/26 19:22, Ackerley Tng wrote:
>>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>>
>>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>>> <david@kernel.org> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>>
>>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>>> calling get_write_access(), so i_writecount stays 0. That's why
>>>> file_thp_enabled() sees them as read-only files.
>>>>
>>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>>> I think that would be a hack rather than a proper fix:
>>>>
>>>> - i_writecount has a specific semantic: tracking how many fds have the
>>>> file open for writing. We'd be bumping it just to influence
>>>> file_thp_enabled() behavior.
>>>>
>>>
>>> I agree re-using i_writecount feels odd since it is abusing the idea of
>>> being written to. I might have misunderstood the full context of
>>> i_writecount though.
>>
>> i_writecount means "the file is open with write access" IIUC. So one can
>> mmap(PROT_WRITE) it etc.
>>
>> And that's kind of the thing: the virtual file is open with write
>> access. That's why I am still wondering whether mimicking that is
>> actually the right fix.
>>
>>>
>>>> - It doesn't express the actual intent. The real issue is that
>>>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>>> backed files.
>>>>
>>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>>
>>>
>>> I was considering other address space flags and I think the best might
>>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>>> __vma_thp_allowable_orders() check the maximum allowed order for the
>>> address space.
>>
>> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
>> folio order checks.
Ah that's true.
>> Changing it would degrade filesystems that do not
>> support large folios yet. IOW, it would be similar to ripping out
>> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>>
>>>
>>> khugepaged is about consolidating memory to huge pages, so if the
>>> address space doesn't allow a larger folio order, then khugepaged should
>>> not operate on that memory.
>>>
>>> The other options are
>>>
>>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>> but IIUC evictability is more closely related to swapping and
>>> khugepaged might operate on swappable memory?
>> Right, it does not really make sense
>>
>>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>> to block migration. khugepaged kind of migrates the memory contents
>>> too, but someday we want guest_memfd to support migration, and at that
>>> time we would still want to block khugepaged, so I don't think we want
>>> to reuse a flag that couples khugepaged to migration.
>>
>> It could be used at least for the time being and to fix the issue.
>
> mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
> shmem "somehow" works, lol.
>
I could also check shmem, but I'm not sure which conditions to set up
shmem for, since shmem could be used in so many ways. Any suggestions?
Off the top of my head, shmem has lots of special-casing in the khugepaged
flow...
> BUT, something just occurred to me.
>
> We added the mc-handling in
>
> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> Author: Jiaqi Yan <jiaqiyan@google.com>
> Date: Wed Mar 29 08:11:19 2023 -0700
>
> mm/khugepaged: recover from poisoned anonymous memory
>
> ..
>
> So I assume kernels before that would crash when collapsing?
>
> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>
> So I suspect we need a fix+stable backport.
>
> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>
I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
special-casing secretmem like you suggested below?
>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
> #include <linux/pgalloc.h>
> #include <linux/pgalloc_tag.h>
> #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
> inode = file_inode(vma->vm_file);
>
> + if (mapping_inaccessible(inode->i_mapping) ||
> + secretmem_mapping(inode->i_mapping))
> + return false;
> +
Regarding the degradation of filesystems that don't support large folios
yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
would disable collapsing for filesystems that actually want pages to be
collapsed, but don't update max folio order and hence appear to not
support large folios yet?
What about a check like this instead
if (!mapping_large_folio_support())
return false;
And then when CONFIG_READ_ONLY_THP_FOR_FS is removed, part of that work
would involve getting filesystems to update AS_FOLIO_ORDER_MAX?
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199
>
> --
> Cheers,
>
> David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
@ 2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot
2 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2026-02-09 23:37 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, david, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
ackerleytng, seanjc, pbonzini, michael.roth, vannapurve
Cc: oe-kbuild-all, ziy, linux-mm, linux-kernel, Deepanshu Kartikey,
syzbot+33a04338019ac7e43a44
Hi Deepanshu,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Deepanshu-Kartikey/mm-thp-Deny-THP-for-guest_memfd-and-secretmem-in-file_thp_enabled/20260209-113800
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20260209033558.22943-1-kartikey406%40gmail.com
patch subject: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
config: arc-randconfig-001-20260210 (https://download.01.org/0day-ci/archive/20260210/202602100727.b1U4CHAA-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 14.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260210/202602100727.b1U4CHAA-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602100727.b1U4CHAA-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/huge_memory.c: In function 'file_thp_enabled':
>> mm/huge_memory.c:96:37: error: 'GUEST_MEMFD_MAGIC' undeclared (first use in this function)
96 | if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
| ^~~~~~~~~~~~~~~~~
mm/huge_memory.c:96:37: note: each undeclared identifier is reported only once for each function it appears in
>> mm/huge_memory.c:97:37: error: 'SECRETMEM_MAGIC' undeclared (first use in this function)
97 | inode->i_sb->s_magic == SECRETMEM_MAGIC)
| ^~~~~~~~~~~~~~~
vim +/GUEST_MEMFD_MAGIC +96 mm/huge_memory.c
84
85 static inline bool file_thp_enabled(struct vm_area_struct *vma)
86 {
87 struct inode *inode;
88
89 if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
90 return false;
91
92 if (!vma->vm_file)
93 return false;
94
95 inode = file_inode(vma->vm_file);
> 96 if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
> 97 inode->i_sb->s_magic == SECRETMEM_MAGIC)
98 return false;
99
100 return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
101 }
102
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
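Both errors in the report above mean the compiler never saw a definition of either macro in this translation unit. The following is a minimal userspace sketch of the `s_magic` comparison the patch attempts. The `model_*` names and the two magic values are placeholders assumed for illustration; the real constants have to come from whichever kernel header declares them.

```c
#include <stdbool.h>

/*
 * Placeholder values, assumed for illustration only. In the kernel these
 * must be provided by a header that mm/huge_memory.c actually includes.
 */
#define MODEL_GUEST_MEMFD_MAGIC 0x474d454dUL
#define MODEL_SECRETMEM_MAGIC   0x5345434dUL

/* Hypothetical stand-in for the superblock field the patch inspects. */
struct model_super_block {
	unsigned long s_magic;
};

/* Same shape as the check added by the patch: reject by filesystem magic. */
static bool model_magic_denies_thp(const struct model_super_block *sb)
{
	return sb->s_magic == MODEL_GUEST_MEMFD_MAGIC ||
	       sb->s_magic == MODEL_SECRETMEM_MAGIC;
}
```

With the macros declared before first use, the comparison compiles; without them, it fails in the same way as the build log above.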
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 20:13 ` David Hildenbrand (Arm)
2026-02-09 21:31 ` Ackerley Tng
@ 2026-02-10 1:51 ` Deepanshu Kartikey
2026-02-10 9:33 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-10 1:51 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On Tue, Feb 10, 2026 at 1:43 AM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
> #include <linux/pgalloc.h>
> #include <linux/pgalloc_tag.h>
> #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
> inode = file_inode(vma->vm_file);
>
> + if (mapping_inaccessible(inode->i_mapping) ||
> + secretmem_mapping(inode->i_mapping))
> + return false;
> +
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
>
Hi David,
Agreed, using mapping_inaccessible() for guest_memfd and
secretmem_mapping() for secretmem is much simpler than introducing a
new AS flag. No changes needed outside of file_thp_enabled().
I will send a v2 with your suggested diff and test it on syzbot.
Thanks,
Deepanshu
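The two helpers in David's diff come down to a flag test on the mapping and a pointer comparison on its operations table. The following is a rough userspace sketch with simplified stand-in types. All `model_*` names and the bit position are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical bit index; the kernel's AS_INACCESSIBLE value may differ. */
#define MODEL_AS_INACCESSIBLE 8

/* Stand-in for struct address_space_operations. */
struct model_aops {
	int unused;
};

/* Stand-in for struct address_space, reduced to the two fields we test. */
struct model_mapping {
	unsigned long flags;
	const struct model_aops *a_ops;
};

/* Stand-in for secretmem's operations table; identity is what matters. */
static const struct model_aops model_secretmem_aops = { 0 };

/* Models mapping_inaccessible(): a single flag bit on the mapping. */
static bool model_mapping_inaccessible(const struct model_mapping *m)
{
	return (m->flags & (1UL << MODEL_AS_INACCESSIBLE)) != 0;
}

/* Models secretmem_mapping(): compare the a_ops pointer. */
static bool model_secretmem_mapping(const struct model_mapping *m)
{
	return m->a_ops == &model_secretmem_aops;
}

/* The combined early-return condition from the suggested diff. */
static bool model_thp_denied(const struct model_mapping *m)
{
	return model_mapping_inaccessible(m) || model_secretmem_mapping(m);
}
```

Neither check needs state outside the mapping itself, which matches the remark that nothing has to change outside file_thp_enabled().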
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 21:31 ` Ackerley Tng
@ 2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-10 23:00 ` Ackerley Tng
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10 9:33 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
>> BUT, something just occurred to me.
>>
>> We added the mc-handling in
>>
>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
>> Author: Jiaqi Yan <jiaqiyan@google.com>
>> Date: Wed Mar 29 08:11:19 2023 -0700
>>
>> mm/khugepaged: recover from poisoned anonymous memory
>>
>> ..
>>
>> So I assume kernels before that would crash when collapsing?
>>
>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>>
>> So I suspect we need a fix+stable backport.
>>
>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>>
>
> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
> special-casing secretmem like you suggested below?
Yes. If there is no guest_memfd we wouldn't need it.
>
>>
>> The following is a bit nasty as well but should do the trick until we rip
>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 03886d4ccecc..4ac1cb36b861 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -40,6 +40,7 @@
>> #include <linux/pgalloc.h>
>> #include <linux/pgalloc_tag.h>
>> #include <linux/pagewalk.h>
>> +#include <linux/secretmem.h>
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>
>> inode = file_inode(vma->vm_file);
>>
>> + if (mapping_inaccessible(inode->i_mapping) ||
>> + secretmem_mapping(inode->i_mapping))
>> + return false;
>> +
>
> Regarding the degradation of filesystems that don't support large folios
> yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
> would disable collapsing for filesystems that actually want pages to be
> collapsed, but don't update max folio order and hence appear to not
> support large folios yet?
>
> What about a check like this instead
>
> if (!mapping_large_folio_support())
> return false;
That would essentially disable CONFIG_READ_ONLY_THP_FOR_FS (support for
THP before filesystems started supporting large folios officially), no?
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 1:51 ` Deepanshu Kartikey
@ 2026-02-10 9:33 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10 9:33 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/10/26 02:51, Deepanshu Kartikey wrote:
> On Tue, Feb 10, 2026 at 1:43 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>
>> The following is a bit nasty as well but should do the trick until we rip
>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 03886d4ccecc..4ac1cb36b861 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -40,6 +40,7 @@
>> #include <linux/pgalloc.h>
>> #include <linux/pgalloc_tag.h>
>> #include <linux/pagewalk.h>
>> +#include <linux/secretmem.h>
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>
>> inode = file_inode(vma->vm_file);
>>
>> + if (mapping_inaccessible(inode->i_mapping) ||
>> + secretmem_mapping(inode->i_mapping))
>> + return false;
>> +
>> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>> }
>>
>>
>
> Hi David,
>
> Agreed, using mapping_inaccessible() for guest_memfd and
> secretmem_mapping() for secretmem is much simpler than introducing a
> new AS flag. No changes needed outside of file_thp_enabled().
>
> I will send a v2 with your suggested diff and test it on syzbot.
Let's wait a bit until we are in agreement that this is the right thing
to do :)
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
@ 2026-02-10 17:51 ` kernel test robot
2 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2026-02-10 17:51 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, david, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
ackerleytng, seanjc, pbonzini, michael.roth, vannapurve
Cc: llvm, oe-kbuild-all, ziy, linux-mm, linux-kernel,
Deepanshu Kartikey, syzbot+33a04338019ac7e43a44
Hi Deepanshu,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Deepanshu-Kartikey/mm-thp-Deny-THP-for-guest_memfd-and-secretmem-in-file_thp_enabled/20260209-113800
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20260209033558.22943-1-kartikey406%40gmail.com
patch subject: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20260211/202602110124.Y72YFz1K-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260211/202602110124.Y72YFz1K-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602110124.Y72YFz1K-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/huge_memory.c:96:30: error: use of undeclared identifier 'GUEST_MEMFD_MAGIC'
96 | if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
| ^
>> mm/huge_memory.c:97:30: error: use of undeclared identifier 'SECRETMEM_MAGIC'
97 | inode->i_sb->s_magic == SECRETMEM_MAGIC)
| ^
2 errors generated.
vim +/GUEST_MEMFD_MAGIC +96 mm/huge_memory.c
84
85 static inline bool file_thp_enabled(struct vm_area_struct *vma)
86 {
87 struct inode *inode;
88
89 if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
90 return false;
91
92 if (!vma->vm_file)
93 return false;
94
95 inode = file_inode(vma->vm_file);
> 96 if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
> 97 inode->i_sb->s_magic == SECRETMEM_MAGIC)
98 return false;
99
100 return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
101 }
102
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 9:33 ` David Hildenbrand (Arm)
@ 2026-02-10 23:00 ` Ackerley Tng
2026-02-11 0:58 ` Ackerley Tng
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Ackerley Tng @ 2026-02-10 23:00 UTC (permalink / raw)
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
"David Hildenbrand (Arm)" <david@kernel.org> writes:
>>> BUT, something just occurred to me.
>>>
>>> We added the mc-handling in
>>>
>>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
>>> Author: Jiaqi Yan <jiaqiyan@google.com>
>>> Date: Wed Mar 29 08:11:19 2023 -0700
>>>
>>> mm/khugepaged: recover from poisoned anonymous memory
>>>
>>> ..
>>>
>>> So I assume kernels before that would crash when collapsing?
>>>
>>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>>>
>>> So I suspect we need a fix+stable backport.
>>>
>>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>>>
>>
>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
>> special-casing secretmem like you suggested below?
>
> Yes. If there is no guest_memfd we wouldn't need it.
>
Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
skipped.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>>
>>>
>>> The following is a bit nasty as well but should do the trick until we rip
>>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>>>
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 03886d4ccecc..4ac1cb36b861 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -40,6 +40,7 @@
>>> #include <linux/pgalloc.h>
>>> #include <linux/pgalloc_tag.h>
>>> #include <linux/pagewalk.h>
>>> +#include <linux/secretmem.h>
>>>
>>> #include <asm/tlb.h>
>>> #include "internal.h"
>>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>
>>> inode = file_inode(vma->vm_file);
>>>
>>> + if (mapping_inaccessible(inode->i_mapping) ||
>>> + secretmem_mapping(inode->i_mapping))
>>> + return false;
>>> +
Regarding checking mapping, is there any chance of racing with inode
release? (Might the mapping be freed?)
>>
>> Regarding the degradation of filesystems that don't support large folios
>> yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
>> would disable collapsing for filesystems that actually want pages to be
>> collapsed, but don't update max folio order and hence appear to not
>> support large folios yet?
>>
>> What about a check like this instead
>>
>> if (!mapping_large_folio_support())
>> return false;
>
> That would essentially disable CONFIG_READ_ONLY_THP_FOR_FS (support for
> THP before filesystems started supporting large folios officially), no?
>
I think I get what you mean now. I was thinking to also update the
filesystems to specify AS_FOLIO_ORDER_MAX, but I think that is better
separated out as a different patch series, and this should focus on just
fixing the bug.
> --
> Cheers,
>
> David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 23:00 ` Ackerley Tng
@ 2026-02-11 0:58 ` Ackerley Tng
2026-02-11 2:01 ` Deepanshu Kartikey
2026-02-11 9:29 ` David Hildenbrand (Arm)
2026-02-11 1:59 ` Deepanshu Kartikey
2026-02-11 9:28 ` David Hildenbrand (Arm)
2 siblings, 2 replies; 29+ messages in thread
From: Ackerley Tng @ 2026-02-11 0:58 UTC (permalink / raw)
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
Ackerley Tng <ackerleytng@google.com> writes:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>>>> BUT, something just occurred to me.
>>>>
>>>> We added the mc-handling in
>>>>
>>>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
>>>> Author: Jiaqi Yan <jiaqiyan@google.com>
>>>> Date: Wed Mar 29 08:11:19 2023 -0700
>>>>
>>>> mm/khugepaged: recover from poisoned anonymous memory
>>>>
>>>> ..
>>>>
>>>> So I assume kernels before that would crash when collapsing?
>>>>
>>>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>>>>
>>>> So I suspect we need a fix+stable backport.
>>>>
>>>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>>>>
>>>
>>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
>>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
>>> special-casing secretmem like you suggested below?
>>
>> Yes. If there is no guest_memfd we wouldn't need it.
>>
>
> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> skipped.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>
On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not
anonymous [2].
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135
Same for 6.6.123 [3].
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125
It breaks in 6.12.69 [4].
[4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.12.69#n159
IIUC the patch that enabled khugepaged for secretmem is
commit 7a81751fcdeb833acc858e59082688e3020bfe12
Author: Zach O'Keefe <zokeefe@google.com>
Date: Mon Sep 25 13:01:10 2023 -0700
mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
...
@@ -132,12 +132,18 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
!hugepage_flags_always())))
return false;
- /* Only regular file is valid */
- if (!in_pf && file_thp_enabled(vma))
- return true;
-
- if (!vma_is_anonymous(vma))
+ if (!vma_is_anonymous(vma)) {
+ /*
+ * Trust that ->huge_fault() handlers know what they are doing
+ * in fault path.
+ */
+ if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
+ return true;
+ /* Only regular file is valid in collapse path */
+ if (((!in_pf || smaps)) && file_thp_enabled(vma))
+ return true;
return false;
+ }
if (vma_is_temporary_stack(vma))
return false;
Because file_thp_enabled() would return true for secretmem.
>>>
>>>>
>>>>
>>>> [...snip...]
>>>>
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 23:00 ` Ackerley Tng
2026-02-11 0:58 ` Ackerley Tng
@ 2026-02-11 1:59 ` Deepanshu Kartikey
2026-02-11 9:28 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-11 1:59 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand (Arm),
akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On Wed, Feb 11, 2026 at 4:30 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
> >>> BUT, something just occurred to me.
> >>>
> >>> We added the mc-handling in
> >>>
> >>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> >>> Author: Jiaqi Yan <jiaqiyan@google.com>
> >>> Date: Wed Mar 29 08:11:19 2023 -0700
> >>>
> >>> mm/khugepaged: recover from poisoned anonymous memory
> >>>
> >>> ..
> >>>
> >>> So I assume kernels before that would crash when collapsing?
> >>>
> >>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
> >>>
> >>> So I suspect we need a fix+stable backport.
> >>>
> >>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
> >>>
> >>
> >> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
> >> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
> >> special-casing secretmem like you suggested below?
> >
> > Yes. If there is no guest_memfd we wouldn't need it.
> >
>
> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> skipped.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>
> >>
> >>>
> >>> The following is a bit nasty as well but should do the trick until we rip
> >>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
> >>>
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 03886d4ccecc..4ac1cb36b861 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -40,6 +40,7 @@
> >>> #include <linux/pgalloc.h>
> >>> #include <linux/pgalloc_tag.h>
> >>> #include <linux/pagewalk.h>
> >>> +#include <linux/secretmem.h>
> >>>
> >>> #include <asm/tlb.h>
> >>> #include "internal.h"
> >>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>
> >>> inode = file_inode(vma->vm_file);
> >>>
> >>> + if (mapping_inaccessible(inode->i_mapping) ||
> >>> + secretmem_mapping(inode->i_mapping))
> >>> + return false;
> >>> +
>
> Regarding checking mapping, is there any chance of racing with inode
> release? (Might the mapping be freed?)
>
> >>
I don't think so. file_thp_enabled() is called from
__thp_vma_allowable_orders(), which is reached via khugepaged,
MADV_COLLAPSE, or page faults. All these paths hold mmap_lock and
operate on a valid VMA. The VMA holds a reference to the file
(vma->vm_file), which holds a reference on the inode, so the inode
and its mapping cannot be freed while we are checking it.
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 0:58 ` Ackerley Tng
@ 2026-02-11 2:01 ` Deepanshu Kartikey
2026-02-11 9:29 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-11 2:01 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand (Arm),
akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On Wed, Feb 11, 2026 at 6:28 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> Ackerley Tng <ackerleytng@google.com> writes:
>
> > "David Hildenbrand (Arm)" <david@kernel.org> writes:
> >
> >>>> BUT, something just occurred to me.
> >>>>
> >>>> We added the mc-handling in
> >>>>
> >>>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> >>>> Author: Jiaqi Yan <jiaqiyan@google.com>
> >>>> Date: Wed Mar 29 08:11:19 2023 -0700
> >>>>
> >>>> mm/khugepaged: recover from poisoned anonymous memory
> >>>>
> >>>> ..
> >>>>
> >>>> So I assume kernels before that would crash when collapsing?
> >>>>
> >>>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
> >>>>
> >>>> So I suspect we need a fix+stable backport.
> >>>>
> >>>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
> >>>>
> >>>
> >>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
> >>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
> >>> special-casing secretmem like you suggested below?
> >>
> >> Yes. If there is no guest_memfd we wouldn't need it.
> >>
> >
> > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> > false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> > skipped.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
> >
>
> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not
> anonymous [2].
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135
>
> Same for 6.6.123 [3].
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125
>
> It breaks in 6.12.69 [4].
>
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.12.69#n159
>
> IIUC the patch that enabled khugepaged for secretmem is
>
> commit 7a81751fcdeb833acc858e59082688e3020bfe12
> Author: Zach O'Keefe <zokeefe@google.com>
> Date: Mon Sep 25 13:01:10 2023 -0700
>
> mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
>
> ...
>
> @@ -132,12 +132,18 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> !hugepage_flags_always())))
> return false;
>
> - /* Only regular file is valid */
> - if (!in_pf && file_thp_enabled(vma))
> - return true;
> -
> - if (!vma_is_anonymous(vma))
> + if (!vma_is_anonymous(vma)) {
> + /*
> + * Trust that ->huge_fault() handlers know what they are doing
> + * in fault path.
> + */
> + if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> + return true;
> + /* Only regular file is valid in collapse path */
> + if (((!in_pf || smaps)) && file_thp_enabled(vma))
> + return true;
> return false;
> + }
>
> if (vma_is_temporary_stack(vma))
> return false;
>
> Because file_thp_enabled() would return true for secretmem.
>
Thanks for the analysis on stable kernels, Ackerley. So the fix only
needs to target 6.12+ since that's where 7a81751fcdeb ("mm/thp: fix
'mm: thp: kill __transhuge_page_enabled()'") started routing secretmem
through file_thp_enabled().
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 23:00 ` Ackerley Tng
2026-02-11 0:58 ` Ackerley Tng
2026-02-11 1:59 ` Deepanshu Kartikey
@ 2026-02-11 9:28 ` David Hildenbrand (Arm)
2026-02-11 14:50 ` Deepanshu Kartikey
2026-02-11 15:38 ` Ackerley Tng
2 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 9:28 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On 2/11/26 00:00, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>>>
>>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
>>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
>>> special-casing secretmem like you suggested below?
>>
>> Yes. If there is no guest_memfd we wouldn't need it.
>>
>
> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> skipped.
Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
/* Only regular file is valid */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
(vm_flags & VM_EXEC)) {
struct inode *inode = vma->vm_file->f_inode;
return !inode_is_open_for_write(inode) &&
S_ISREG(inode->i_mode);
}
So if you have VM_EXEC on the VMA (mmaped with PROT_EXEC), it would work.
I think secretmem sets SB_I_NOEXEC, which prevents that. Same for guest_memfd.
v6.6.123 still has that VM_EXEC check in file_thp_enabled().
The check was dropped in commit:
commit 7fbb5e188248c50f737720825da1864ce42536d1
Author: Fangrui Song <i@maskray.me>
Date: Tue Dec 19 21:41:23 2023 -0800
mm: remove VM_EXEC requirement for THP eligibility
Commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP checking
in transparent_hugepage_enabled()") introduced the VM_EXEC requirement,
which is not strictly needed.
lld's default --rosegment option and GNU ld's -z separate-code option
(default on Linux/x86 since binutils 2.31) create a read-only PT_LOAD
segment without the PF_X flag, which should be eligible for THP.
So that one broke secretmem.
So when we fix it, we should add:
Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
What about the following:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 44ff8a648afd..9fbe5c28a6bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
inode = file_inode(vma->vm_file);
+ if (IS_ANON_FILE(inode))
+ return false;
+
return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}
--
Cheers,
David
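The effect of the VM_EXEC requirement and of the proposed bail-out can be compared in a small userspace model. The sketch below assumes CONFIG_READ_ONLY_THP_FOR_FS is enabled and, as the proposal implies, that secretmem inodes are reported as anonymous files. The struct and the `model_*` functions are hypothetical and only encode the conditions quoted in this mail.

```c
#include <stdbool.h>

/* Hypothetical summary of the VMA/inode properties the checks look at. */
struct model_vma {
	bool has_file;       /* vma->vm_file is set */
	bool vm_exec;        /* VM_EXEC is set on the VMA */
	bool open_for_write; /* inode_is_open_for_write(inode) */
	bool is_regular;     /* S_ISREG(inode->i_mode) */
	bool anon_file;      /* IS_ANON_FILE(inode), assumed true for secretmem */
};

/* With the VM_EXEC requirement still in place (as described for v6.6.123). */
static bool model_with_vm_exec(const struct model_vma *v)
{
	return v->has_file && v->vm_exec &&
	       !v->open_for_write && v->is_regular;
}

/* After the VM_EXEC requirement was dropped (7fbb5e188248). */
static bool model_without_vm_exec(const struct model_vma *v)
{
	return v->has_file && !v->open_for_write && v->is_regular;
}

/* With the proposed early return for anonymous files. */
static bool model_proposed(const struct model_vma *v)
{
	if (!v->has_file || v->anon_file)
		return false;
	return !v->open_for_write && v->is_regular;
}
```

A secretmem-like VMA (no VM_EXEC, not open for write, regular mode, anonymous file) is rejected by the first and third variants and accepted only by the second, while an ordinary read-only file mapping without VM_EXEC stays eligible under the proposal.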
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 0:58 ` Ackerley Tng
2026-02-11 2:01 ` Deepanshu Kartikey
@ 2026-02-11 9:29 ` David Hildenbrand (Arm)
2026-02-11 16:16 ` Ackerley Tng
1 sibling, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 9:29 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/11/26 01:58, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:
>
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>>>
>>> Yes. If there is no guest_memfd we wouldn't need it.
>>>
>>
>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
>> skipped.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>>
>
> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not
> anonymous [2].
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135
>
> Same for 6.6.123 [3].
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125
>
> It breaks in 6.12.69 [4].
Do you have a reproducer? If so, which behavior does it trigger?
I would assume that we would suddenly have secretmem pages (THP) that
have a directmap. Or some page copy would crash the kernel.
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 9:28 ` David Hildenbrand (Arm)
@ 2026-02-11 14:50 ` Deepanshu Kartikey
2026-02-11 15:38 ` Ackerley Tng
1 sibling, 0 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-11 14:50 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On Wed, Feb 11, 2026 at 2:58 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
>
> What about the following:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 44ff8a648afd..9fbe5c28a6bc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
> inode = file_inode(vma->vm_file);
>
> + if (IS_ANON_FILE(inode))
> + return false;
> +
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
This is an elegant solution. Instead of depending on specific subsystems,
IS_ANON_FILE() handles all pseudo-filesystem inodes generically, so any
future pseudo-fs won't run into the same issue.
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 9:28 ` David Hildenbrand (Arm)
2026-02-11 14:50 ` Deepanshu Kartikey
@ 2026-02-11 15:38 ` Ackerley Tng
2026-02-11 16:45 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-11 15:38 UTC (permalink / raw)
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
"David Hildenbrand (Arm)" <david@kernel.org> writes:
> On 2/11/26 00:00, Ackerley Tng wrote:
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>>>>
>>>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
>>>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
>>>> special-casing secretmem like you suggested below?
>>>
>>> Yes. If there is no guest_memfd we wouldn't need it.
>>>
>>
>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
>> skipped.
>
> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
>
Ah... I was working on a reproducer when I realized 5.15 doesn't have
MADV_COLLAPSE, so I tried to hack in an ioctl to trigger khugepaged.
That turned out to be awkward, but it got me to look at
hugepage_vma_check(), and then I went down the rabbit hole of looking
for the equivalent check function in the other stable kernels... and
amid all of that I forgot that CONFIG_READ_ONLY_THP_FOR_FS was unset :(
You're probably right about VM_EXEC.
Here's the reproducer for 6.12. I put this in
tools/testing/selftests/mm/memfd_secret.c and called repro() from
main(). This time I enabled CONFIG_READ_ONLY_THP_FOR_FS :).
void repro(void)
{
	uint8_t *mem;
	int ret;
	int fd;
	int i;

	printf("%d triggering secretmem\n", __LINE__);

	fd = memfd_secret(0);
	if (fd < 0) {
		if (errno == ENOSYS)
			ksft_exit_skip("memfd_secret is not supported\n");
		else
			ksft_exit_fail_msg("memfd_secret failed: %s\n",
					   strerror(errno));
	}

	if (ftruncate(fd, SZ_2M))
		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));

#define ALIGNED_ADDRESS ((void*)0x400000000UL)
	mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE,
		   MAP_FIXED | MAP_SHARED, fd, 0);
	if (mem != ALIGNED_ADDRESS)
		ksft_exit_fail_msg("Couldn't allocate memory\n");

	ret = madvise(mem, SZ_2M, MADV_HUGEPAGE);
	if (ret)
		ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n",
				   mem, ret, errno);

#define READ_ONCE(x) (*(volatile typeof(x) *) &(x))
	for (i = 0; i < SZ_2M; i += getpagesize())
		READ_ONCE(mem[i]);

	ret = madvise(mem, SZ_2M, MADV_COLLAPSE);
	if (ret)
		ksft_exit_fail_msg("MADV_COLLAPSE failed ret=%d errno=%d\n", ret, errno);

	munmap(mem, SZ_2M);
	close(fd);
}
This reproducer gets us to madvise_collapse() ->
hpage_collapse_scan_file() -> collapse_file(), and copy_mc_highpage()
fails because copy_mc_to_kernel() returns 4096.
memory_failure_queue() causes this to be printed on the console:
[ 1068.322578] Memory failure: 0x106d96f: recovery action for clean unevictable LRU page: Recovered
No crash :) Is a crash the requirement for a backport to stable kernels?
> /* Only regular file is valid */
> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> (vm_flags & VM_EXEC)) {
> struct inode *inode = vma->vm_file->f_inode;
>
> return !inode_is_open_for_write(inode) &&
> S_ISREG(inode->i_mode);
> }
>
>
> So if you have VM_EXEC on the VMA (mmaped with PROT_EXEC), it would work.
> I think secretmem sets SB_I_NOEXEC, which prevents that. Same for guest_memfd.
>
> v6.6.123 still has that VM_EXEC check in file_thp_enabled().
>
> The check was dropped in commit:
>
> commit 7fbb5e188248c50f737720825da1864ce42536d1
> Author: Fangrui Song <i@maskray.me>
> Date: Tue Dec 19 21:41:23 2023 -0800
>
> mm: remove VM_EXEC requirement for THP eligibility
>
> Commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP checking
> in transparent_hugepage_enabled()") introduced the VM_EXEC requirement,
> which is not strictly needed.
>
> lld's default --rosegment option and GNU ld's -z separate-code option
> (default on Linux/x86 since binutils 2.31) create a read-only PT_LOAD
> segment without the PF_X flag, which should be eligible for THP.
>
>
> So that one broke secretmem.
>
>
> So when we fix it, we should
>
> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
>
>
> What about the following:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 44ff8a648afd..9fbe5c28a6bc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
> inode = file_inode(vma->vm_file);
>
> + if (IS_ANON_FILE(inode))
> + return false;
> +
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
>
>
> --
> Cheers,
>
> David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 9:29 ` David Hildenbrand (Arm)
@ 2026-02-11 16:16 ` Ackerley Tng
2026-02-11 16:35 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-11 16:16 UTC
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
"David Hildenbrand (Arm)" <david@kernel.org> writes:
> On 2/11/26 01:58, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>>
>>>>
>>>> Yes. If there is no guest_memfd we wouldn't need it.
>>>>
>>>
>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
>>> skipped.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>>>
>>
>> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not
>> anonymous [2].
>>
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135
>>
>> Same for 6.6.123 [3].
>>
>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125
>>
>> It breaks in 6.12.69 [4].
>
>
> Do you have a reproducer? If so, which behavior does it trigger?
>
> I would assume that we would suddenly have secretmem pages (THP) that
> have a directmap. Or some page copy would crash the kernel.
>
Is there a good way to verify from userspace that the directmap hasn't
been restored? Should I use CONFIG_PTDUMP_DEBUGFS?
> --
> Cheers,
>
> David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 16:16 ` Ackerley Tng
@ 2026-02-11 16:35 ` David Hildenbrand (Arm)
2026-02-11 16:44 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 16:35 UTC
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/11/26 17:16, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>> On 2/11/26 01:58, Ackerley Tng wrote:
>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>
>>>
>>> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not
>>> anonymous [2].
>>>
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135
>>>
>>> Same for 6.6.123 [3].
>>>
>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125
>>>
>>> It breaks in 6.12.69 [4].
>>
>>
>> Do you have a reproducer? If so, which behavior does it trigger?
>>
>> I would assume that we would suddenly have secretmem pages (THP) that
>> have a directmap. Or some page copy would crash the kernel.
>>
>
> Is there a good way to verify from userspace that the directmap hasn't
> been restored? Should I use CONFIG_PTDUMP_DEBUGFS?
Anything that uses GUP must fail on that secretmem memory. Like doing an
O_DIRECT read/write or using vmsplice.
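
A rough userspace sketch of such a GUP probe, assuming vmsplice() into a throwaway pipe is used as the test. gup_probe() is a hypothetical helper name, not something from the selftests. On ordinary memory it should return 0; on a memfd_secret() mapping the vmsplice() call would be expected to fail, since GUP is supposed to refuse secretmem pages. Note that this only reports whether GUP rejects the range; it does not by itself prove anything about the state of the direct map.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for the vmsplice() declaration */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to pin [addr, addr + len) by
 * vmsplice()ing it into the write end of a temporary pipe.
 *
 * Returns 0 if the pages could be spliced, otherwise the errno value
 * reported by pipe() or vmsplice(). Keep len well below the pipe
 * capacity (64 KiB by default) so the call cannot block.
 */
static int gup_probe(void *addr, size_t len)
{
	struct iovec iov = { .iov_base = addr, .iov_len = len };
	int pipefd[2];
	int err = 0;

	assert(addr != NULL && len > 0);

	if (pipe(pipefd))
		return errno;

	/* vmsplice() has to look up the user pages backing the iovec. */
	if (vmsplice(pipefd[1], &iov, 1, 0) < 0)
		err = errno;

	close(pipefd[0]);
	close(pipefd[1]);
	return err;
}
```

In the reproducer this could be called as gup_probe(mem, getpagesize()) before and after MADV_COLLAPSE, comparing the two return values.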
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 16:35 ` David Hildenbrand (Arm)
@ 2026-02-11 16:44 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 16:44 UTC
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44
On 2/11/26 17:35, David Hildenbrand (Arm) wrote:
> On 2/11/26 17:16, Ackerley Tng wrote:
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>>>
>>>
>>> Do you have a reproducer? If so, which behavior does it trigger?
>>>
>>> I would assume that we would suddenly have secretmem pages (THP) that
>>> have a directmap. Or some page copy would crash the kernel.
>>>
>>
>> Is there a good way to verify from userspace that the directmap hasn't
>> been restored? Should I use CONFIG_PTDUMP_DEBUGFS?
>
> Anything that uses GUP must fail on that secretmem memory. Like doing an
> O_DIRECT read/write or using vmsplice.
>
Ah, but that might still fail, because we can identify the page as such.
Hm ... we'd need some introspection interface indeed.
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 15:38 ` Ackerley Tng
@ 2026-02-11 16:45 ` David Hildenbrand (Arm)
2026-02-12 22:19 ` Ackerley Tng
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 16:45 UTC
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On 2/11/26 16:38, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
>> On 2/11/26 00:00, Ackerley Tng wrote:
>>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>>
>>>
>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
>>> skipped.
>>
>> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
>>
>
> Ah... I was working on a reproducer then I realized 5.15 doesn't have
> MADV_COLLAPSE, then I tried to hack in an ioctl to trigger
> khugepaged. That turned out to be awkward but it got me to look at
> hugepage_vma_check(), and then I went down the rabbit hole to keep
> looking for the similar check function throughout the other stable
> kernels... and amongst all of that forgot that
> CONFIG_READ_ONLY_THP_FOR_FS was unset :(
>
> You're probably right about VM_EXEC.
>
> Here's the reproducer for 6.12, I put this in
> tools/testing/selftests/mm/memfd_secret.c and called repro() from
> main(). This time I enabled CONFIG_READ_ONLY_THP_FOR_FS :).
>
> void repro(void)
> {
> uint8_t *mem;
> int ret;
> int fd;
> int i;
>
> printf("%d triggering secretmem\n", __LINE__);
>
> fd = memfd_secret(0);
> if (fd < 0) {
> if (errno == ENOSYS)
> ksft_exit_skip("memfd_secret is not supported\n");
> else
> ksft_exit_fail_msg("memfd_secret failed: %s\n",
> strerror(errno));
> }
>
> if (ftruncate(fd, SZ_2M))
> ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
>
> #define ALIGNED_ADDRESS ((void*)0x400000000UL)
>
> mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED
> | MAP_SHARED, fd, 0);
> if (mem != ALIGNED_ADDRESS)
> ksft_exit_fail_msg("Couldn't allocate memory\n");
>
> ret = madvise(mem, SZ_2M, MADV_HUGEPAGE);
> if (ret)
> ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n",
> mem, ret, errno);
>
> #define READ_ONCE(x) (*(volatile typeof(x) *) &(x))
> for (i = 0; i < SZ_2M; i += getpagesize())
> READ_ONCE(mem[i]);
>
> ret = madvise(mem, SZ_2M, MADV_COLLAPSE);
> if (ret)
> ksft_exit_fail_msg("MADV_COLLAPSE failed ret=%d errno=%d\n", ret, errno);
>
> munmap(mem, SZ_2M);
> close(fd);
> }
>
> This reproducer gets us to madvise_collapse() ->
> hpage_collapse_scan_file() -> collapse_file(), and copy_mc_highpage()
> fails because copy_mc_to_kernel() returns 4096.
>
> memory_failure_queue() causes this to be printed on the console
>
> [ 1068.322578] Memory failure: 0x106d96f: recovery action for clean
> unevictable LRU page: Recovered
>
> No crash :) Is a crash the requirement for a backport to stable kernels?
I'd say being able to trigger that is sufficient. There is no real
memory failure :)
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-11 16:45 ` David Hildenbrand (Arm)
@ 2026-02-12 22:19 ` Ackerley Tng
2026-02-13 5:02 ` Deepanshu Kartikey
0 siblings, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-12 22:19 UTC
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
"David Hildenbrand (Arm)" <david@kernel.org> writes:
Going to try and summarize the findings/discussions here, copying from a
few earlier emails. David, you can jump directly to [Question].
> On 2/11/26 16:38, Ackerley Tng wrote:
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>>> On 2/11/26 00:00, Ackerley Tng wrote:
>>>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>>>
>>>>
>>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
>>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
>>>> skipped.
>>>
>>> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
>>>
>>
>> Ah... I was working on a reproducer then I realized 5.15 doesn't have
>> MADV_COLLAPSE, then I tried to hack in an ioctl to trigger
>> khugepaged. That turned out to be awkward but it got me to look at
>> hugepage_vma_check(), and then I went down the rabbit hole to keep
>> looking for the similar check function throughout the other stable
>> kernels... and amongst all of that forgot that
>> CONFIG_READ_ONLY_THP_FOR_FS was unset :(
>>
>> You're probably right about VM_EXEC.
>>
[Bug]
khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with
MADV_HUGEPAGE applied. There is no crash, but there is a false memory
failure printout that looks like
[ 1068.322578] Memory failure: 0x106d96f: recovery action for clean unevictable LRU page: Recovered
The correct Fixes tag should be:
Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
I was able to reproduce this on 6.12, 6.18 and HEAD
[Stable Backports]
The first stable version this affects is 6.12.
In 6.12, S_ANON_INODE does not yet exist, so I think in
file_thp_enabled() we can return false if vma_is_secretmem(vma).
6.18 needs a fix for both secretmem and guest_memfd.
[Solution]
For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to
work. This affects more filesystems than just secretmem and guest_memfd
though.
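
To make the three states of the check easier to compare, here is a small userspace model. It is only a sketch: struct file_model and the thp_ok_*() names are hypothetical, CONFIG_READ_ONLY_THP_FOR_FS is assumed to be enabled, and the fields merely stand in for the kernel predicates named in the comments.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the VMA/inode state that file_thp_enabled() looks at. */
struct file_model {
	bool vm_exec;		/* VMA has VM_EXEC (mapped with PROT_EXEC) */
	bool open_for_write;	/* models inode_is_open_for_write(inode) */
	bool regular;		/* models S_ISREG(inode->i_mode) */
	bool anon_inode;	/* models IS_ANON_FILE(inode) */
};

/* Before 7fbb5e188248: VM_EXEC was required for read-only file THP. */
static bool thp_ok_with_vm_exec(const struct file_model *f)
{
	return f->vm_exec && !f->open_for_write && f->regular;
}

/* After 7fbb5e188248: the VM_EXEC requirement is gone. */
static bool thp_ok_without_vm_exec(const struct file_model *f)
{
	return !f->open_for_write && f->regular;
}

/* With the suggested IS_ANON_FILE() check placed in front. */
static bool thp_ok_fixed(const struct file_model *f)
{
	if (f->anon_inode)
		return false;

	return !f->open_for_write && f->regular;
}
```

Modelling a secretmem mapping as { .vm_exec = false, .open_for_write = false, .regular = true, .anon_inode = true }, the first and third variants reject it and only the middle one accepts it, which matches the regression being attributed to 7fbb5e188248.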
[Question]
I'm not familiar with the concept of anonymous inodes. What does that
entail? Why is it suitable in deciding THP eligibility?
[Next Steps]
I'm going to be traveling over the next few weeks, so perhaps Deepanshu
can help with the fixup patches for 6.12, 6.18 and HEAD?
[Details]
Here's a reproducer for 6.18 for guest_memfd x MADV_COLLAPSE
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index e7d9aeb418d3..8760fe6fa482 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -371,10 +371,45 @@ static void test_guest_memfd_guest(void)
 	kvm_vm_free(vm);
 }

+#define ALIGNED_ADDRESS ((void *)0x400000000UL)
+
+static void repro(void)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	uint8_t *mem;
+	int fd, i;
+
+	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, guest_code);
+
+	fd = vm_create_guest_memfd(vm, SZ_2M, GUEST_MEMFD_FLAG_MMAP |
+					      GUEST_MEMFD_FLAG_INIT_SHARED);
+
+	mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, ALIGNED_ADDRESS);
+
+	for (i = 0; i < SZ_2M; i += getpagesize())
+		READ_ONCE(mem[i]);
+
+	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_HUGEPAGE), 0);
+
+	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_COLLAPSE), 0);
+
+	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_DONTNEED), 0);
+
+	READ_ONCE(mem[0]);
+
+	close(fd);
+	kvm_vm_free(vm);
+}
+
 int main(int argc, char *argv[])
 {
 	unsigned long vm_types, vm_type;

+	repro();
+	return 1;
+
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));

 	page_size = getpagesize();
console shows warning:
[ 558.315452] WARNING: CPU: 1 PID: 252 at arch/x86/kvm/../../../virt/kvm/guest_memfd.c:372 kvm_gmem_fault_user_mapping+0x120/0x1c0
stdout output:
# /mnt/host/kvm/guest_memfd_test
Random seed: 0x6b8b4567
__vm_create: mode='PA-bits:ANY, VA-bits:48, 4K pages' type='0', pages='657'
Guest physical address width detected: 46
Bus error
#
Here's a more complete reproducer for 6.12 for secretmem x MADV_COLLAPSE
diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 9a0597310a76..2a0c5cc9fe20 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -21,6 +21,7 @@
 #include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
+#include <stddef.h>

 #include "../kselftest.h"

@@ -299,10 +300,145 @@ static void prepare(void)

 #define NUM_TESTS 6

+#define SZ_2M (2UL << 20)
+#define ALIGNED_ADDRESS ((void *)0x400000000UL)
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
+
+uint64_t get_pfn(void *addr) {
+	uint64_t pagemap_entry;
+	static int fd = -1;
+	uintptr_t offset;
+	uintptr_t vaddr;
+
+	if (fd < 0) {
+		fd = open("/proc/self/pagemap", O_RDONLY);
+		if (fd < 0)
+			ksft_exit_fail_msg("open pagemap\n");
+	}
+
+	vaddr = (uintptr_t)addr;
+	offset = (vaddr / getpagesize()) * sizeof(uint64_t);
+
+	if (pread(fd, &pagemap_entry, sizeof(uint64_t), offset) != sizeof(uint64_t))
+		ksft_exit_fail_msg("pread pagemap\n");
+
+
+	/* Bit 63 is "present" */
+	if (!(pagemap_entry & (1ULL << 63)))
+		ksft_exit_fail_msg("Page not present in userspace pagemap\n");
+
+	/* Bits 0-54 are the PFN */
+	return pagemap_entry & ((1ULL << 55) - 1);
+}
+
+bool in_direct_map(uint64_t pfn) {
+	static int devmem_fd = -1;
+	uint8_t bounce;
+
+	if (devmem_fd < 0) {
+		devmem_fd = open("/dev/mem", O_RDONLY);
+		if (devmem_fd < 0)
+			ksft_exit_fail_msg("Can't open /dev/mem: %s\n", strerror(errno));
+	}
+
+	if (pread(devmem_fd, &bounce, 1, pfn * getpagesize()) == 1) {
+		return true;
+	} else {
+		if (errno == EFAULT)
+			return false;
+		else if (errno == EPERM)
+			ksft_exit_fail_msg("Access probably blocked: %s\n", strerror(errno));
+		else
+			perror("pread /dev/mem");
+
+		return false;
+	}
+}
+
+void check(void)
+{
+	uint64_t pfn;
+	uint8_t *mem;
+
+	mem = mmap(NULL, SZ_2M, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mem == MAP_FAILED)
+		ksft_exit_fail_msg("Couldn't allocate memory\n");
+
+	mem[0] = 'A';
+
+	pfn = get_pfn(mem);
+	printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn));
+
+	munmap(mem, SZ_2M);
+}
+
+void repro(void)
+{
+	uint64_t pfn;
+	uint8_t *mem;
+	int ret;
+	int fd;
+	int i;
+
+	printf("%d triggering secretmem\n", __LINE__);
+
+	fd = memfd_secret(0);
+	if (fd < 0) {
+		if (errno == ENOSYS)
+			ksft_exit_skip("memfd_secret is not supported\n");
+		else
+			ksft_exit_fail_msg("memfd_secret failed: %s\n",
+					   strerror(errno));
+	}
+
+	if (ftruncate(fd, SZ_2M))
+		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
+
+	mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
+	if (mem != ALIGNED_ADDRESS)
+		ksft_exit_fail_msg("Couldn't allocate memory\n");
+
+	for (i = 0; i < SZ_2M; i += getpagesize())
+		READ_ONCE(mem[i]);
+
+	pfn = get_pfn(mem);
+	printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn));
+
+	ret = madvise(mem, SZ_2M, MADV_HUGEPAGE);
+	if (ret)
+		ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n", mem, ret, errno);
+
+	ret = madvise(mem, SZ_2M, MADV_COLLAPSE);
+	if (ret != -1 || errno != EINVAL)
+		ksft_exit_fail_msg("MADV_COLLAPSE should have failed ret=%d errno=%d\n", ret, errno);
+
+	/*
+	 * Sleep allows memory_failure to complete, IIUC. If memory
+	 * failure handling doesn't complete, faulting in memory in
+	 * the next step fails with SIGBUS, as expected.
+	 */
+	sleep(1);
+
+	for (i = 0; i < SZ_2M; i += getpagesize())
+		READ_ONCE(mem[i]);
+
+	printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn));
+
+	pfn = get_pfn(mem);
+	printf("%d new pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn));
+
+	munmap(mem, SZ_2M);
+	close(fd);
+}
+
 int main(int argc, char *argv[])
 {
 	int fd;

+	check();
+	repro();
+	return 1;
+
 	prepare();

 	ksft_print_header();
Special configs:
+ Enable CONFIG_READ_ONLY_THP_FOR_FS
+ Disable CONFIG_STRICT_DEVMEM (so that reading /dev/mem will return
-EFAULT for memory not in the direct map, just for testing)
stdout output with annotations:
# /mnt/host/mm/memfd_secret
370 pfn=106a600 in_direct_map=1 <<== my check that direct map check works
383 triggering secretmem
405 pfn=106f568 in_direct_map=0 <<== secretmem is indeed not in the direct map
425 pfn=106f568 in_direct_map=1 <<== after memory failure handling, folio is restored to direct map
428 new pfn=106be67 in_direct_map=0 <<== next fault: secretmem has a new folio not in the direct map
#
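
For reading the pfn= values above, the pagemap decoding done inside get_pfn() can be pulled out into pure helpers. This is only a sketch; the PM_* macros and function names below are made up for illustration, while the bit layout (bit 63 = present, bits 0-54 = PFN) is the one already used in the reproducer. One caveat worth keeping in mind: the kernel reports the PFN field as zero to callers without CAP_SYS_ADMIN, so the test has to run as root for these numbers to mean anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; the layout follows the reproducer's get_pfn(). */
#define PM_PRESENT	(1ULL << 63)		/* page is present in RAM */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: page frame number */

/* True if the 64-bit pagemap entry describes a present page. */
static bool pagemap_present(uint64_t entry)
{
	return (entry & PM_PRESENT) != 0;
}

/* Extract the PFN; only meaningful when pagemap_present() is true. */
static uint64_t pagemap_pfn(uint64_t entry)
{
	return entry & PM_PFN_MASK;
}

/* Byte offset to pass to pread() on /dev/mem for a given PFN. */
static uint64_t pfn_to_devmem_offset(uint64_t pfn, uint64_t page_size)
{
	return pfn * page_size;
}
```

With 4 KiB pages, the secretmem PFN 0x106f568 from the output corresponds to /dev/mem offset 0x106f568000, which is the address in_direct_map() ends up probing.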
>>
>> [...snip...]
>>
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-12 22:19 ` Ackerley Tng
@ 2026-02-13 5:02 ` Deepanshu Kartikey
2026-02-13 9:06 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-13 5:02 UTC
To: Ackerley Tng
Cc: David Hildenbrand (Arm),
akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On Fri, Feb 13, 2026 at 3:49 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
> Going to try and summarize the findings/discussions here, copying from a
> few earlier emails. David, you can jump directly to [Question].
>
> > On 2/11/26 16:38, Ackerley Tng wrote:
> >> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> >>
> >>> On 2/11/26 00:00, Ackerley Tng wrote:
> >>>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> >>>>
> >>>>
> >>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> >>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> >>>> skipped.
> >>>
> >>> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
> >>>
> >>
> >> Ah... I was working on a reproducer then I realized 5.15 doesn't have
> >> MADV_COLLAPSE, then I tried to hack in an ioctl to trigger
> >> khugepaged. That turned out to be awkward but it got me to look at
> >> hugepage_vma_check(), and then I went down the rabbit hole to keep
> >> looking for the similar check function throughout the other stable
> >> kernels... and amongst all of that forgot that
> >> CONFIG_READ_ONLY_THP_FOR_FS was unset :(
> >>
> >> You're probably right about VM_EXEC.
> >>
>
> [Bug]
> khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with
> MADV_HUGEPAGE applied. There is no crash, but there is a false memory
> failure printout that looks like
>
> [ 1068.322578] Memory failure: 0x106d96f: recovery action for
> clean unevictable LRU page: Recovered
>
> The correct Fixes tag should be:
>
> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
>
> I was able to reproduce this on 6.12, 6.18 and HEAD
>
> [Stable Backports]
> The first stable version this affects is 6.12.
>
> In 6.12, S_ANON_INODE does not yet exist, so I think in
> file_thp_enabled() we can return false if vma_is_secretmem(vma).
>
> 6.18 needs a fix for both secretmem and guest_memfd.
>
> [Solution]
> For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to
> work. This affects more filesystems than just secretmem and guest_memfd
> though.
>
> [Question]
> I'm not familiar with the concept of anonymous inodes. What does that
> entail? Why is it suitable in deciding THP eligibility?
>
> [Next Steps]
> I'm going to be traveling over the next few weeks, so perhaps Deepanshu
> can help with the fixup patches for 6.12, 6.18 and HEAD?
>
Hi David,
Thanks Ackerley for the reproducer and analysis. Since Ackerley will be
traveling, I can take this forward.
Here is the approach I am planning:
For HEAD / 6.18:
- Add IS_ANON_FILE(inode) check in file_thp_enabled() as you suggested
- Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
- Cc: stable@vger.kernel.org
For 6.12 stable backport:
- IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
instead
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-13 5:02 ` Deepanshu Kartikey
@ 2026-02-13 9:06 ` David Hildenbrand (Arm)
2026-02-21 4:37 ` Deepanshu Kartikey
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-13 9:06 UTC
To: Deepanshu Kartikey, Ackerley Tng
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On 2/13/26 06:02, Deepanshu Kartikey wrote:
> On Fri, Feb 13, 2026 at 3:49 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>> Going to try and summarize the findings/discussions here, copying from a
>> few earlier emails. David, you can jump directly to [Question].
>>
>>
>> [Bug]
>> khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with
>> MADV_HUGEPAGE applied. There is no crash, but there is a false memory
>> failure printout that looks like
>>
>> [ 1068.322578] Memory failure: 0x106d96f: recovery action for
>> clean unevictable LRU page: Recovered
>>
>> The correct Fixes tag should be:
>>
>> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
>>
>> I was able to reproduce this on 6.12, 6.18 and HEAD
>>
>> [Stable Backports]
>> The first stable version this affects is 6.12.
>>
>> In 6.12, S_ANON_INODE does not yet exist, so I think in
>> file_thp_enabled() we can return false if vma_is_secretmem(vma).
>>
>> 6.18 needs a fix for both secretmem and guest_memfd.
>>
>> [Solution]
>> For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to
>> work. This affects more filesystems than just secretmem and guest_memfd
>> though.
>>
>> [Question]
>> I'm not familiar with the concept of anonymous inodes. What does that
>> entail? Why is it suitable in deciding THP eligibility?
>>
>> [Next Steps]
>> I'm going to be traveling over the next few weeks, so perhaps Deepanshu
>> can help with the fixup patches for 6.12, 6.18 and HEAD?
>>
>
> Hi David,
>
> Thanks Ackerley for the reproducer and analysis. Since Ackerley will be
> traveling, I can take this forward.
>
> Here is the approach I am planning:
>
> For HEAD / 6.18:
> - Add IS_ANON_FILE(inode) check in file_thp_enabled() as you suggested
> - Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
> - Cc: stable@vger.kernel.org
Right. Please link the mail with Ackerley's reproducers and carefully
describe the implications. Then describe how anon inodes never pass the
"opened writable" check and that the clean thing to do is to revert to
disallowing anon inodes altogether.
Also describe how secretmem is not affected upstream, but triggers the
confusing memory failure errors.
>
> For 6.12 stable backport:
> - IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
> mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
> instead
I think secretmem_mapping() is sufficient there given that guest_memfd
does not apply yet.
But we can discuss the details about the backport once the upstream fix
is in.
Thanks!
--
Cheers,
David
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-13 9:06 ` David Hildenbrand (Arm)
@ 2026-02-21 4:37 ` Deepanshu Kartikey
0 siblings, 0 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-21 4:37 UTC
To: David Hildenbrand (Arm)
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song
On Fri, Feb 13, 2026 at 2:36 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> > For 6.12 stable backport:
> > - IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
> > mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
> > instead
>
> I think secretmem_mapping() is sufficient there given that guest_memfd
> does not apply yet.
>
> But we can discuss the details about the backport once the upstream fix
> is in.
>
Subject: Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
Hi David,
The upstream fix is now in mm-unstable and linux-next. Should I send a
backport for 6.12 stable?
Since IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, I was
planning to use secretmem_mapping() in file_thp_enabled() as you
suggested. guest_memfd mmap is not present in 6.12 so only secretmem
needs fixing there.
Thanks,
Deepanshu
Thread overview: 29+ messages
2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 10:41 ` David Hildenbrand (Arm)
2026-02-09 13:06 ` Deepanshu Kartikey
2026-02-09 18:22 ` Ackerley Tng
2026-02-09 19:45 ` David Hildenbrand (Arm)
2026-02-09 20:13 ` David Hildenbrand (Arm)
2026-02-09 21:31 ` Ackerley Tng
2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-10 23:00 ` Ackerley Tng
2026-02-11 0:58 ` Ackerley Tng
2026-02-11 2:01 ` Deepanshu Kartikey
2026-02-11 9:29 ` David Hildenbrand (Arm)
2026-02-11 16:16 ` Ackerley Tng
2026-02-11 16:35 ` David Hildenbrand (Arm)
2026-02-11 16:44 ` David Hildenbrand (Arm)
2026-02-11 1:59 ` Deepanshu Kartikey
2026-02-11 9:28 ` David Hildenbrand (Arm)
2026-02-11 14:50 ` Deepanshu Kartikey
2026-02-11 15:38 ` Ackerley Tng
2026-02-11 16:45 ` David Hildenbrand (Arm)
2026-02-12 22:19 ` Ackerley Tng
2026-02-13 5:02 ` Deepanshu Kartikey
2026-02-13 9:06 ` David Hildenbrand (Arm)
2026-02-21 4:37 ` Deepanshu Kartikey
2026-02-10 1:51 ` Deepanshu Kartikey
2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot