* [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
@ 2026-02-09 3:35 Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-09 3:35 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, Deepanshu Kartikey,
syzbot+33a04338019ac7e43a44, Deepanshu Kartikey
file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
inodes because they appear as regular read-only files when
CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
MADV_COLLAPSE to create large folios in the page cache, but their fault
handlers do not support large folios.
Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
these filesystems early in file_thp_enabled().
Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
---
mm/huge_memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..4f57c78b57dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 		return false;

 	inode = file_inode(vma->vm_file);
+	if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
+	    inode->i_sb->s_magic == SECRETMEM_MAGIC)
+		return false;
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
--
2.43.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey
@ 2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 10:41 ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot
2 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 10:24 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

On 2/9/26 04:35, Deepanshu Kartikey wrote:
> file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
> inodes because they appear as regular read-only files when
> CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
> MADV_COLLAPSE to create large folios in the page cache, but their fault
> handlers do not support large folios.
>
> Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
> these filesystems early in file_thp_enabled().
>
> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>

So we were able to reproduce this with secretmem, right?

We want to add "Fixes:" for the introducing commits, which would be the
commits that enable secretmem and mapping of guest_memfd pages to user
space. Can you identify them?

And also

Cc: stable@vger.kernel.org

> ---
> mm/huge_memory.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 40cf59301c21..4f57c78b57dd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> 		return false;
>
> 	inode = file_inode(vma->vm_file);
> +	if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
> +	    inode->i_sb->s_magic == SECRETMEM_MAGIC)
> +		return false;

That's nasty. We want some way to identify that through the mapping.

Unfortunately CONFIG_READ_ONLY_THP_FOR_FS ignores any
mapping_set_large_folios() configs by design.

And CONFIG_READ_ONLY_THP_FOR_FS might go away soon, but we need a fix
until then.

While we can identify secretmem through vma_is_secretmem(), we can't do
the same for guest_memfd as it's built as a module.

Unfortunately AS_NO_DIRECT_MAP[1] won't work.

Maybe introduce a AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?

[1] https://lore.kernel.org/r/20260126164445.11867-6-kalyazin@amazon.com

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 10:24 ` David Hildenbrand (Arm)
@ 2026-02-09 10:41 ` David Hildenbrand (Arm)
2026-02-09 13:06 ` Deepanshu Kartikey
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 10:41 UTC (permalink / raw)
To: Deepanshu Kartikey, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve
Cc: ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

On 2/9/26 11:24, David Hildenbrand (Arm) wrote:
> On 2/9/26 04:35, Deepanshu Kartikey wrote:
>> file_thp_enabled() incorrectly returns true for guest_memfd and secretmem
>> inodes because they appear as regular read-only files when
>> CONFIG_READ_ONLY_THP_FOR_FS is enabled. This allows khugepaged and
>> MADV_COLLAPSE to create large folios in the page cache, but their fault
>> handlers do not support large folios.
>>
>> Add explicit checks for GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC to reject
>> these filesystems early in file_thp_enabled().
>>
>> Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
>> Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
>> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
>
> So we were able to reproduce this with secretmem, right?
>
> We want to add "Fixes:" for the introducing commits, which would be the
> commits that enable secretmem and mapping of guest_memfd pages to user
> space. Can you identify them?
>
> And also
>
> Cc: stable@vger.kernel.org
>
>> ---
>> mm/huge_memory.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 40cf59301c21..4f57c78b57dd 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct
>> vm_area_struct *vma)
>> 		return false;
>> 	inode = file_inode(vma->vm_file);
>> +	if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
>> +	    inode->i_sb->s_magic == SECRETMEM_MAGIC)
>> +		return false;
>
> That's nasty. We want some way to identify that through the mapping.
>
> Unfortunately CONFIG_READ_ONLY_THP_FOR_FS ignores any
> mapping_set_large_folios() configs by design.
>
> And CONFIG_READ_ONLY_THP_FOR_FS might go away soon, but we need a fix
> until then.
>
> While we can identify secretmem through vma_is_secretmem(), we can't do
> the same for guest_memfd as it's built as a module.
>
> Unfortunately AS_NO_DIRECT_MAP[1] won't work.
>
> Maybe introduce a AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
> rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?

On second thought, why do we pass the

	!inode_is_open_for_write(inode)

in file_thp_enabled()?

Isn't that the main problem for these memfd things?

Maybe a get_write_access() is missing somewhere?

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 10:41 ` David Hildenbrand (Arm)
@ 2026-02-09 13:06 ` Deepanshu Kartikey
2026-02-09 18:22 ` Ackerley Tng
0 siblings, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-09 13:06 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44

On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> > Maybe introduce a AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
> > rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
>
> On second thought, why do we pass the
>
> 	!inode_is_open_for_write(inode)
>
> in file_thp_enabled()?
>
> Isn't that the main problem for these memfd things?
>
> Maybe a get_write_access() is missing somewhere?
>

Hi David,

Thanks for the suggestion. I looked into the get_write_access() path.

Both guest_memfd and secretmem use alloc_file_pseudo() which skips
calling get_write_access(), so i_writecount stays 0. That's why
file_thp_enabled() sees them as read-only files.

We could add get_write_access() after alloc_file_pseudo() in both, but
I think that would be a hack rather than a proper fix:

- i_writecount has a specific semantic: tracking how many fds have the
  file open for writing. We'd be bumping it just to influence
  file_thp_enabled() behavior.

- It doesn't express the actual intent. The real issue is that
  CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
  backed files.

I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
the cleaner approach. It is explicit, has no side effects, and is easy
to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.

Here is the diff:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..23f559fc1a4c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
 	AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
+	AS_NO_READ_ONLY_THP_FOR_FS = 12,
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..4bdda92ce01e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)

 	inode = file_inode(vma->vm_file);

+	if (test_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags))
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }

diff --git a/mm/secretmem.c b/mm/secretmem.c
index edf111e0a1bb..56d93a74f5fc 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -205,7 +205,8 @@ static struct file *secretmem_file_create(unsigned long flags)

 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 	mapping_set_unevictable(inode->i_mapping);
+	set_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags);

 	inode->i_op = &secretmem_iops;
 	inode->i_mapping->a_ops = &secretmem_aops;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..b93a324c81bd 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -597,6 +597,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 	mapping_set_inaccessible(inode->i_mapping);
+	set_bit(AS_NO_READ_ONLY_THP_FOR_FS, &inode->i_mapping->flags);
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));

Please let me know if this looks good and I will send a formal v2.

Thanks,
Deepanshu

^ permalink raw reply [flat|nested] 29+ messages in thread
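[Editorial sketch, not part of the thread.] The message above argues that
pseudo files pass the read-only test because i_writecount is never bumped,
and proposes an opt-out bit in the mapping flags. The following user-space
model illustrates that reasoning only. Every model_* name is hypothetical,
the struct is a stand-in for struct inode and its address_space, and the
bit number 12 is simply copied from the proposed diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed AS_NO_READ_ONLY_THP_FOR_FS bit. */
enum model_mapping_flag {
	MODEL_AS_NO_READ_ONLY_THP_FOR_FS = 12,
};

/* Hypothetical stand-in for the inode fields the check looks at. */
struct model_inode {
	int writecount;         /* models inode->i_writecount */
	bool is_regular;        /* models S_ISREG(inode->i_mode) */
	unsigned long as_flags; /* models inode->i_mapping->flags */
};

static inline bool model_test_bit(int nr, unsigned long flags)
{
	return (flags >> nr) & 1UL;
}

/* Models the existing tail of file_thp_enabled(): not open for write + regular. */
static inline bool model_thp_enabled_old(const struct model_inode *inode)
{
	return inode->writecount <= 0 && inode->is_regular;
}

/* Models the proposed variant: the opt-out bit is tested first. */
static inline bool model_thp_enabled_new(const struct model_inode *inode)
{
	if (model_test_bit(MODEL_AS_NO_READ_ONLY_THP_FOR_FS, inode->as_flags))
		return false;
	return model_thp_enabled_old(inode);
}

/* A pseudo file as described above: regular, writecount left at 0. */
static inline struct model_inode model_pseudo_inode(bool opt_out)
{
	struct model_inode inode = { .writecount = 0, .is_regular = true, .as_flags = 0 };

	if (opt_out)
		inode.as_flags |= 1UL << MODEL_AS_NO_READ_ONLY_THP_FOR_FS;
	return inode;
}
```

In this model a pseudo inode built with opt_out == false is still reported
as eligible by both variants, while one built with opt_out == true is
rejected only by the new variant, which is the behavior the proposal is
after.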
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 13:06 ` Deepanshu Kartikey
@ 2026-02-09 18:22 ` Ackerley Tng
2026-02-09 19:45 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-09 18:22 UTC (permalink / raw)
To: Deepanshu Kartikey, David Hildenbrand (Arm)
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

Deepanshu Kartikey <kartikey406@gmail.com> writes:

> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> > Maybe introduce a AS_NO_READ_ONLY_THP_FOR_FS, which we can just easily
>> > rip out along with CONFIG_READ_ONLY_THP_FOR_FS later?
>>
>> On second thought, why do we pass the
>>
>> 	!inode_is_open_for_write(inode)
>>
>> in file_thp_enabled()?
>>
>> Isn't that the main problem for these memfd things?
>>
>> Maybe a get_write_access() is missing somewhere?
>>
>
> Hi David,
>
> Thanks for the suggestion. I looked into the get_write_access() path.
>
> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
> calling get_write_access(), so i_writecount stays 0. That's why
> file_thp_enabled() sees them as read-only files.
>
> We could add get_write_access() after alloc_file_pseudo() in both, but
> I think that would be a hack rather than a proper fix:
>
> - i_writecount has a specific semantic: tracking how many fds have the
>   file open for writing. We'd be bumping it just to influence
>   file_thp_enabled() behavior.
>

I agree re-using i_writecount feels odd since it is abusing the idea of
being written to. I might have misunderstood the full context of
i_writecount though.

> - It doesn't express the actual intent. The real issue is that
>   CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>   backed files.
>
> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
> the cleaner approach. It is explicit, has no side effects, and is easy
> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>

I was considering other address space flags and I think the best might
be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
__vma_thp_allowable_orders() check the maximum allowed order for the
address space.

khugepaged is about consolidating memory to huge pages, so if the
address space doesn't allow a larger folio order, then khugepaged should
not operate on that memory.

The other options are

+ AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
  but IIUC evictability is more closely related to swapping and
  khugepaged might operate on swappable memory? Both guest_memfd and
  secretmem set AS_UNEVICTABLE.
+ AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
  to block migration. khugepaged kind of migrates the memory contents
  too, but someday we want guest_memfd to support migration, and at that
  time we would still want to block khugepaged, so I don't think we want
  to reuse a flag that couples khugepaged to migration.

>
> [...snip...]
>

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 18:22 ` Ackerley Tng
@ 2026-02-09 19:45 ` David Hildenbrand (Arm)
2026-02-09 20:13 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 19:45 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

On 2/9/26 19:22, Ackerley Tng wrote:
> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>
>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>>>
>>>
>>> On second thought, why do we pass the
>>>
>>> 	!inode_is_open_for_write(inode)
>>>
>>> in file_thp_enabled()?
>>>
>>> Isn't that the main problem for these memfd things?
>>>
>>> Maybe a get_write_access() is missing somewhere?
>>>
>>
>> Hi David,
>>
>> Thanks for the suggestion. I looked into the get_write_access() path.
>>
>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>> calling get_write_access(), so i_writecount stays 0. That's why
>> file_thp_enabled() sees them as read-only files.
>>
>> We could add get_write_access() after alloc_file_pseudo() in both, but
>> I think that would be a hack rather than a proper fix:
>>
>> - i_writecount has a specific semantic: tracking how many fds have the
>>   file open for writing. We'd be bumping it just to influence
>>   file_thp_enabled() behavior.
>>
>
> I agree re-using i_writecount feels odd since it is abusing the idea of
> being written to. I might have misunderstood the full context of
> i_writecount though.

i_writecount means "the file is open with write access" IIUC. So one can
mmap(PROT_WRITE) it etc.

And that's kind of the thing: the virtual file is open with write
access. That's why I am still wondering whether mimicking that is
actually the right fix.

>
>> - It doesn't express the actual intent. The real issue is that
>>   CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>   backed files.
>>
>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>> the cleaner approach. It is explicit, has no side effects, and is easy
>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>
>
> I was considering other address space flags and I think the best might
> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
> __vma_thp_allowable_orders() check the maximum allowed order for the
> address space.

The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
folio order checks. Changing it would degrade filesystems that do not
support large folios yet. IOW, it would be similar to ripping out
CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)

>
> khugepaged is about consolidating memory to huge pages, so if the
> address space doesn't allow a larger folio order, then khugepaged should
> not operate on that memory.
>
> The other options are
>
> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>   but IIUC evictability is more closely related to swapping and
>   khugepaged might operate on swappable memory?

Right, it does not really make sense

> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>   to block migration. khugepaged kind of migrates the memory contents
>   too, but someday we want guest_memfd to support migration, and at that
>   time we would still want to block khugepaged, so I don't think we want
>   to reuse a flag that couples khugepaged to migration.

It could be used at least for the time being and to fix the issue.

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 19:45 ` David Hildenbrand (Arm)
@ 2026-02-09 20:13 ` David Hildenbrand (Arm)
2026-02-09 21:31 ` Ackerley Tng
2026-02-10 1:51 ` Deepanshu Kartikey
0 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 20:13 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
> On 2/9/26 19:22, Ackerley Tng wrote:
>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>
>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> Hi David,
>>>
>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>
>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>> calling get_write_access(), so i_writecount stays 0. That's why
>>> file_thp_enabled() sees them as read-only files.
>>>
>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>> I think that would be a hack rather than a proper fix:
>>>
>>> - i_writecount has a specific semantic: tracking how many fds have the
>>>   file open for writing. We'd be bumping it just to influence
>>>   file_thp_enabled() behavior.
>>>
>>
>> I agree re-using i_writecount feels odd since it is abusing the idea of
>> being written to. I might have misunderstood the full context of
>> i_writecount though.
>
> i_writecount means "the file is open with write access" IIUC. So one can
> mmap(PROT_WRITE) it etc.
>
> And that's kind of the thing: the virtual file is open with write
> access. That's why I am still wondering whether mimicking that is
> actually the right fix.
>
>>
>>> - It doesn't express the actual intent. The real issue is that
>>>   CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>>   backed files.
>>>
>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>
>>
>> I was considering other address space flags and I think the best might
>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>> __vma_thp_allowable_orders() check the maximum allowed order for the
>> address space.
>
> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
> folio order checks. Changing it would degrade filesystems that do not
> support large folios yet. IOW, it would be similar to ripping out
> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>
>>
>> khugepaged is about consolidating memory to huge pages, so if the
>> address space doesn't allow a larger folio order, then khugepaged should
>> not operate on that memory.
>>
>> The other options are
>>
>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>   but IIUC evictability is more closely related to swapping and
>>   khugepaged might operate on swappable memory?
> Right, it does not really make sense
>
>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>   to block migration. khugepaged kind of migrates the memory contents
>>   too, but someday we want guest_memfd to support migration, and at that
>>   time we would still want to block khugepaged, so I don't think we want
>>   to reuse a flag that couples khugepaged to migration.
>
> It could be used at least for the time being and to fix the issue.

mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
shmem "somehow" works, lol.

BUT, something just occurred to me.

We added the mc-handling in

commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
Author: Jiaqi Yan <jiaqiyan@google.com>
Date: Wed Mar 29 08:11:19 2023 -0700

    mm/khugepaged: recover from poisoned anonymous memory

    ..

So I assume kernels before that would crash when collapsing?

Looking at 5.15.199, it does not contain 98c76c9f1e [1].

So I suspect we need a fix+stable backport.

Who volunteers to try a secretmem reproducer on a stable kernel? :)

The following is a bit nasty as well but should do the trick until we rip
out the CONFIG_READ_ONLY_THP_FOR_FS stuff.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 03886d4ccecc..4ac1cb36b861 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -40,6 +40,7 @@
 #include <linux/pgalloc.h>
 #include <linux/pgalloc_tag.h>
 #include <linux/pagewalk.h>
+#include <linux/secretmem.h>

 #include <asm/tlb.h>
 #include "internal.h"
@@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)

 	inode = file_inode(vma->vm_file);

+	if (mapping_inaccessible(inode->i_mapping) ||
+	    secretmem_mapping(inode->i_mapping))
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
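[Editorial sketch, not part of the thread.] The diff above rejects a
mapping when it is marked inaccessible (guest_memfd) or when it belongs
to secretmem. The user-space model below illustrates how those two tests
combine with the existing read-only check. All model_* names are
hypothetical. It assumes that secretmem_mapping() recognizes a mapping by
comparing its a_ops pointer against the secretmem operations table, and
that mapping_inaccessible() tests a single flag; both are assumptions
about the kernel helpers, not statements taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct address_space_operations. */
struct model_aops {
	const char *name;
};

/* Models the single, global secretmem operations table. */
static const struct model_aops model_secretmem_aops = { "secretmem" };

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	const struct model_aops *a_ops;
	bool inaccessible; /* models the AS_INACCESSIBLE flag */
};

/* Hypothetical stand-in for struct inode. */
struct model_inode {
	struct model_mapping mapping;
	int writecount;  /* models inode->i_writecount */
	bool is_regular; /* models S_ISREG(inode->i_mode) */
};

/* Assumption: identity of the a_ops pointer marks a secretmem mapping. */
static inline bool model_secretmem_mapping(const struct model_mapping *m)
{
	return m->a_ops == &model_secretmem_aops;
}

/* Models file_thp_enabled() with the two early rejections from the diff. */
static inline bool model_file_thp_enabled(const struct model_inode *inode)
{
	if (inode->mapping.inaccessible ||
	    model_secretmem_mapping(&inode->mapping))
		return false;

	return inode->writecount <= 0 && inode->is_regular;
}
```

In this model an ordinary regular file that nobody has open for writing
stays eligible, so the read-only THP behavior for real filesystems is
untouched, while both pseudo-file kinds are filtered out before the
writecount test is reached.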
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 20:13 ` David Hildenbrand (Arm)
@ 2026-02-09 21:31 ` Ackerley Tng
2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-10 1:51 ` Deepanshu Kartikey
1 sibling, 1 reply; 29+ messages in thread
From: Ackerley Tng @ 2026-02-09 21:31 UTC (permalink / raw)
To: David Hildenbrand (Arm), Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

"David Hildenbrand (Arm)" <david@kernel.org> writes:

> On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
>> On 2/9/26 19:22, Ackerley Tng wrote:
>>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>>
>>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>>> <david@kernel.org> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>>
>>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>>> calling get_write_access(), so i_writecount stays 0. That's why
>>>> file_thp_enabled() sees them as read-only files.
>>>>
>>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>>> I think that would be a hack rather than a proper fix:
>>>>
>>>> - i_writecount has a specific semantic: tracking how many fds have the
>>>>   file open for writing. We'd be bumping it just to influence
>>>>   file_thp_enabled() behavior.
>>>>
>>>
>>> I agree re-using i_writecount feels odd since it is abusing the idea of
>>> being written to. I might have misunderstood the full context of
>>> i_writecount though.
>>
>> i_writecount means "the file is open with write access" IIUC. So one can
>> mmap(PROT_WRITE) it etc.
>>
>> And that's kind of the thing: the virtual file is open with write
>> access. That's why I am still wondering whether mimicking that is
>> actually the right fix.
>>
>>>
>>>> - It doesn't express the actual intent. The real issue is that
>>>>   CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>>>   backed files.
>>>>
>>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>>
>>>
>>> I was considering other address space flags and I think the best might
>>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>>> __vma_thp_allowable_orders() check the maximum allowed order for the
>>> address space.
>>
>> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
>> folio order checks.

Ah that's true.

>> Changing it would degrade filesystems that do not
>> support large folios yet. IOW, it would be similar to ripping out
>> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>>
>>>
>>> khugepaged is about consolidating memory to huge pages, so if the
>>> address space doesn't allow a larger folio order, then khugepaged should
>>> not operate on that memory.
>>>
>>> The other options are
>>>
>>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>>   but IIUC evictability is more closely related to swapping and
>>>   khugepaged might operate on swappable memory?
>> Right, it does not really make sense
>>
>>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>>   to block migration. khugepaged kind of migrates the memory contents
>>>   too, but someday we want guest_memfd to support migration, and at that
>>>   time we would still want to block khugepaged, so I don't think we want
>>>   to reuse a flag that couples khugepaged to migration.
>>
>> It could be used at least for the time being and to fix the issue.
>
> mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
> shmem "somehow" works, lol.
>

I could also check shmem, but I'm not sure which conditions to set up
shmem for, since shmem could be used in so many ways. Any suggestions?

Off the top of my head, shmem has lots of special-casing in the
khugepaged flow...

> BUT, something just occurred to me.
>
> We added the mc-handling in
>
> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> Author: Jiaqi Yan <jiaqiyan@google.com>
> Date: Wed Mar 29 08:11:19 2023 -0700
>
>     mm/khugepaged: recover from poisoned anonymous memory
>
>     ..
>
> So I assume kernels before that would crash when collapsing?
>
> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>
> So I suspect we need a fix+stable backport.
>
> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>

I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
special-casing secretmem like you suggested below?

>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
>  #include <linux/pgalloc.h>
>  #include <linux/pgalloc_tag.h>
>  #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>  	inode = file_inode(vma->vm_file);
>
> +	if (mapping_inaccessible(inode->i_mapping) ||
> +	    secretmem_mapping(inode->i_mapping))
> +		return false;
> +

Regarding the degradation of filesystems that don't support large folios
yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
would disable collapsing for filesystems that actually want pages to be
collapsed, but don't update max folio order and hence appear to not
support large folios yet?

What about a check like this instead

	if (!mapping_large_folio_support())
		return false;

And then when CONFIG_READ_ONLY_THP_FOR_FS is removed, part of that work
would involve getting filesystems to update AS_FOLIO_ORDER_MAX?

>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199
>
> --
> Cheers,
>
> David

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 21:31 ` Ackerley Tng
@ 2026-02-10 9:33 ` David Hildenbrand (Arm)
2026-02-10 23:00 ` Ackerley Tng
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10 9:33 UTC (permalink / raw)
To: Ackerley Tng, Deepanshu Kartikey
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44

>> BUT, something just occurred to me.
>>
>> We added the mc-handling in
>>
>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
>> Author: Jiaqi Yan <jiaqiyan@google.com>
>> Date: Wed Mar 29 08:11:19 2023 -0700
>>
>>     mm/khugepaged: recover from poisoned anonymous memory
>>
>>     ..
>>
>> So I assume kernels before that would crash when collapsing?
>>
>> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>>
>> So I suspect we need a fix+stable backport.
>>
>> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>>
>
> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
> special-casing secretmem like you suggested below?

Yes. If there is no guest_memfd we wouldn't need it.

>
>>
>> The following is a bit nasty as well but should do the trick until we rip
>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 03886d4ccecc..4ac1cb36b861 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -40,6 +40,7 @@
>>  #include <linux/pgalloc.h>
>>  #include <linux/pgalloc_tag.h>
>>  #include <linux/pagewalk.h>
>> +#include <linux/secretmem.h>
>>
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>
>>  	inode = file_inode(vma->vm_file);
>>
>> +	if (mapping_inaccessible(inode->i_mapping) ||
>> +	    secretmem_mapping(inode->i_mapping))
>> +		return false;
>> +
>
> Regarding the degradation of filesystems that don't support large folios
> yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
> would disable collapsing for filesystems that actually want pages to be
> collapsed, but don't update max folio order and hence appear to not
> support large folios yet?
>
> What about a check like this instead
>
> 	if (!mapping_large_folio_support())
> 		return false;

That would essentially disable CONFIG_READ_ONLY_THP_FOR_FS (support for
THP before filesystems started supporting large folios officially), no?

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
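[Editorial sketch, not part of the thread.] The objection above is that
gating on large-folio support would also turn off collapse for the very
filesystems CONFIG_READ_ONLY_THP_FOR_FS was written for. The user-space
model below illustrates that trade-off. All model_* names are
hypothetical, and treating "large folio support" as "maximum folio order
greater than zero" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of struct address_space that matters here. */
struct model_mapping {
	unsigned int max_folio_order; /* models the AS_FOLIO_ORDER_MAX bits */
};

/* Hypothetical stand-in for struct inode. */
struct model_inode {
	struct model_mapping mapping;
	int writecount;  /* models inode->i_writecount */
	bool is_regular; /* models S_ISREG(inode->i_mode) */
};

/* Assumption: a mapping supports large folios iff its max order is non-zero. */
static inline bool model_large_folio_support(const struct model_mapping *m)
{
	return m->max_folio_order > 0;
}

/* Models the current read-only check, which ignores the folio order. */
static inline bool model_read_only_thp(const struct model_inode *inode)
{
	return inode->writecount <= 0 && inode->is_regular;
}

/* Models the suggested variant that first requires large-folio support. */
static inline bool model_read_only_thp_order_gated(const struct model_inode *inode)
{
	if (!model_large_folio_support(&inode->mapping))
		return false;
	return model_read_only_thp(inode);
}
```

In this model a read-only regular file on a filesystem with
max_folio_order == 0 is eligible under the current check and rejected by
the gated one. The gate would therefore keep guest_memfd and secretmem
out, but only by also removing the case the config option exists to
serve.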
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-10 9:33 ` David Hildenbrand (Arm) @ 2026-02-10 23:00 ` Ackerley Tng 2026-02-11 0:58 ` Ackerley Tng ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Ackerley Tng @ 2026-02-10 23:00 UTC (permalink / raw) To: David Hildenbrand (Arm), Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 "David Hildenbrand (Arm)" <david@kernel.org> writes: >>> BUT, something just occurred to me. >>> >>> We added the mc-handling in >>> >>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b >>> Author: Jiaqi Yan <jiaqiyan@google.com> >>> Date: Wed Mar 29 08:11:19 2023 -0700 >>> >>> mm/khugepaged: recover from poisoned anonymous memory >>> >>> .. >>> >>> So I assume kernels before that would crash when collapsing? >>> >>> Looking at 5.15.199, it does not contain 98c76c9f1e [1]. >>> >>> So I suspect we need a fix+stable backport. >>> >>> Who volunteers to try a secretmem reproducer on a stable kernel? :) >>> >> >> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should >> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be >> special-casing secretmem like you suggested below? > > Yes. If there is no guest_memfd we wouldn't need it. > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are skipped. [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469 >> >>> >>> The following is a bit nasty as well but should do the trick until we rip >>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff. 
>>> >>> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>> index 03886d4ccecc..4ac1cb36b861 100644 >>> --- a/mm/huge_memory.c >>> +++ b/mm/huge_memory.c >>> @@ -40,6 +40,7 @@ >>> #include <linux/pgalloc.h> >>> #include <linux/pgalloc_tag.h> >>> #include <linux/pagewalk.h> >>> +#include <linux/secretmem.h> >>> >>> #include <asm/tlb.h> >>> #include "internal.h" >>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) >>> >>> inode = file_inode(vma->vm_file); >>> >>> + if (mapping_inaccessible(inode->i_mapping) || >>> + secretmem_mapping(inode->i_mapping)) >>> + return false; >>> + Regarding checking mapping, is there any chance of racing with inode release? (Might the mapping be freed?) >> >> Regarding the degradation of filesystems that don't support large folios >> yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX >> would disable collapsing for filesystems that actually want pages to be >> collapsed, but don't update max folio order and hence appear to not >> support large folios yet? >> >> What about a check like this instead >> >> if (!mapping_large_folio_support()) >> return false; > > That would essentially disable CONFIG_READ_ONLY_THP_FOR_FS (support for > THP before filesystems started supporting large folios officially), no? > I think I get what you mean now. I was thinking to also update the filesystems to specify AS_FOLIO_ORDER_MAX, but I think that is better separated out as a different patch series, and this should focus on just fixing the bug. > -- > Cheers, > > David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-10 23:00 ` Ackerley Tng @ 2026-02-11 0:58 ` Ackerley Tng 2026-02-11 2:01 ` Deepanshu Kartikey 2026-02-11 9:29 ` David Hildenbrand (Arm) 2026-02-11 1:59 ` Deepanshu Kartikey 2026-02-11 9:28 ` David Hildenbrand (Arm) 2 siblings, 2 replies; 29+ messages in thread From: Ackerley Tng @ 2026-02-11 0:58 UTC (permalink / raw) To: David Hildenbrand (Arm), Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 Ackerley Tng <ackerleytng@google.com> writes: > "David Hildenbrand (Arm)" <david@kernel.org> writes: > >>>> BUT, something just occurred to me. >>>> >>>> We added the mc-handling in >>>> >>>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b >>>> Author: Jiaqi Yan <jiaqiyan@google.com> >>>> Date: Wed Mar 29 08:11:19 2023 -0700 >>>> >>>> mm/khugepaged: recover from poisoned anonymous memory >>>> >>>> .. >>>> >>>> So I assume kernels before that would crash when collapsing? >>>> >>>> Looking at 5.15.199, it does not contain 98c76c9f1e [1]. >>>> >>>> So I suspect we need a fix+stable backport. >>>> >>>> Who volunteers to try a secretmem reproducer on a stable kernel? :) >>>> >>> >>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should >>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be >>> special-casing secretmem like you suggested below? >> >> Yes. If there is no guest_memfd we wouldn't need it. >> > > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return > false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are > skipped. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469 > On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not anonymous [2]. 
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135 Same for 6.6.123 [3]. [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125 It breaks in 6.12.69 [4]. [4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.12.69#n159 IIUC the patch that enabled khugepaged for secretmem is commit 7a81751fcdeb833acc858e59082688e3020bfe12 Author: Zach O'Keefe <zokeefe@google.com> Date: Mon Sep 25 13:01:10 2023 -0700 mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" ... @@ -132,12 +132,18 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, !hugepage_flags_always()))) return false; - /* Only regular file is valid */ - if (!in_pf && file_thp_enabled(vma)) - return true; - - if (!vma_is_anonymous(vma)) + if (!vma_is_anonymous(vma)) { + /* + * Trust that ->huge_fault() handlers know what they are doing + * in fault path. + */ + if (((in_pf || smaps)) && vma->vm_ops->huge_fault) + return true; + /* Only regular file is valid in collapse path */ + if (((!in_pf || smaps)) && file_thp_enabled(vma)) + return true; return false; + } if (vma_is_temporary_stack(vma)) return false; Because file_thp_enabled() would return true for secretmem. >>> >>>> >>>> >>>> [...snip...] >>>> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 0:58 ` Ackerley Tng @ 2026-02-11 2:01 ` Deepanshu Kartikey 2026-02-11 9:29 ` David Hildenbrand (Arm) 1 sibling, 0 replies; 29+ messages in thread From: Deepanshu Kartikey @ 2026-02-11 2:01 UTC (permalink / raw) To: Ackerley Tng Cc: David Hildenbrand (Arm), akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 On Wed, Feb 11, 2026 at 6:28 AM Ackerley Tng <ackerleytng@google.com> wrote: > > Ackerley Tng <ackerleytng@google.com> writes: > > > "David Hildenbrand (Arm)" <david@kernel.org> writes: > > > >>>> BUT, something just occurred to me. > >>>> > >>>> We added the mc-handling in > >>>> > >>>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b > >>>> Author: Jiaqi Yan <jiaqiyan@google.com> > >>>> Date: Wed Mar 29 08:11:19 2023 -0700 > >>>> > >>>> mm/khugepaged: recover from poisoned anonymous memory > >>>> > >>>> .. > >>>> > >>>> So I assume kernels before that would crash when collapsing? > >>>> > >>>> Looking at 5.15.199, it does not contain 98c76c9f1e [1]. > >>>> > >>>> So I suspect we need a fix+stable backport. > >>>> > >>>> Who volunteers to try a secretmem reproducer on a stable kernel? :) > >>>> > >>> > >>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should > >>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be > >>> special-casing secretmem like you suggested below? > >> > >> Yes. If there is no guest_memfd we wouldn't need it. > >> > > > > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return > > false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are > > skipped. 
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469 > > > > On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not > anonymous [2]. > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135 > > Same for 6.6.123 [3]. > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125 > > It breaks in 6.12.69 [4]. > > [4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.12.69#n159 > > IIUC the patch that enabled khugepaged for secretmem is > > commit 7a81751fcdeb833acc858e59082688e3020bfe12 > Author: Zach O'Keefe <zokeefe@google.com> > Date: Mon Sep 25 13:01:10 2023 -0700 > > mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" > > ... > > @@ -132,12 +132,18 @@ bool hugepage_vma_check(struct vm_area_struct > *vma, unsigned long vm_flags, > !hugepage_flags_always()))) > return false; > > - /* Only regular file is valid */ > - if (!in_pf && file_thp_enabled(vma)) > - return true; > - > - if (!vma_is_anonymous(vma)) > + if (!vma_is_anonymous(vma)) { > + /* > + * Trust that ->huge_fault() handlers know what they are doing > + * in fault path. > + */ > + if (((in_pf || smaps)) && vma->vm_ops->huge_fault) > + return true; > + /* Only regular file is valid in collapse path */ > + if (((!in_pf || smaps)) && file_thp_enabled(vma)) > + return true; > return false; > + } > > if (vma_is_temporary_stack(vma)) > return false; > > Because file_thp_enabled() would return true for secretmem. > Thanks for the analysis on stable kernels, Ackerley. So the fix only needs to target 6.12+ since that's where 7a81751fcdeb ("mm/thp: fix 'mm: thp: kill __transhuge_page_enabled()'") started routing secretmem through file_thp_enabled(). ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 0:58 ` Ackerley Tng 2026-02-11 2:01 ` Deepanshu Kartikey @ 2026-02-11 9:29 ` David Hildenbrand (Arm) 2026-02-11 16:16 ` Ackerley Tng 1 sibling, 1 reply; 29+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 9:29 UTC (permalink / raw) To: Ackerley Tng, Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 On 2/11/26 01:58, Ackerley Tng wrote: > Ackerley Tng <ackerleytng@google.com> writes: > >> "David Hildenbrand (Arm)" <david@kernel.org> writes: >> >>> >>> Yes. If there is no guest_memfd we wouldn't need it. >>> >> >> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return >> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are >> skipped. >> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469 >> > > On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not > anonymous [2]. > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135 > > Same for 6.6.123 [3]. > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125 > > It breaks in 6.12.69 [4]. Do you have a reproducer? If so, which behavior does it trigger? I would assume that we would suddenly have secretmem pages (THP) that have a directmap. Or some page copy would crash the kernel. -- Cheers, David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 9:29 ` David Hildenbrand (Arm) @ 2026-02-11 16:16 ` Ackerley Tng 2026-02-11 16:35 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 29+ messages in thread From: Ackerley Tng @ 2026-02-11 16:16 UTC (permalink / raw) To: David Hildenbrand (Arm), Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 "David Hildenbrand (Arm)" <david@kernel.org> writes: > On 2/11/26 01:58, Ackerley Tng wrote: >> Ackerley Tng <ackerleytng@google.com> writes: >> >>> "David Hildenbrand (Arm)" <david@kernel.org> writes: >>> >>>> >>>> Yes. If there is no guest_memfd we wouldn't need it. >>>> >>> >>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return >>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are >>> skipped. >>> >>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469 >>> >> >> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not >> anonymous [2]. >> >> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135 >> >> Same for 6.6.123 [3]. >> >> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125 >> >> It breaks in 6.12.69 [4]. > > > Do you have a reproducer? If so, which behavior does it trigger? > > I would assume that we would suddenly have secretmem pages (THP) that > have a directmap. Or some page copy would crash the kernel. > Is there a good way to verify from userspace that the directmap hasn't been restored? Should I use CONFIG_PTDUMP_DEBUGFS? > -- > Cheers, > > David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 16:16 ` Ackerley Tng @ 2026-02-11 16:35 ` David Hildenbrand (Arm) 2026-02-11 16:44 ` David Hildenbrand (Arm) 0 siblings, 1 reply; 29+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 16:35 UTC (permalink / raw) To: Ackerley Tng, Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 On 2/11/26 17:16, Ackerley Tng wrote: > "David Hildenbrand (Arm)" <david@kernel.org> writes: > >> On 2/11/26 01:58, Ackerley Tng wrote: >>> Ackerley Tng <ackerleytng@google.com> writes: >>> >>> >>> On 6.1.162, secretmem VMAs are skipped since secretmem VMAs are not >>> anonymous [2]. >>> >>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.1.162#n135 >>> >>> Same for 6.6.123 [3]. >>> >>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/huge_memory.c?h=v6.6.123#n125 >>> >>> It breaks in 6.12.69 [4]. >> >> >> Do you have a reproducer? If so, which behavior does it trigger? >> >> I would assume that we would suddenly have secretmem pages (THP) that >> have a directmap. Or some page copy would crash the kernel. >> > > Is there a good way to verify from userspace that the directmap hasn't > been restored? Should I use CONFIG_PTDUMP_DEBUGFS? Anything that uses GUP must fail on that secretmem memory. Like doing an O_DIRECT read/write or using vmsplice. -- Cheers, David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 16:35 ` David Hildenbrand (Arm) @ 2026-02-11 16:44 ` David Hildenbrand (Arm) 0 siblings, 0 replies; 29+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 16:44 UTC (permalink / raw) To: Ackerley Tng, Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 On 2/11/26 17:35, David Hildenbrand (Arm) wrote: > On 2/11/26 17:16, Ackerley Tng wrote: >> "David Hildenbrand (Arm)" <david@kernel.org> writes: >> >>> >>> >>> Do you have a reproducer? If so, which behavior does it trigger? >>> >>> I would assume that we would suddenly have secretmem pages (THP) that >>> have a directmap. Or some page copy would crash the kernel. >>> >> >> Is there a good way to verify from userspace that the directmap hasn't >> been restored? Should I use CONFIG_PTDUMP_DEBUGFS? > > Anything that uses GUP must fail on that secretmem memory. Like doing an > O_DIRECT read/write or using vmsplice. > Ah, but that might still fail, because we can identify the page as such. Hm ... we'd need some introspection interface indeed. -- Cheers, David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-10 23:00 ` Ackerley Tng 2026-02-11 0:58 ` Ackerley Tng @ 2026-02-11 1:59 ` Deepanshu Kartikey 2026-02-11 9:28 ` David Hildenbrand (Arm) 2 siblings, 0 replies; 29+ messages in thread From: Deepanshu Kartikey @ 2026-02-11 1:59 UTC (permalink / raw) To: Ackerley Tng Cc: David Hildenbrand (Arm), akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44 On Wed, Feb 11, 2026 at 4:30 AM Ackerley Tng <ackerleytng@google.com> wrote: > > "David Hildenbrand (Arm)" <david@kernel.org> writes: > > >>> BUT, something just occurred to me. > >>> > >>> We added the mc-handling in > >>> > >>> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b > >>> Author: Jiaqi Yan <jiaqiyan@google.com> > >>> Date: Wed Mar 29 08:11:19 2023 -0700 > >>> > >>> mm/khugepaged: recover from poisoned anonymous memory > >>> > >>> .. > >>> > >>> So I assume kernels before that would crash when collapsing? > >>> > >>> Looking at 5.15.199, it does not contain 98c76c9f1e [1]. > >>> > >>> So I suspect we need a fix+stable backport. > >>> > >>> Who volunteers to try a secretmem reproducer on a stable kernel? :) > >>> > >> > >> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should > >> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be > >> special-casing secretmem like you suggested below? > > > > Yes. If there is no guest_memfd we wouldn't need it. > > > > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return > false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are > skipped. 
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199#n469
>
> >>
> >>>
> >>> The following is a bit nasty as well but should do the trick until we rip
> >>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
> >>>
> >>>
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 03886d4ccecc..4ac1cb36b861 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -40,6 +40,7 @@
> >>>  #include <linux/pgalloc.h>
> >>>  #include <linux/pgalloc_tag.h>
> >>>  #include <linux/pagewalk.h>
> >>> +#include <linux/secretmem.h>
> >>>
> >>>  #include <asm/tlb.h>
> >>>  #include "internal.h"
> >>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>
> >>>  	inode = file_inode(vma->vm_file);
> >>>
> >>> +	if (mapping_inaccessible(inode->i_mapping) ||
> >>> +	    secretmem_mapping(inode->i_mapping))
> >>> +		return false;
> >>> +
>
> Regarding checking mapping, is there any chance of racing with inode
> release? (Might the mapping be freed?)
>
> >>

I don't think so. file_thp_enabled() is called from
__thp_vma_allowable_orders(), which is reached via khugepaged,
MADV_COLLAPSE, or page faults. All these paths hold mmap_lock and
operate on a valid VMA. The VMA holds a reference to the file
(vma->vm_file), which holds a reference on the inode, so the inode and
its mapping cannot be freed while we are checking it.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-10 23:00 ` Ackerley Tng 2026-02-11 0:58 ` Ackerley Tng 2026-02-11 1:59 ` Deepanshu Kartikey @ 2026-02-11 9:28 ` David Hildenbrand (Arm) 2026-02-11 14:50 ` Deepanshu Kartikey 2026-02-11 15:38 ` Ackerley Tng 2 siblings, 2 replies; 29+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 9:28 UTC (permalink / raw) To: Ackerley Tng, Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, Fangrui Song On 2/11/26 00:00, Ackerley Tng wrote: > "David Hildenbrand (Arm)" <david@kernel.org> writes: > >>> >>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should >>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be >>> special-casing secretmem like you suggested below? >> >> Yes. If there is no guest_memfd we wouldn't need it. >> > > Seems like on 5.15.199 there's a hugepage_vma_check(), which will return > false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are > skipped. Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that: /* Only regular file is valid */ if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file && (vm_flags & VM_EXEC)) { struct inode *inode = vma->vm_file->f_inode; return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } So if you have VM_EXEC on the VMA (mmaped with PROT_EXEC), it would work. I think secretmem sets SB_I_NOEXEC, which prevents that. Same for guest_memfd. v6.6.123 still has that VM_EXEC check in file_thp_enabled(). 
The check was dropped in commit:

commit 7fbb5e188248c50f737720825da1864ce42536d1
Author: Fangrui Song <i@maskray.me>
Date:   Tue Dec 19 21:41:23 2023 -0800

    mm: remove VM_EXEC requirement for THP eligibility

    Commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP checking
    in transparent_hugepage_enabled()") introduced the VM_EXEC requirement,
    which is not strictly needed.

    lld's default --rosegment option and GNU ld's -z separate-code option
    (default on Linux/x86 since binutils 2.31) create a read-only PT_LOAD
    segment without the PF_X flag, which should be eligible for THP.


So that one broke secretmem.


So when we fix it, we should

Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")


What about the following:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 44ff8a648afd..9fbe5c28a6bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 
 	inode = file_inode(vma->vm_file);
 
+	if (IS_ANON_FILE(inode))
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }

-- 
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
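To make the history above easier to follow, here is a small userspace model of
the eligibility decision. It is only a sketch: every struct, enum and function
name below (model_inode, model_vma, model_era, model_file_thp_enabled() and the
two helpers) is hypothetical and none of it is kernel code. It assumes
CONFIG_READ_ONLY_THP_FOR_FS is enabled, that a secretmem mapping can never
carry VM_EXEC (David's SB_I_NOEXEC remark, which he only states as "I think"),
and that IS_ANON_FILE() is true for secretmem inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; these are not kernel types. */
struct model_inode {
	bool is_regular;     /* models S_ISREG(inode->i_mode) */
	bool open_for_write; /* models inode_is_open_for_write(inode) */
	bool is_anon_file;   /* models IS_ANON_FILE(inode); assumed true for secretmem */
};

struct model_vma {
	const struct model_inode *inode; /* NULL models a VMA without a file */
	bool vm_exec;                    /* models vm_flags & VM_EXEC */
};

enum model_era {
	ERA_VM_EXEC_REQUIRED, /* before 7fbb5e188248 */
	ERA_VM_EXEC_DROPPED,  /* after 7fbb5e188248, without any fix */
	ERA_ANON_FILE_CHECK,  /* with the proposed IS_ANON_FILE() check */
};

/* Models the file_thp_enabled() decision for each of the three eras. */
bool model_file_thp_enabled(const struct model_vma *vma, enum model_era era)
{
	assert(vma != NULL);

	if (!vma->inode)
		return false;
	/* Old kernels only considered executable file mappings. */
	if (era == ERA_VM_EXEC_REQUIRED && !vma->vm_exec)
		return false;
	/* Proposed fix: reject anonymous (pseudo-filesystem) inodes early. */
	if (era == ERA_ANON_FILE_CHECK && vma->inode->is_anon_file)
		return false;

	return !vma->inode->open_for_write && vma->inode->is_regular;
}

/* A secretmem mapping: looks like a regular read-only file, never VM_EXEC. */
bool model_secretmem_collapsible(enum model_era era)
{
	const struct model_inode inode = {
		.is_regular = true, .open_for_write = false, .is_anon_file = true,
	};
	const struct model_vma vma = { .inode = &inode, .vm_exec = false };

	return model_file_thp_enabled(&vma, era);
}

/* A read-only, non-executable segment of an ordinary on-disk file. */
bool model_rodata_segment_collapsible(enum model_era era)
{
	const struct model_inode inode = {
		.is_regular = true, .open_for_write = false, .is_anon_file = false,
	};
	const struct model_vma vma = { .inode = &inode, .vm_exec = false };

	return model_file_thp_enabled(&vma, era);
}
```

Under these assumptions the model shows secretmem being shielded only by the
VM_EXEC requirement, becoming eligible once that requirement is dropped, and
being rejected again with the IS_ANON_FILE() check, while the read-only
segment that 7fbb5e188248 was meant to help stays eligible.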
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 9:28 ` David Hildenbrand (Arm) @ 2026-02-11 14:50 ` Deepanshu Kartikey 2026-02-11 15:38 ` Ackerley Tng 1 sibling, 0 replies; 29+ messages in thread From: Deepanshu Kartikey @ 2026-02-11 14:50 UTC (permalink / raw) To: David Hildenbrand (Arm) Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, Fangrui Song On Wed, Feb 11, 2026 at 2:58 PM David Hildenbrand (Arm) <david@kernel.org> wrote: > > > What about the following: > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 44ff8a648afd..9fbe5c28a6bc 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) > > inode = file_inode(vma->vm_file); > > + if (IS_ANON_FILE(inode)) > + return false; > + > return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); > } > This is an elegant solution. Instead of depending on specific subsystems, IS_ANON_FILE() handles all pseudo-filesystem inodes generically, so any future pseudo-fs won't run into the same issue. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 9:28 ` David Hildenbrand (Arm) 2026-02-11 14:50 ` Deepanshu Kartikey @ 2026-02-11 15:38 ` Ackerley Tng 2026-02-11 16:45 ` David Hildenbrand (Arm) 1 sibling, 1 reply; 29+ messages in thread From: Ackerley Tng @ 2026-02-11 15:38 UTC (permalink / raw) To: David Hildenbrand (Arm), Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, Fangrui Song "David Hildenbrand (Arm)" <david@kernel.org> writes: > On 2/11/26 00:00, Ackerley Tng wrote: >> "David Hildenbrand (Arm)" <david@kernel.org> writes: >> >>>> >>>> I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should >>>> we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be >>>> special-casing secretmem like you suggested below? >>> >>> Yes. If there is no guest_memfd we wouldn't need it. >>> >> >> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return >> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are >> skipped. > > Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that: > Ah... I was working on a reproducer then I realized 5.15 doesn't have MADV_COLLAPSE, then I tried to hack in an ioctl to trigger khugepaged. That turned out to be awkward but it got me to look at hugepage_vma_check(), and then I went down the rabbit hole to keep looking for the similar check function throughout the other stable kernels... and amongst all of that forgot that CONFIG_READ_ONLY_THP_FOR_FS was unset :( You're probably right about VM_EXEC. Here's the reproducer for 6.12, I put this in tools/testing/selftests/mm/memfd_secret.c and called repro() from main(). This time I enabled CONFIG_READ_ONLY_THP_FOR_FS :). 

void repro(void)
{
	uint8_t *mem;
	int ret;
	int fd;
	int i;

	printf("%d triggering secretmem\n", __LINE__);

	fd = memfd_secret(0);
	if (fd < 0) {
		if (errno == ENOSYS)
			ksft_exit_skip("memfd_secret is not supported\n");
		else
			ksft_exit_fail_msg("memfd_secret failed: %s\n",
					   strerror(errno));
	}

	if (ftruncate(fd, SZ_2M))
		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));

#define ALIGNED_ADDRESS ((void*)0x400000000UL)

	mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE,
		   MAP_FIXED | MAP_SHARED, fd, 0);
	if (mem != ALIGNED_ADDRESS)
		ksft_exit_fail_msg("Couldn't allocate memory\n");

	ret = madvise(mem, SZ_2M, MADV_HUGEPAGE);
	if (ret)
		ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n",
				   mem, ret, errno);

#define READ_ONCE(x) (*(volatile typeof(x) *) &(x))
	for (i = 0; i < SZ_2M; i += getpagesize())
		READ_ONCE(mem[i]);

	ret = madvise(mem, SZ_2M, MADV_COLLAPSE);
	if (ret)
		ksft_exit_fail_msg("MADV_COLLAPSE failed ret=%d errno=%d\n", ret, errno);

	munmap(mem, SZ_2M);
	close(fd);
}

This reproducer gets us to madvise_collapse() ->
hpage_collapse_scan_file() -> collapse_file(), and copy_mc_highpage()
fails because copy_mc_to_kernel() returns 4096.

memory_failure_queue() causes this to be printed on the console

[ 1068.322578] Memory failure: 0x106d96f: recovery action for clean unevictable LRU page: Recovered

No crash :) Is a crash the requirement for a backport to stable kernels?

> 	/* Only regular file is valid */
> 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> 	    (vm_flags & VM_EXEC)) {
> 		struct inode *inode = vma->vm_file->f_inode;
>
> 		return !inode_is_open_for_write(inode) &&
> 			S_ISREG(inode->i_mode);
> 	}
>
>
> So if you have VM_EXEC on the VMA (mmaped with PROT_EXEC), it would work.
> I think secretmem sets SB_I_NOEXEC, which prevents that. Same for guest_memfd.
>
> v6.6.123 still has that VM_EXEC check in file_thp_enabled().
> > The check was dropped in commit: > > commit 7fbb5e188248c50f737720825da1864ce42536d1 > Author: Fangrui Song <i@maskray.me> > Date: Tue Dec 19 21:41:23 2023 -0800 > > mm: remove VM_EXEC requirement for THP eligibility > > Commit e6be37b2e7bd ("mm/huge_memory.c: add missing read-only THP checking > in transparent_hugepage_enabled()") introduced the VM_EXEC requirement, > which is not strictly needed. > > lld's default --rosegment option and GNU ld's -z separate-code option > (default on Linux/x86 since binutils 2.31) create a read-only PT_LOAD > segment without the PF_X flag, which should be eligible for THP. > > > So that one broke secretmem. > > > So when we fix it, we should > > Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility") > > > What about the following: > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 44ff8a648afd..9fbe5c28a6bc 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) > > inode = file_inode(vma->vm_file); > > + if (IS_ANON_FILE(inode)) > + return false; > + > return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); > } > > > > -- > Cheers, > > David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 15:38 ` Ackerley Tng @ 2026-02-11 16:45 ` David Hildenbrand (Arm) 2026-02-12 22:19 ` Ackerley Tng 0 siblings, 1 reply; 29+ messages in thread From: David Hildenbrand (Arm) @ 2026-02-11 16:45 UTC (permalink / raw) To: Ackerley Tng, Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, Fangrui Song On 2/11/26 16:38, Ackerley Tng wrote: > "David Hildenbrand (Arm)" <david@kernel.org> writes: > >> On 2/11/26 00:00, Ackerley Tng wrote: >>> "David Hildenbrand (Arm)" <david@kernel.org> writes: >>> >>> >>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return >>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are >>> skipped. >> >> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that: >> > > Ah... I was working on a reproducer then I realized 5.15 doesn't have > MADV_COLLAPSE, then I tried to hack in an ioctl to trigger > khugepaged. That turned out to be awkward but it got me to look at > hugepage_vma_check(), and then I went down the rabbit hole to keep > looking for the similar check function throughout the other stable > kernels... and amongst all of that forgot that > CONFIG_READ_ONLY_THP_FOR_FS was unset :( > > You're probably right about VM_EXEC. > > Here's the reproducer for 6.12, I put this in > tools/testing/selftests/mm/memfd_secret.c and called repro() from > main(). This time I enabled CONFIG_READ_ONLY_THP_FOR_FS :). 
> > void repro(void) > { > uint8_t *mem; > int ret; > int fd; > int i; > > printf("%d triggering secretmem\n", __LINE__); > > fd = memfd_secret(0); > if (fd < 0) { > if (errno == ENOSYS) > ksft_exit_skip("memfd_secret is not supported\n"); > else > ksft_exit_fail_msg("memfd_secret failed: %s\n", > strerror(errno)); > } > > if (ftruncate(fd, SZ_2M)) > ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno)); > > #define ALIGNED_ADDRESS ((void*)0x400000000UL) > > mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED > | MAP_SHARED, fd, 0); > if (mem != ALIGNED_ADDRESS) > ksft_exit_fail_msg("Couldn't allocate memory\n"); > > ret = madvise(mem, SZ_2M, MADV_HUGEPAGE); > if (ret) > ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n", > mem, ret, errno); > > #define READ_ONCE(x) (*(volatile typeof(x) *) &(x)) > for (i = 0; i < SZ_2M; i += getpagesize()) > READ_ONCE(mem[i]); > > ret = madvise(mem, SZ_2M, MADV_COLLAPSE); > if (ret) > ksft_exit_fail_msg("MADV_COLLAPSE failed ret=%d errno=%d\n", ret, errno); > > munmap(mem, SZ_2M); > close(fd); > } > > This reproducer gets us to madvise_collapse() -> > hpage_collapse_scan_file() -> collapse_file(), and copy_mc_highpage() > fails because copy_mc_to_kernel() returns 4096. > > memory_failure_queue() causes this to be printed on the console > > [ 1068.322578] Memory failure: 0x106d96f: recovery action for clean > unevictable LRU page: Recovered > > No crash :) Is a crash the requirement for a backport to stable kernels? I'd say being able to trigger that is sufficient. There is no real memory failure :) -- Cheers, David ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-11 16:45 ` David Hildenbrand (Arm) @ 2026-02-12 22:19 ` Ackerley Tng 2026-02-13 5:02 ` Deepanshu Kartikey 0 siblings, 1 reply; 29+ messages in thread From: Ackerley Tng @ 2026-02-12 22:19 UTC (permalink / raw) To: David Hildenbrand (Arm), Deepanshu Kartikey Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel, syzbot+33a04338019ac7e43a44, Fangrui Song "David Hildenbrand (Arm)" <david@kernel.org> writes: Going to try and summarize the findings/discussions here, copying from a few earlier emails. David, you can jump directly to [Question]. > On 2/11/26 16:38, Ackerley Tng wrote: >> "David Hildenbrand (Arm)" <david@kernel.org> writes: >> >>> On 2/11/26 00:00, Ackerley Tng wrote: >>>> "David Hildenbrand (Arm)" <david@kernel.org> writes: >>>> >>>> >>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return >>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are >>>> skipped. >>> >>> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that: >>> >> >> Ah... I was working on a reproducer then I realized 5.15 doesn't have >> MADV_COLLAPSE, then I tried to hack in an ioctl to trigger >> khugepaged. That turned out to be awkward but it got me to look at >> hugepage_vma_check(), and then I went down the rabbit hole to keep >> looking for the similar check function throughout the other stable >> kernels... and amongst all of that forgot that >> CONFIG_READ_ONLY_THP_FOR_FS was unset :( >> >> You're probably right about VM_EXEC. >> [Bug] khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with MADV_HUGEPAGE applied. 
There is no crash, but there is a false memory failure printout that looks like [ 1068.322578] Memory failure: 0x106d96f: recovery action for clean unevictable LRU page: Recovered The correct Fixes tag should be: Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility") I was able to reproduce this on 6.12, 6.18 and HEAD [Stable Backports] The first stable version this affects is 6.12. In 6.12, S_ANON_INODE does not yet exist, so I think in file_thp_enabled() we can return false if vma_is_secretmem(vma). 6.18 needs a fix for both secretmem and guest_memfd. [Solution] For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to work. This affects more filesystems than just secretmem and guest_memfd though. [Question] I'm not familiar with the concept of anonymous inodes. What does that entail? Why is it suitable in deciding THP eligibility? [Next Steps] I'm going to be traveling over the next few weeks, so perhaps Deepanshu can help with the fixup patches for 6.12, 6.18 and HEAD? 
[Details] Here's a reproducer for 6.18 for guest_memfd x MADV_COLLAPSE diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c index e7d9aeb418d3..8760fe6fa482 100644 --- a/tools/testing/selftests/kvm/guest_memfd_test.c +++ b/tools/testing/selftests/kvm/guest_memfd_test.c @@ -371,10 +371,45 @@ static void test_guest_memfd_guest(void) kvm_vm_free(vm); } +#define ALIGNED_ADDRESS ((void *)0x400000000UL) + +static void repro(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + uint8_t *mem; + int fd, i; + + vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, guest_code); + + fd = vm_create_guest_memfd(vm, SZ_2M, GUEST_MEMFD_FLAG_MMAP | + GUEST_MEMFD_FLAG_INIT_SHARED); + + mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0); + TEST_ASSERT_EQ(mem, ALIGNED_ADDRESS); + + for (i = 0; i < SZ_2M; i += getpagesize()) + READ_ONCE(mem[i]); + + TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_HUGEPAGE), 0); + + TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_COLLAPSE), 0); + + TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_DONTNEED), 0); + + READ_ONCE(mem[0]); + + close(fd); + kvm_vm_free(vm); +} + int main(int argc, char *argv[]) { unsigned long vm_types, vm_type; + repro(); + return 1; + TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD)); page_size = getpagesize(); console shows warning: [ 558.315452] WARNING: CPU: 1 PID: 252 at arch/x86/kvm/../../../virt/kvm/guest_memfd.c:372 kvm_gmem_fault_user_mapping+0x120/0x1c0 stdout output: # /mnt/host/kvm/guest_memfd_test Random seed: 0x6b8b4567 __vm_create: mode='PA-bits:ANY, VA-bits:48, 4K pages' type='0', pages='657' Guest physical address width detected: 46 Bus error # Here's a more complete reproducer for 6.12 for secretmem x MADV_COLLAPSE diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c index 9a0597310a76..2a0c5cc9fe20 100644 --- a/tools/testing/selftests/mm/memfd_secret.c +++ 
b/tools/testing/selftests/mm/memfd_secret.c @@ -21,6 +21,7 @@ #include <errno.h> #include <stdio.h> #include <fcntl.h> +#include <stddef.h> #include "../kselftest.h" @@ -299,10 +300,145 @@ static void prepare(void) #define NUM_TESTS 6 +#define SZ_2M (2UL << 20) +#define ALIGNED_ADDRESS ((void *)0x400000000UL) +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x)) + +uint64_t get_pfn(void *addr) { + uint64_t pagemap_entry; + static int fd = -1; + uintptr_t offset; + uintptr_t vaddr; + + if (fd < 0) { + fd = open("/proc/self/pagemap", O_RDONLY); + if (fd < 0) + ksft_exit_fail_msg("open pagemap\n"); + } + + vaddr = (uintptr_t)addr; + offset = (vaddr / getpagesize()) * sizeof(uint64_t); + + if (pread(fd, &pagemap_entry, sizeof(uint64_t), offset) != sizeof(uint64_t)) + ksft_exit_fail_msg("pread pagemap\n"); + + + /* Bit 63 is "present" */ + if (!(pagemap_entry & (1ULL << 63))) + ksft_exit_fail_msg("Page not present in userspace pagemap\n"); + + /* Bits 0-54 are the PFN */ + return pagemap_entry & ((1ULL << 55) - 1); +} + +bool in_direct_map(uint64_t pfn) { + static int devmem_fd = -1; + uint8_t bounce; + + if (devmem_fd < 0) { + devmem_fd = open("/dev/mem", O_RDONLY); + if (devmem_fd < 0) + ksft_exit_fail_msg("Can't open /dev/mem: %s\n", strerror(errno)); + } + + if (pread(devmem_fd, &bounce, 1, pfn * getpagesize()) == 1) { + return true; + } else { + if (errno == EFAULT) + return false; + else if (errno == EPERM) + ksft_exit_fail_msg("Access probably blocked: %s\n", strerror(errno)); + else + perror("pread /dev/mem"); + + return false; + } +} + +void check(void) +{ + uint64_t pfn; + uint8_t *mem; + + mem = mmap(NULL, SZ_2M, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (mem == MAP_FAILED) + ksft_exit_fail_msg("Couldn't allocate memory\n"); + + mem[0] = 'A'; + + pfn = get_pfn(mem); + printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn)); + + munmap(mem, SZ_2M); +} + +void repro(void) +{ + uint64_t pfn; + uint8_t *mem; + int ret; + 
int fd; + int i; + + printf("%d triggering secretmem\n", __LINE__); + + fd = memfd_secret(0); + if (fd < 0) { + if (errno == ENOSYS) + ksft_exit_skip("memfd_secret is not supported\n"); + else + ksft_exit_fail_msg("memfd_secret failed: %s\n", + strerror(errno)); + } + + if (ftruncate(fd, SZ_2M)) + ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno)); + + mem = mmap(ALIGNED_ADDRESS, SZ_2M, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0); + if (mem != ALIGNED_ADDRESS) + ksft_exit_fail_msg("Couldn't allocate memory\n"); + + for (i = 0; i < SZ_2M; i += getpagesize()) + READ_ONCE(mem[i]); + + pfn = get_pfn(mem); + printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn)); + + ret = madvise(mem, SZ_2M, MADV_HUGEPAGE); + if (ret) + ksft_exit_fail_msg("MADV_HUGEPAGE failed mem=%p ret=%d errno=%d\n", mem, ret, errno); + + ret = madvise(mem, SZ_2M, MADV_COLLAPSE); + if (ret != -1 || errno != EINVAL) + ksft_exit_fail_msg("MADV_COLLAPSE should have failed ret=%d errno=%d\n", ret, errno); + + /* + * Sleep allows memory_failure to complete, IIUC. If memory + * failure handling doesn't complete, faulting in memory in + * the next step fails with SIGBUS, as expected. 
+ */ + sleep(1); + + for (i = 0; i < SZ_2M; i += getpagesize()) + READ_ONCE(mem[i]); + + printf("%d pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn)); + + pfn = get_pfn(mem); + printf("%d new pfn=%lx in_direct_map=%d\n", __LINE__, pfn, in_direct_map(pfn)); + + munmap(mem, SZ_2M); + close(fd); +} + int main(int argc, char *argv[]) { int fd; + check(); + repro(); + return 1; + prepare(); ksft_print_header(); Special configs: + Enable CONFIG_READ_ONLY_THP_FOR_FS + Disable CONFIG_STRICT_DEVMEM (so that reading /dev/mem will return -EFAULT for memory not in the direct map, just for testing) stdout output with annotations: # /mnt/host/mm/memfd_secret 370 pfn=106a600 in_direct_map=1 <<== my check that direct map check works 383 triggering secretmem 405 pfn=106f568 in_direct_map=0 <<== secretmem is indeed not in the direct map 425 pfn=106f568 in_direct_map=1 <<== after memory failure handling, folio is restored to direct map 428 new pfn=106be67 in_direct_map=0 <<== next fault: secretmem has a new folio not in the direct map # >> >> [...snip...] >> ^ permalink raw reply [flat|nested] 29+ messages in thread
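Editorial aside on get_pfn() in the reproducer above: it reads one 64-bit entry per virtual page from /proc/self/pagemap, treats bit 63 as "present", and takes bits 0-54 as the PFN. The decoding step can be sketched on its own, without touching /proc, as follows. The `ex_`/`EX_` names are hypothetical; the bit layout is the one stated in the reproducer's comments. Note that on current kernels the PFN field is, to my understanding, only populated for sufficiently privileged readers, so an unprivileged run would likely see a PFN of zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as described in the reproducer's comments. */
#define EX_PM_PRESENT  (1ULL << 63)          /* bit 63: page present */
#define EX_PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: PFN */

/* Byte offset of the pagemap entry that describes virtual address vaddr. */
static uint64_t ex_pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* True if the entry says the page is present in RAM. */
static bool ex_pagemap_present(uint64_t entry)
{
	return (entry & EX_PM_PRESENT) != 0;
}

/* Extract the page frame number from a present entry. */
static uint64_t ex_pagemap_pfn(uint64_t entry)
{
	return entry & EX_PM_PFN_MASK;
}
```

For example, an entry with bit 63 set and a low field of 0x106f568 decodes to the PFN printed on the "405" line of the annotated output above.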
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-12 22:19 ` Ackerley Tng
@ 2026-02-13 5:02 ` Deepanshu Kartikey
2026-02-13 9:06 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-13 5:02 UTC (permalink / raw)
To: Ackerley Tng
Cc: David Hildenbrand (Arm), akpm, lorenzo.stoakes, baolin.wang,
Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, seanjc,
pbonzini, michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song

On Fri, Feb 13, 2026 at 3:49 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>
> Going to try and summarize the findings/discussions here, copying from a
> few earlier emails. David, you can jump directly to [Question].
>
> > On 2/11/26 16:38, Ackerley Tng wrote:
> >> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> >>
> >>> On 2/11/26 00:00, Ackerley Tng wrote:
> >>>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> >>>>
> >>>>
> >>>> Seems like on 5.15.199 there's a hugepage_vma_check(), which will return
> >>>> false since secretmem has vma->vm_ops defined [1], so secretmem VMAs are
> >>>> skipped.
> >>>
> >>> Are you sure? We check for CONFIG_READ_ONLY_THP_FOR_FS before that:
> >>>
> >>
> >> Ah... I was working on a reproducer then I realized 5.15 doesn't have
> >> MADV_COLLAPSE, then I tried to hack in an ioctl to trigger
> >> khugepaged. That turned out to be awkward but it got me to look at
> >> hugepage_vma_check(), and then I went down the rabbit hole to keep
> >> looking for the similar check function throughout the other stable
> >> kernels... and amongst all of that forgot that
> >> CONFIG_READ_ONLY_THP_FOR_FS was unset :(
> >>
> >> You're probably right about VM_EXEC.
> >>
>
> [Bug]
> khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with
> MADV_HUGEPAGE applied. There is no crash, but there is a false memory
> failure printout that looks like
>
> [ 1068.322578] Memory failure: 0x106d96f: recovery action for
> clean unevictable LRU page: Recovered
>
> The correct Fixes tag should be:
>
> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
>
> I was able to reproduce this on 6.12, 6.18 and HEAD
>
> [Stable Backports]
> The first stable version this affects is 6.12.
>
> In 6.12, S_ANON_INODE does not yet exist, so I think in
> file_thp_enabled() we can return false if vma_is_secretmem(vma).
>
> 6.18 needs a fix for both secretmem and guest_memfd.
>
> [Solution]
> For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to
> work. This affects more filesystems than just secretmem and guest_memfd
> though.
>
> [Question]
> I'm not familiar with the concept of anonymous inodes. What does that
> entail? Why is it suitable in deciding THP eligibility?
>
> [Next Steps]
> I'm going to be traveling over the next few weeks, so perhaps Deepanshu
> can help with the fixup patches for 6.12, 6.18 and HEAD?
>

Hi David,

Thanks Ackerley for the reproducer and analysis. Since Ackerley will be
traveling, I can take this forward.

Here is the approach I am planning:

For HEAD / 6.18:
- Add IS_ANON_FILE(inode) check in file_thp_enabled() as you suggested
- Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
- Cc: stable@vger.kernel.org

For 6.12 stable backport:
- IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
  mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
  instead

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-13 5:02 ` Deepanshu Kartikey
@ 2026-02-13 9:06 ` David Hildenbrand (Arm)
2026-02-21 4:37 ` Deepanshu Kartikey
0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-13 9:06 UTC (permalink / raw)
To: Deepanshu Kartikey, Ackerley Tng
Cc: akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
ryan.roberts, dev.jain, baohua, seanjc, pbonzini, michael.roth,
vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song

On 2/13/26 06:02, Deepanshu Kartikey wrote:
> On Fri, Feb 13, 2026 at 3:49 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> "David Hildenbrand (Arm)" <david@kernel.org> writes:
>>
>> Going to try and summarize the findings/discussions here, copying from a
>> few earlier emails. David, you can jump directly to [Question].
>>
>>
>> [Bug]
>> khugepaged (and MADV_COLLAPSE) will try to collapse secretmem pages with
>> MADV_HUGEPAGE applied. There is no crash, but there is a false memory
>> failure printout that looks like
>>
>> [ 1068.322578] Memory failure: 0x106d96f: recovery action for
>> clean unevictable LRU page: Recovered
>>
>> The correct Fixes tag should be:
>>
>> Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
>>
>> I was able to reproduce this on 6.12, 6.18 and HEAD
>>
>> [Stable Backports]
>> The first stable version this affects is 6.12.
>>
>> In 6.12, S_ANON_INODE does not yet exist, so I think in
>> file_thp_enabled() we can return false if vma_is_secretmem(vma).
>>
>> 6.18 needs a fix for both secretmem and guest_memfd.
>>
>> [Solution]
>> For 6.18 and later, David's suggestion of using IS_ANON_FILE() seems to
>> work. This affects more filesystems than just secretmem and guest_memfd
>> though.
>>
>> [Question]
>> I'm not familiar with the concept of anonymous inodes. What does that
>> entail? Why is it suitable in deciding THP eligibility?
>>
>> [Next Steps]
>> I'm going to be traveling over the next few weeks, so perhaps Deepanshu
>> can help with the fixup patches for 6.12, 6.18 and HEAD?
>>
>
> Hi David,
>
> Thanks Ackerley for the reproducer and analysis. Since Ackerley will be
> traveling, I can take this forward.
>
> Here is the approach I am planning:
>
> For HEAD / 6.18:
> - Add IS_ANON_FILE(inode) check in file_thp_enabled() as you suggested
> - Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility")
> - Cc: stable@vger.kernel.org

Right. Please link the mail with Ackerley's reproducers and carefully
describe the implications.

Then describe how anon inodes never pass the "opened writable" check and
that the clean thing to do is to revert to disallowing anon inodes
altogether.

Also describe how secretmem is not affected upstream, but triggers the
confusing memory failure errors.

>
> For 6.12 stable backport:
> - IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
>   mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
>   instead

I think secretmem_mapping() is sufficient there given that guest_memfd
does not apply yet.

But we can discuss the details about the backport once the upstream fix
is in.

Thanks!

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
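Editorial aside on the approach agreed above: the decision logic of file_thp_enabled() with an added anon-inode check can be modeled in plain userspace C as below. This is a sketch under stated assumptions, not the patch that was merged: `struct ex_inode` and `ex_file_thp_enabled()` are hypothetical stand-ins, `is_anon` models IS_ANON_FILE(inode), and `writers` models whatever inode_is_open_for_write() inspects. Following the remark in the thread that anon inodes never pass the "opened writable" check, such an inode is modeled with `writers == 0`, which is why it looks like a read-only regular file without the extra check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few inode properties the check reads. */
struct ex_inode {
	bool is_regular;	/* models S_ISREG(inode->i_mode) */
	int writers;		/* models the state behind inode_is_open_for_write() */
	bool is_anon;		/* models IS_ANON_FILE(inode) */
};

/*
 * Userspace model of the eligibility decision. A NULL inode models a VMA
 * without a backing file (!vma->vm_file).
 */
static bool ex_file_thp_enabled(bool config_ro_thp, const struct ex_inode *inode)
{
	if (!config_ro_thp)	/* CONFIG_READ_ONLY_THP_FOR_FS disabled */
		return false;
	if (inode == NULL)
		return false;
	if (inode->is_anon)	/* the check proposed in this thread */
		return false;
	return inode->writers == 0 && inode->is_regular;
}
```

In this model a secretmem-like inode (regular, no writers, anon) is rejected, while an ordinary regular file that nobody has open for writing remains eligible, which matches the intent described in the thread.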
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-13 9:06 ` David Hildenbrand (Arm)
@ 2026-02-21 4:37 ` Deepanshu Kartikey
0 siblings, 0 replies; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-21 4:37 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44, Fangrui Song

On Fri, Feb 13, 2026 at 2:36 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> > For 6.12 stable backport:
> > - IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, so use
> >   mapping_inaccessible() || secretmem_mapping() in file_thp_enabled()
> >   instead
>
> I think secretmem_mapping() is sufficient there given that guest_memfd
> does not apply yet.
>
> But we can discuss the details about the backport once the upstream fix
> is in.
>

Subject: Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes

Hi David,

The upstream fix is now in mm-unstable and linux-next. Should I send a
backport for 6.12 stable?

Since IS_ANON_FILE / S_ANON_INODE does not exist in 6.12, I was planning
to use secretmem_mapping() in file_thp_enabled() as you suggested.
guest_memfd mmap is not present in 6.12 so only secretmem needs fixing
there.

Thanks,
Deepanshu

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-09 20:13 ` David Hildenbrand (Arm)
2026-02-09 21:31 ` Ackerley Tng
@ 2026-02-10 1:51 ` Deepanshu Kartikey
2026-02-10 9:33 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 29+ messages in thread
From: Deepanshu Kartikey @ 2026-02-10 1:51 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44

On Tue, Feb 10, 2026 at 1:43 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
>  #include <linux/pgalloc.h>
>  #include <linux/pgalloc_tag.h>
>  #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>  	inode = file_inode(vma->vm_file);
>
> +	if (mapping_inaccessible(inode->i_mapping) ||
> +	    secretmem_mapping(inode->i_mapping))
> +		return false;
> +
>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
>

Hi David,

Agreed, using mapping_inaccessible() for guest_memfd and
secretmem_mapping() for secretmem is much simpler than introducing a
new AS flag. No changes needed outside of file_thp_enabled().

I will send a v2 with your suggested diff and test it on syzbot.

Thanks,
Deepanshu

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
2026-02-10 1:51 ` Deepanshu Kartikey
@ 2026-02-10 9:33 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10 9:33 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: Ackerley Tng, akpm, lorenzo.stoakes, baolin.wang, Liam.Howlett,
npache, ryan.roberts, dev.jain, baohua, seanjc, pbonzini,
michael.roth, vannapurve, ziy, linux-mm, linux-kernel,
syzbot+33a04338019ac7e43a44

On 2/10/26 02:51, Deepanshu Kartikey wrote:
> On Tue, Feb 10, 2026 at 1:43 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>
>> The following is a bit nasty as well but should do the trick until we rip
>> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 03886d4ccecc..4ac1cb36b861 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -40,6 +40,7 @@
>>  #include <linux/pgalloc.h>
>>  #include <linux/pgalloc_tag.h>
>>  #include <linux/pagewalk.h>
>> +#include <linux/secretmem.h>
>>
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>
>>  	inode = file_inode(vma->vm_file);
>>
>> +	if (mapping_inaccessible(inode->i_mapping) ||
>> +	    secretmem_mapping(inode->i_mapping))
>> +		return false;
>> +
>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>  }
>>
>>
>
> Hi David,
>
> Agreed, using mapping_inaccessible() for guest_memfd and
> secretmem_mapping() for secretmem is much simpler than introducing a
> new AS flag. No changes needed outside of file_thp_enabled().
>
> I will send a v2 with your suggested diff and test it on syzbot.

Let's wait a bit until we are in agreement that this is the right thing
to do :)

--
Cheers,

David

^ permalink raw reply [flat|nested] 29+ messages in thread
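Editorial aside on the interim diff quoted above: it identifies the two file types through their address_space rather than through the inode. The two predicates can be modeled in userspace as below. Everything here is a hypothetical stand-in: the `ex_` types, the bit position chosen for `EX_AS_INACCESSIBLE`, and the assumption (my reading of the kernel headers, not something stated in the thread) that mapping_inaccessible() tests a flag bit on the mapping while secretmem_mapping() compares the mapping's operations pointer against secretmem's own table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for address_space_operations and address_space. */
struct ex_aops {
	int unused;
};

struct ex_mapping {
	unsigned long flags;
	const struct ex_aops *a_ops;
};

/* Assumed flag bit; the real AS_INACCESSIBLE bit position may differ. */
#define EX_AS_INACCESSIBLE (1UL << 0)

static const struct ex_aops ex_secretmem_aops = { 0 };
static const struct ex_aops ex_other_aops = { 0 };

/* Models mapping_inaccessible(): a per-mapping flag test. */
static bool ex_mapping_inaccessible(const struct ex_mapping *m)
{
	return (m->flags & EX_AS_INACCESSIBLE) != 0;
}

/* Models secretmem_mapping(): identity check on the ops table. */
static bool ex_secretmem_mapping(const struct ex_mapping *m)
{
	return m->a_ops == &ex_secretmem_aops;
}

/* The combined early-exit condition from the quoted diff. */
static bool ex_thp_denied(const struct ex_mapping *m)
{
	return ex_mapping_inaccessible(m) || ex_secretmem_mapping(m);
}
```

The model shows why this variant needs two separate tests: each file type is recognized by a different mechanism, whereas a single anon-inode check covers both at once.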
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey 2026-02-09 10:24 ` David Hildenbrand (Arm) @ 2026-02-09 23:37 ` kernel test robot 2026-02-10 17:51 ` kernel test robot 2 siblings, 0 replies; 29+ messages in thread From: kernel test robot @ 2026-02-09 23:37 UTC (permalink / raw) To: Deepanshu Kartikey, akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini, michael.roth, vannapurve Cc: oe-kbuild-all, ziy, linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+33a04338019ac7e43a44 Hi Deepanshu, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Deepanshu-Kartikey/mm-thp-Deny-THP-for-guest_memfd-and-secretmem-in-file_thp_enabled/20260209-113800 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20260209033558.22943-1-kartikey406%40gmail.com patch subject: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() config: arc-randconfig-001-20260210 (https://download.01.org/0day-ci/archive/20260210/202602100727.b1U4CHAA-lkp@intel.com/config) compiler: arc-linux-gcc (GCC) 14.3.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260210/202602100727.b1U4CHAA-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202602100727.b1U4CHAA-lkp@intel.com/ All errors (new ones prefixed by >>): mm/huge_memory.c: In function 'file_thp_enabled': >> mm/huge_memory.c:96:37: error: 'GUEST_MEMFD_MAGIC' undeclared (first use in this function) 96 | if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC || | ^~~~~~~~~~~~~~~~~ mm/huge_memory.c:96:37: note: each undeclared identifier is reported only once for each function it appears in >> mm/huge_memory.c:97:37: error: 'SECRETMEM_MAGIC' undeclared (first use in this function) 97 | inode->i_sb->s_magic == SECRETMEM_MAGIC) | ^~~~~~~~~~~~~~~ vim +/GUEST_MEMFD_MAGIC +96 mm/huge_memory.c 84 85 static inline bool file_thp_enabled(struct vm_area_struct *vma) 86 { 87 struct inode *inode; 88 89 if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) 90 return false; 91 92 if (!vma->vm_file) 93 return false; 94 95 inode = file_inode(vma->vm_file); > 96 if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC || > 97 inode->i_sb->s_magic == SECRETMEM_MAGIC) 98 return false; 99 100 return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); 101 } 102 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() 2026-02-09 3:35 [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() Deepanshu Kartikey 2026-02-09 10:24 ` David Hildenbrand (Arm) 2026-02-09 23:37 ` kernel test robot @ 2026-02-10 17:51 ` kernel test robot 2 siblings, 0 replies; 29+ messages in thread From: kernel test robot @ 2026-02-10 17:51 UTC (permalink / raw) To: Deepanshu Kartikey, akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, ackerleytng, seanjc, pbonzini, michael.roth, vannapurve Cc: llvm, oe-kbuild-all, ziy, linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+33a04338019ac7e43a44 Hi Deepanshu, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Deepanshu-Kartikey/mm-thp-Deny-THP-for-guest_memfd-and-secretmem-in-file_thp_enabled/20260209-113800 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20260209033558.22943-1-kartikey406%40gmail.com patch subject: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled() config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20260211/202602110124.Y72YFz1K-lkp@intel.com/config) compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260211/202602110124.Y72YFz1K-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202602110124.Y72YFz1K-lkp@intel.com/ All errors (new ones prefixed by >>): >> mm/huge_memory.c:96:30: error: use of undeclared identifier 'GUEST_MEMFD_MAGIC' 96 | if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC || | ^ >> mm/huge_memory.c:97:30: error: use of undeclared identifier 'SECRETMEM_MAGIC' 97 | inode->i_sb->s_magic == SECRETMEM_MAGIC) | ^ 2 errors generated. vim +/GUEST_MEMFD_MAGIC +96 mm/huge_memory.c 84 85 static inline bool file_thp_enabled(struct vm_area_struct *vma) 86 { 87 struct inode *inode; 88 89 if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) 90 return false; 91 92 if (!vma->vm_file) 93 return false; 94 95 inode = file_inode(vma->vm_file); > 96 if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC || > 97 inode->i_sb->s_magic == SECRETMEM_MAGIC) 98 return false; 99 100 return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); 101 } 102 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 29+ messages in thread