linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Ackerley Tng <ackerleytng@google.com>,
	Deepanshu Kartikey <kartikey406@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, seanjc@google.com, pbonzini@redhat.com,
	michael.roth@amd.com, vannapurve@google.com, ziy@nvidia.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Subject: Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
Date: Mon, 9 Feb 2026 21:13:36 +0100	[thread overview]
Message-ID: <8f188d73-fc97-414b-bdaa-e72032b2bf82@kernel.org> (raw)
In-Reply-To: <4ed1b111-f2f1-4f89-9308-fdd9d706ca37@kernel.org>

On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
> On 2/9/26 19:22, Ackerley Tng wrote:
>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>
>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) 
>>> <david@kernel.org> wrote:
>>>
>>> Hi David,
>>>
>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>
>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>> calling get_write_access(), so i_writecount stays 0. That's why
>>> file_thp_enabled() sees them as read-only files.
>>>
>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>> I think that would be a hack rather than a proper fix:
>>>
>>> - i_writecount has a specific semantic: tracking how many fds have the
>>> file open for writing. We'd be bumping it just to influence
>>> file_thp_enabled() behavior.
>>>
>>
>> I agree re-using i_writecount feels odd since it is abusing the idea of
>> being written to. I might have misunderstood the full context of
>> i_writecount though.
> 
> i_writecount means "the file is open with write access" IIUC. So one can 
> mmap(PROT_WRITE) it etc.
> 
> And that's kind of the thing: the virtual file is open with write 
> access. That's why I am still wondering whether mimicking that is 
> actually the right fix.
> 
>>
>>> - It doesn't express the actual intent. The real issue is that
>>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>> backed files.
>>>
>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>
>>
>> I was considering other address space flags and I think the best might
>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>> __vma_thp_allowable_orders() check the maximum allowed order for the
>> address space.
> 
> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these 
> folio order checks. Changing it would degrade filesystems that do not 
> support large folios yet. IOW, it would be similar to ripping out 
> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
> 
>>
>> khugepaged is about consolidating memory to huge pages, so if the
>> address space doesn't allow a larger folio order, then khugepaged should
>> not operate on that memory.
>>
>> The other options are
>>
>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>    but IIUC evictability is more closely related to swapping and
>>    khugepaged might operate on swappable memory?
> Right, it does not really make sense
> 
>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>    to block migration. khugepaged kind of migrates the memory contents
>>    too, but someday we want guest_memfd to support migration, and at that
>>    time we would still want to block khugepaged, so I don't think we want
>>    to reuse a flag that couples khugepaged to migration.
> 
> It could be used at least for the time being and to fix the issue.

mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
shmem "somehow" works, lol.

BUT, something just occurred to me.

We added the mc-handling in

commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
Author: Jiaqi Yan <jiaqiyan@google.com>
Date:   Wed Mar 29 08:11:19 2023 -0700

     mm/khugepaged: recover from poisoned anonymous memory
     
..

So I assume kernels before that would crash when collapsing?

Looking at 5.15.199, it does not contain 98c76c9f1e [1].

So I suspect we need a fix+stable backport.

Who volunteers to try a secretmem reproducer on a stable kernel? :)


The following is a bit nasty as well but should do the trick until we rip
out the CONFIG_READ_ONLY_THP_FOR_FS stuff.


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 03886d4ccecc..4ac1cb36b861 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -40,6 +40,7 @@
  #include <linux/pgalloc.h>
  #include <linux/pgalloc_tag.h>
  #include <linux/pagewalk.h>
+#include <linux/secretmem.h>
  
  #include <asm/tlb.h>
  #include "internal.h"
@@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
  
         inode = file_inode(vma->vm_file);
  
+       if (mapping_inaccessible(inode->i_mapping) ||
+           secretmem_mapping(inode->i_mapping))
+               return false;
+
         return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
  }


[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199

-- 
Cheers,

David


  reply	other threads:[~2026-02-09 20:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-09  3:35 Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 10:41   ` David Hildenbrand (Arm)
2026-02-09 13:06     ` Deepanshu Kartikey
2026-02-09 18:22       ` Ackerley Tng
2026-02-09 19:45         ` David Hildenbrand (Arm)
2026-02-09 20:13           ` David Hildenbrand (Arm) [this message]
2026-02-09 21:31             ` Ackerley Tng
2026-02-10  9:33               ` David Hildenbrand (Arm)
2026-02-10 23:00                 ` Ackerley Tng
2026-02-11  0:58                   ` Ackerley Tng
2026-02-11  2:01                     ` Deepanshu Kartikey
2026-02-11  9:29                     ` David Hildenbrand (Arm)
2026-02-11 16:16                       ` Ackerley Tng
2026-02-11 16:35                         ` David Hildenbrand (Arm)
2026-02-11 16:44                           ` David Hildenbrand (Arm)
2026-02-11  1:59                   ` Deepanshu Kartikey
2026-02-11  9:28                   ` David Hildenbrand (Arm)
2026-02-11 14:50                     ` Deepanshu Kartikey
2026-02-11 15:38                     ` Ackerley Tng
2026-02-11 16:45                       ` David Hildenbrand (Arm)
2026-02-12 22:19                         ` Ackerley Tng
2026-02-13  5:02                           ` Deepanshu Kartikey
2026-02-13  9:06                             ` David Hildenbrand (Arm)
2026-02-21  4:37                               ` Deepanshu Kartikey
2026-02-10  1:51             ` Deepanshu Kartikey
2026-02-10  9:33               ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f188d73-fc97-414b-bdaa-e72032b2bf82@kernel.org \
    --to=david@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=dev.jain@arm.com \
    --cc=kartikey406@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=michael.roth@amd.com \
    --cc=npache@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=seanjc@google.com \
    --cc=syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com \
    --cc=vannapurve@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox