linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ackerley Tng <ackerleytng@google.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Deepanshu Kartikey <kartikey406@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	 baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com,  ryan.roberts@arm.com, dev.jain@arm.com,
	baohua@kernel.org, seanjc@google.com,  pbonzini@redhat.com,
	michael.roth@amd.com, vannapurve@google.com,  ziy@nvidia.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Subject: Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()
Date: Mon, 9 Feb 2026 13:31:03 -0800	[thread overview]
Message-ID: <CAEvNRgFw_P-GZ=Z4irGheM4O-RnXkd_bdAd0AT7fUuwXEnSqHQ@mail.gmail.com> (raw)
In-Reply-To: <8f188d73-fc97-414b-bdaa-e72032b2bf82@kernel.org>

"David Hildenbrand (Arm)" <david@kernel.org> writes:

> On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
>> On 2/9/26 19:22, Ackerley Tng wrote:
>>> Deepanshu Kartikey <kartikey406@gmail.com> writes:
>>>
>>>> On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm)
>>>> <david@kernel.org> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thanks for the suggestion. I looked into the get_write_access() path.
>>>>
>>>> Both guest_memfd and secretmem use alloc_file_pseudo() which skips
>>>> calling get_write_access(), so i_writecount stays 0. That's why
>>>> file_thp_enabled() sees them as read-only files.
>>>>
>>>> We could add get_write_access() after alloc_file_pseudo() in both, but
>>>> I think that would be a hack rather than a proper fix:
>>>>
>>>> - i_writecount has a specific semantic: tracking how many fds have the
>>>> file open for writing. We'd be bumping it just to influence
>>>> file_thp_enabled() behavior.
>>>>
>>>
>>> I agree re-using i_writecount feels odd since it is abusing the idea of
>>> being written to. I might have misunderstood the full context of
>>> i_writecount though.
>>
>> i_writecount means "the file is open with write access" IIUC. So one can
>> mmap(PROT_WRITE) it etc.
>>
>> And that's kind of the thing: the virtual file is open with write
>> access. That's why I am still wondering whether mimicking that is
>> actually the right fix.
>>
>>>
>>>> - It doesn't express the actual intent. The real issue is that
>>>> CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
>>>> backed files.
>>>>
>>>> I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
>>>> the cleaner approach. It is explicit, has no side effects, and is easy
>>>> to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.
>>>>
>>>
>>> I was considering other address space flags and I think the best might
>>> be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
>>> __vma_thp_allowable_orders() check the maximum allowed order for the
>>> address space.
>>
>> The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these
>> folio order checks.

Ah that's true.

>> Changing it would degrade filesystems that do not
>> support large folios yet. IOW, it would be similar to ripping out
>> CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)
>>
>>>
>>> khugepaged is about consolidating memory to huge pages, so if the
>>> address space doesn't allow a larger folio order, then khugepaged should
>>> not operate on that memory.
>>>
>>> The other options are
>>>
>>> + AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
>>>    but IIUC evictability is more closely related to swapping and
>>>    khugepaged might operate on swappable memory?
>> Right, it does not really make sense
>>
>>> + AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
>>>    to block migration. khugepaged kind of migrates the memory contents
>>>    too, but someday we want guest_memfd to support migration, and at that
>>>    time we would still want to block khugepaged, so I don't think we want
>>>    to reuse a flag that couples khugepaged to migration.
>>
>> It could be used at least for the time being and to fix the issue.
>
> mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
> shmem "somehow" works, lol.
>

I could also check shmem, but I'm not sure which conditions to set up
shmem for, since shmem could be used in so many ways. Any suggestions?
Off the top of my head, shmem lots of special-casing in the khugepaged
flow...

> BUT, something just occurred to me.
>
> We added the mc-handling in
>
> commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
> Author: Jiaqi Yan <jiaqiyan@google.com>
> Date:   Wed Mar 29 08:11:19 2023 -0700
>
>      mm/khugepaged: recover from poisoned anonymous memory
>
> ..
>
> So I assume kernels before that would crash when collapsing?
>
> Looking at 5.15.199, it does not contain 98c76c9f1e [1].
>
> So I suspect we need a fix+stable backport.
>
> Who volunteers to try a secretmem reproducer on a stable kernel? :)
>

I could give this a shot. 5.15.199 doesn't have AS_INACCESSIBLE. Should
we backport AS_INACCESSIBLE there or could the fix for 5.15.199 just be
special-casing secretmem like you suggested below?

>
> The following is a bit nasty as well but should do the trick until we rip
> out the CONFIG_READ_ONLY_THP_FOR_FS stuff.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03886d4ccecc..4ac1cb36b861 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -40,6 +40,7 @@
>   #include <linux/pgalloc.h>
>   #include <linux/pgalloc_tag.h>
>   #include <linux/pagewalk.h>
> +#include <linux/secretmem.h>
>
>   #include <asm/tlb.h>
>   #include "internal.h"
> @@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>
>          inode = file_inode(vma->vm_file);
>
> +       if (mapping_inaccessible(inode->i_mapping) ||
> +           secretmem_mapping(inode->i_mapping))
> +               return false;
> +

Regarding the degradation of filesystems that don't support large folios
yet: Do you mean having the collapse function respect AS_FOLIO_ORDER_MAX
would disable collapsing for filesystems that actually want pages to be
collapsed, but don't update max folio order and hence appear to not
support large folios yet?

What about a check like this instead

	if (!mapping_large_folio_support())
        	return false;

And then when CONFIG_READ_ONLY_THP_FOR_FS is removed, part of that work
would involve getting filesystems to update AS_FOLIO_ORDER_MAX?

>          return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>   }
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199
>
> --
> Cheers,
>
> David


  reply	other threads:[~2026-02-09 21:31 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-09  3:35 Deepanshu Kartikey
2026-02-09 10:24 ` David Hildenbrand (Arm)
2026-02-09 10:41   ` David Hildenbrand (Arm)
2026-02-09 13:06     ` Deepanshu Kartikey
2026-02-09 18:22       ` Ackerley Tng
2026-02-09 19:45         ` David Hildenbrand (Arm)
2026-02-09 20:13           ` David Hildenbrand (Arm)
2026-02-09 21:31             ` Ackerley Tng [this message]
2026-02-10  9:33               ` David Hildenbrand (Arm)
2026-02-10 23:00                 ` Ackerley Tng
2026-02-11  0:58                   ` Ackerley Tng
2026-02-11  2:01                     ` Deepanshu Kartikey
2026-02-11  9:29                     ` David Hildenbrand (Arm)
2026-02-11 16:16                       ` Ackerley Tng
2026-02-11 16:35                         ` David Hildenbrand (Arm)
2026-02-11 16:44                           ` David Hildenbrand (Arm)
2026-02-11  1:59                   ` Deepanshu Kartikey
2026-02-11  9:28                   ` David Hildenbrand (Arm)
2026-02-11 14:50                     ` Deepanshu Kartikey
2026-02-11 15:38                     ` Ackerley Tng
2026-02-11 16:45                       ` David Hildenbrand (Arm)
2026-02-12 22:19                         ` Ackerley Tng
2026-02-13  5:02                           ` Deepanshu Kartikey
2026-02-13  9:06                             ` David Hildenbrand (Arm)
2026-02-21  4:37                               ` Deepanshu Kartikey
2026-02-10  1:51             ` Deepanshu Kartikey
2026-02-10  9:33               ` David Hildenbrand (Arm)
2026-02-09 23:37 ` kernel test robot
2026-02-10 17:51 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEvNRgFw_P-GZ=Z4irGheM4O-RnXkd_bdAd0AT7fUuwXEnSqHQ@mail.gmail.com' \
    --to=ackerleytng@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=kartikey406@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=michael.roth@amd.com \
    --cc=npache@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=seanjc@google.com \
    --cc=syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com \
    --cc=vannapurve@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox