From: Nico Pache <npache@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
David Hildenbrand <david@kernel.org>,
Song Liu <songliubraving@fb.com>, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>,
Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v1 00/10] Remove READ_ONLY_THP_FOR_FS Kconfig
Date: Mon, 6 Apr 2026 10:17:57 -0600
Message-ID: <CAA1CXcC-vggwd-Z3h_PAfSk8U0ZvoQH-FpwhD=9XZHUnEGkcTg@mail.gmail.com>
In-Reply-To: <737AA503-E522-4F01-B78E-AB6C6B2E89B0@nvidia.com>
On Sun, Apr 5, 2026 at 7:59 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 5 Apr 2026, at 13:38, Nico Pache wrote:
>
> > On Thu, Mar 26, 2026 at 7:43 PM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> Hi all,
> >>
> >> This patchset removes READ_ONLY_THP_FOR_FS Kconfig and enables creating
> >> read-only THPs for FSes with large folio support (the supported orders
> >> need to include PMD_ORDER) by default.
> >
> > Hi Zi,
> >
> > Thank you for tackling this :) Ill try to review the next version as
> > I'm a little behind on this thread.
>
> Sure. Thanks.
>
> >
> > Should we guard collapsing READ_ONLY_THPs with a sysctl? My fear is
> > workloads that convert READ_ONLY THPs into writable pages (assuming
> > this is common/possible; my understanding of FS is rather low),
> > leading to storms of thp splitting. Do you think this is a real
> > concern? I guess this is also true of read-only-->writable fs-THPS
> > even without khugepaged, correct?
>
> Why would a read-only THP need to be split when it becomes writable?
> After this patchset, a read-only THP can only be created on a FS that
> supports large folios (to be precise PMD THP). That means any write
> to that read-only THP would just change it to a writable THP.

Ah, okay. I was misremembering some stuff.

The concern I spotted earlier when investigating read-only THPs for
khugepaged was this:

For frequent yet short-lived writes on read-only pages (e.g., package
updates, log updates), wouldn't we get destructive cycles of cache
invalidations and refault storms?

Imagine such pages are shared (libraries, executables, etc.) across
many processes. When these files are opened for writing, we must
invalidate all of their mappings, destroying their page tables and
page cache. Now all processes must refault these mappings. Once the
write is complete, they are eligible for read-only promotion again.

The part I didn't understand (thanks Claude) is that this truncation
path in do_dentry_open() is only taken for mappings/filesystems that
do not support large folios, as only those filesystems track
mapping->nr_thps. Furthermore, on filesystems that natively support
large folios, khugepaged does not need to re-collapse these pages:
even if they were dropped, they would be refaulted as THPs.

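To make that reasoning concrete, here is a toy userspace model of the
behaviour described above. It is a sketch, not kernel code: every name
(toy_mapping, toy_collapse, toy_open_for_write_drops_cache,
TOY_PMD_ORDER) is hypothetical, and TOY_PMD_ORDER = 9 assumes 4 KiB base
pages with a 2 MiB PMD. It only models the claim that the THP counter is
tracked on mappings without large folio support, so only those mappings
have their page cache dropped on open-for-write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages and a 2 MiB PMD, so a PMD folio is order 9. */
#define TOY_PMD_ORDER 9

/* Hypothetical stand-in for the relevant bits of struct address_space. */
struct toy_mapping {
	unsigned int max_folio_order; /* 0: no large folio support */
	int nr_thps;                  /* counted only without large folio support */
};

bool toy_large_folio_support(const struct toy_mapping *m)
{
	assert(m != NULL);
	return m->max_folio_order > 0;
}

/*
 * Models khugepaged collapsing a read-only THP into the mapping.
 * The counter is bumped only when the mapping lacks large folio support.
 */
void toy_collapse(struct toy_mapping *m)
{
	if (!toy_large_folio_support(m))
		m->nr_thps++;
}

/*
 * Models the open-for-write check: the page cache is dropped only when
 * the counter is non-zero. Returns true if a drop (and therefore a
 * refault by every process mapping the file) would happen.
 */
bool toy_open_for_write_drops_cache(struct toy_mapping *m)
{
	assert(m != NULL);
	if (m->nr_thps == 0)
		return false;
	m->nr_thps = 0; /* all THPs are gone after the drop */
	return true;
}
```

In this model, a mapping with max_folio_order = 0 that has seen one
toy_collapse() reports a drop on the next open-for-write, while a
mapping with max_folio_order = TOY_PMD_ORDER never does, which is the
case this patchset leaves us with.
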
TLDR: My concern is not a real concern.
Cheers,
-- Nico
>
> Let me know if I miss anything.
>
> >
> > Cheers,
> > -- Nico
> >
> >>
> >> The changes are:
> >> 1. collapse_file() from mm/khugepaged.c, instead of checking
> >> CONFIG_READ_ONLY_THP_FOR_FS, makes sure the mapping_max_folio_order()
> >> of struct address_space of the file is at least PMD_ORDER.
> >> 2. file_thp_enabled() also checks mapping_max_folio_order() instead.
> >> 3. truncate_inode_partial_folio() calls folio_split() directly instead
> >> of the removed try_folio_split_to_order(), since large folios can
> >> only show up on a FS with large folio support.
> >> 4. nr_thps is removed from struct address_space, since it is no longer
> >> needed to drop all read-only THPs from a FS without large folio
> >> support when the fd becomes writable. Its related filemap_nr_thps*()
> >> are removed too.
> >> 5. folio_check_splittable() no longer checks READ_ONLY_THP_FOR_FS.
> >> 6. Updated comments in various places.
> >>
> >> Changelog
> >> ===
> >> From RFC[1]:
> >> 1. instead of removing READ_ONLY_THP_FOR_FS function entirely, turn it
> >> on by default for all FSes with large folio support and the supported
> >> orders includes PMD_ORDER.
> >>
> >> Suggestions and comments are welcome.
> >>
> >> Link: https://lore.kernel.org/all/20260323190644.1714379-1-ziy@nvidia.com/ [1]
> >>
> >> Zi Yan (10):
> >> mm: remove READ_ONLY_THP_FOR_FS Kconfig option
> >> mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
> >> mm: fs: remove filemap_nr_thps*() functions and their users
> >> fs: remove nr_thps from struct address_space
> >> mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
> >> mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS
> >> mm/truncate: use folio_split() in truncate_inode_partial_folio()
> >> fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS
> >> selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged
> >> selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in
> >> guard-regions
> >>
> >> fs/btrfs/defrag.c | 3 --
> >> fs/inode.c | 3 --
> >> fs/open.c | 27 ----------------
> >> include/linux/fs.h | 5 ---
> >> include/linux/huge_mm.h | 25 ++-------------
> >> include/linux/pagemap.h | 29 -----------------
> >> mm/Kconfig | 11 -------
> >> mm/filemap.c | 1 -
> >> mm/huge_memory.c | 29 ++---------------
> >> mm/khugepaged.c | 36 +++++-----------------
> >> mm/truncate.c | 8 ++---
> >> tools/testing/selftests/mm/guard-regions.c | 9 +++---
> >> tools/testing/selftests/mm/khugepaged.c | 4 +--
> >> 13 files changed, 23 insertions(+), 167 deletions(-)
> >>
> >> --
> >> 2.43.0
> >>
>
>
> --
> Best Regards,
> Yan, Zi
>