From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
Suren Baghdasaryan <surenb@google.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
"H . Peter Anvin" <hpa@zytor.com>,
Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
Mike Rapoport <rppt@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Michal Hocko <mhocko@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Nico Pache <npache@redhat.com>, Dev Jain <dev.jain@arm.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Jens Axboe <axboe@kernel.dk>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
willy@infradead.org, x86@kernel.org, linux-block@vger.kernel.org,
Ritesh Harjani <ritesh.list@gmail.com>,
linux-fsdevel@vger.kernel.org,
"Darrick J . Wong" <djwong@kernel.org>,
mcgrof@kernel.org, gost.dev@samsung.com, hch@lst.de,
Pankaj Raghav <p.raghav@samsung.com>
Subject: Re: [PATCH 3/5] mm: add static huge zero folio
Date: Mon, 4 Aug 2025 18:18:43 +0100
Message-ID: <6ff6fc46-49f1-49b0-b7e4-4cb37ec10a57@lucifer.local>
In-Reply-To: <70049abc-bf79-4d04-a0a8-dd3787195986@redhat.com>
On Mon, Aug 04, 2025 at 07:07:06PM +0200, David Hildenbrand wrote:
> > Yeah I really don't like this. This seems overly complicated and too
> > fiddly. Also if I want a static PMD, do I want to wait a minute for next
> > attempt?
> >
> > Also doing things this way we might end up:
> >
> > 0. Enabling CONFIG_STATIC_HUGE_ZERO_FOLIO
> > 1. Not doing anything that needs a static PMD for a while + get fragmentation.
> > 2. Do something that needs it - oops can't get order-9 page, and waiting 60
> > seconds between attempts
> > 3. This is silent so you think you have it switched on but are actually getting
> > bad performance.
> >
> > I appreciate wanting to reuse this code, but we need to find a way to do this
> > really really early, and get rid of this arbitrary timeout. It's very arbitrary
> > and we have no easy way of tracing how this might behave under workload.
> >
> > Also we end up pinning an order-9 page either way, so no harm in getting it
> > first thing?
>
> What we could do, to avoid messing with memblock and two ways of initializing a huge zero folio early, is just disable the shrinker.
Nice, I like this approach!
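To make the failure mode from steps 0-3 above concrete, here is a small userspace model, not kernel code. Only the 60-second retry interval comes from the thread; `struct lazy_state`, `lazy_get()` and the `memory_fragmented` flag are hypothetical names, and the rate-limiting logic is an assumption about how such a lazy scheme would behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RETRY_INTERVAL_SECS 60	/* "60 seconds between attempts" from the thread */

/* Hypothetical state for a lazily allocated PMD-sized zero folio. */
struct lazy_state {
	bool have_folio;	/* allocation has succeeded at some point */
	bool tried;		/* at least one attempt has been made */
	long last_attempt;	/* time of the most recent attempt, in seconds */
};

/* Returns true if a PMD-sized folio is available to the caller at time `now`. */
bool lazy_get(struct lazy_state *s, long now, bool memory_fragmented)
{
	assert(s != NULL);

	if (s->have_folio)
		return true;

	/* Inside the retry window: fall back silently, without a new attempt. */
	if (s->tried && now - s->last_attempt < RETRY_INTERVAL_SECS)
		return false;

	s->tried = true;
	s->last_attempt = now;

	/* An order-9 allocation is assumed to fail when memory is fragmented. */
	if (memory_fragmented)
		return false;

	s->have_folio = true;
	return true;
}
```

In this model a caller that first asks while memory is fragmented keeps getting the fallback for the rest of the window, even if an order-9 page becomes available in the meantime, and nothing tells the user.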
>
> Downside is that the page is really static (not just when actually used at least once). I like it:
Well I'm not sure this is a downside :P
User is explicitly enabling an option that says 'I'm cool to lose an order-9
page for this'.
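As a rough illustration of what a caller would see, here is a userspace sketch of the `get_static_huge_zero_folio()` contract from the diff below: NULL when the option is off or the boot-time allocation failed, the huge zero folio otherwise. The `struct folio` stand-in, the `MODEL_*` sizes (512 pages of 4 KiB, i.e. order-9), `pick_zero_folio()` and `chunks_needed()` are all assumptions made for the example, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in; this is not the kernel's struct folio. */
struct folio {
	size_t size;
};

#define MODEL_PAGE_SIZE	4096UL
#define MODEL_PMD_SIZE	(512 * MODEL_PAGE_SIZE)	/* order-9 with 4 KiB pages */

struct folio small_zero = { MODEL_PAGE_SIZE };
struct folio huge_zero = { MODEL_PMD_SIZE };

bool static_enabled;		/* models CONFIG_STATIC_HUGE_ZERO_FOLIO */
struct folio *huge_zero_folio;	/* stays NULL if the boot-time allocation failed */

/* Same shape as the helper in the quoted diff. */
struct folio *get_static_huge_zero_folio(void)
{
	if (!static_enabled)
		return NULL;
	return huge_zero_folio;
}

/* Hypothetical caller: prefer the huge folio, else use a single zero page. */
struct folio *pick_zero_folio(void)
{
	struct folio *folio = get_static_huge_zero_folio();

	return folio ? folio : &small_zero;
}

/* Number of folio-sized chunks needed to cover `len` bytes of zeroes. */
size_t chunks_needed(size_t len)
{
	size_t sz = pick_zero_folio()->size;

	assert(sz > 0);
	return (len + sz - 1) / sz;
}
```

Because the folio is never freed once allocated, the caller needs no reference counting: it checks for NULL once and falls back if necessary.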
>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0ce86e14ab5e1..8e2aa18873098 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -153,6 +153,7 @@ config X86
> select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
> select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
> select ARCH_WANTS_THP_SWAP if X86_64
> + select ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO if X86_64
> select ARCH_HAS_PARANOID_L1D_FLUSH
> select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
> select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b7..ccfa5c95f14b1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -495,6 +495,17 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
> struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
> void mm_put_huge_zero_folio(struct mm_struct *mm);
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> + if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
> + return NULL;
> +
> + if (unlikely(!huge_zero_folio))
> + return NULL;
> +
> + return huge_zero_folio;
> +}
> +
> static inline bool thp_migration_supported(void)
> {
> return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
> @@ -685,6 +696,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
> {
> return 0;
> }
> +
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> + return NULL;
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> static inline int split_folio_to_list_to_order(struct folio *folio,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd6cf2..366a6d2d771e3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -823,6 +823,27 @@ config ARCH_WANT_GENERAL_HUGETLB
> config ARCH_WANTS_THP_SWAP
> def_bool n
> +config ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO
> + def_bool n
> +
> +config STATIC_HUGE_ZERO_FOLIO
> + bool "Allocate a PMD sized folio for zeroing"
> + depends on ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO && TRANSPARENT_HUGEPAGE
> + help
> + Without this config enabled, the huge zero folio is allocated on
> + demand and freed under memory pressure once no longer in use.
> + To detect remaining users reliably, references to the huge zero folio
> + must be tracked precisely, so it is commonly only available for mapping
> + it into user page tables.
> +
> + With this config enabled, the huge zero folio can also be used
> + for other purposes that do not implement precise reference counting:
> + it is allocated statically and never freed, allowing for more
> + wide-spread use, for example, when performing I/O similar to the
> + traditional shared zeropage.
> +
> + Not suitable for memory constrained systems.
> +
> config MM_ID
> def_bool n
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ff06dee213eb2..f65ba3e6f0824 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -866,9 +866,14 @@ static int __init thp_shrinker_init(void)
> huge_zero_folio_shrinker->scan_objects = shrink_huge_zero_folio_scan;
> shrinker_register(huge_zero_folio_shrinker);
> - deferred_split_shrinker->count_objects = deferred_split_count;
> - deferred_split_shrinker->scan_objects = deferred_split_scan;
> - shrinker_register(deferred_split_shrinker);
> + if (IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO)) {
> + if (!get_huge_zero_folio())
> + pr_warn("Allocating static huge zero folio failed\n");
> + } else {
> + deferred_split_shrinker->count_objects = deferred_split_count;
> + deferred_split_shrinker->scan_objects = deferred_split_scan;
> + shrinker_register(deferred_split_shrinker);
> + }
> return 0;
> }
> --
> 2.50.1
>
>
> Now, one thing I do not like is that we have "ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO" but
> then have a user-selectable option.
>
> Should we just get rid of ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO?
Yeah, though I guess we'd probably need to make it depend on CONFIG_MMU if so?
Probably don't want to provide it if it might somehow break things?
I guess we could keep it as long as CONFIG_STATIC_HUGE_ZERO_FOLIO depends on
something sensible like CONFIG_MMU, and maybe 64-bit too?
Anyway this approach looks generally good!
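For anyone following along, the "allocate at init and skip the shrinker" idea can be modelled in userspace roughly as below. Everything prefixed `model_` is a hypothetical name. Note that the quoted hunk leaves the huge zero folio shrinker registered and moves the deferred split shrinker into the else branch; this sketch assumes the opposite was meant, which is an interpretation rather than something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; field names mirror the thread but this is not kernel code. */
struct model_state {
	bool static_enabled;		/* models CONFIG_STATIC_HUGE_ZERO_FOLIO */
	bool alloc_ok;			/* whether the order-9 allocation succeeds */
	int zero_folio_refs;		/* references held on the huge zero folio */
	bool zero_shrinker_registered;
	bool split_shrinker_registered;
	bool warned;			/* stands in for the pr_warn() in the diff */
};

/* Models get_huge_zero_folio(): take a reference, or fail if allocation fails. */
bool model_get_huge_zero_folio(struct model_state *s)
{
	assert(s != NULL);
	if (!s->alloc_ok)
		return false;
	s->zero_folio_refs++;
	return true;
}

/* The zero folio can only be reclaimed by its shrinker, and only when unused. */
bool model_zero_folio_reclaimable(const struct model_state *s)
{
	return s->zero_shrinker_registered && s->zero_folio_refs == 0;
}

void model_thp_shrinker_init(struct model_state *s)
{
	/* Assumed: the deferred split shrinker is independent of the zero folio. */
	s->split_shrinker_registered = true;

	if (s->static_enabled) {
		/* Take a reference that is never dropped; no shrinker is registered. */
		if (!model_get_huge_zero_folio(s))
			s->warned = true;
		return;
	}

	/* Without the option, keep on-demand allocation plus the shrinker. */
	s->zero_shrinker_registered = true;
}
```

With the option on, the only failure point is the single attempt at init time, and it is reported rather than silent.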
>
> --
> Cheers,
>
> David / dhildenb
>
Cheers, Lorenzo