From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: Zi Yan <ziy@nvidia.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Alexander Potapenko <glider@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Brendan Jackman <jackmanb@google.com>,
Christoph Lameter <cl@gentwo.org>,
Dennis Zhou <dennis@kernel.org>,
Dmitry Vyukov <dvyukov@google.com>,
dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
iommu@lists.linux.dev, io-uring@vger.kernel.org,
Jason Gunthorpe <jgg@nvidia.com>, Jens Axboe <axboe@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>,
John Hubbard <jhubbard@nvidia.com>,
kasan-dev@googlegroups.com, kvm@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-arm-kernel@axis.com, linux-arm-kernel@lists.infradead.org,
linux-crypto@vger.kernel.org, linux-ide@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
linux-mmc@vger.kernel.org, linux-mm@kvack.org,
linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
linux-scsi@vger.kernel.org, Marco Elver <elver@google.com>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
netdev@vger.kernel.org, Oscar Salvador <osalvador@suse.de>,
Peter Xu <peterx@redhat.com>, Robin Murphy <robin.murphy@arm.com>,
Suren Baghdasaryan <surenb@google.com>, Tejun Heo <tj@kernel.org>,
virtualization@lists.linux.dev, Vlastimil Babka <vbabka@suse.cz>,
wireguard@lists.zx2c4.com, x86@kernel.org,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
Date: Thu, 9 Oct 2025 12:01:08 +0200
Message-ID: <0c730c52-97ee-43ea-9697-ac11d2880ab7@csgroup.eu>
In-Reply-To: <1db15a30-72d6-4045-8aa1-68bd8411b0ba@redhat.com>
On 09/10/2025 at 11:20, David Hildenbrand wrote:
> On 09.10.25 11:16, Christophe Leroy wrote:
>>
>>
>> On 09/10/2025 at 10:14, David Hildenbrand wrote:
>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>
>>>>
>>>> On 09/10/2025 at 09:22, David Hildenbrand wrote:
>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 01/09/2025 at 17:03, David Hildenbrand wrote:
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>> BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE < __NR_HPAGEFLAGS);
>>>>>>> + BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>> if (!hugepages_supported()) {
>>>>>>> if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>>>>>>> }
>>>>>>> BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>> BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>> + WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>> h = &hstates[hugetlb_max_hstate++];
>>>>>>> __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>>>>> h->order = order;
>>>>>
>>>>> We end up registering hugetlb folios that are bigger than
>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>> that (and if we have to support that).
>>>>>
>>>>
>>>> MAX_FOLIO_ORDER is defined as:
>>>>
>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>> #define MAX_FOLIO_ORDER PUD_ORDER
>>>> #else
>>>> #define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>>>> #endif
>>>>
>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>
>>>> hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>
>>>> Gives:
>>>>
>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>
>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>>> folio_dump() code would not handle it correctly as well.
>>
>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>> ("hugetlb: allow to free gigantic pages regardless of the
>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>> needed to be able to allocate gigantic pages at runtime. It is not
>> needed to reserve gigantic pages at boottime.
>>
>> What am I missing ?
>
> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
> name.
Not in its name, for sure, but the commit I mentioned says:
    On systems without CONTIG_ALLOC activated but that support gigantic
    pages, boottime reserved gigantic pages can not be freed at all. This
    patch simply enables the possibility to hand back those pages to
    memory allocator.
And one of the hunks is:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,7 +19,7 @@ config ARM64
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
- select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
+ select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
So I understand from the commit message that it was possible at that
time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
you didn't have to be able to free them during runtime.
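
To make the mismatch concrete, here is a rough user-space sketch of the comparison that the new WARN_ON() performs. It is not kernel code. PAGE_SHIFT, MAX_PAGE_ORDER and PUD_ORDER below are assumed values chosen for illustration (4 KiB base pages, default buddy limit, 1 GiB PUD), and the helper names are hypothetical; real values depend on the architecture and config.

```c
#include <stdio.h>

/* Assumed values for illustration only; not taken from any real config. */
#define PAGE_SHIFT      12   /* assumption: 4 KiB base pages */
#define MAX_PAGE_ORDER  10   /* assumption: default buddy allocator limit */
#define PUD_ORDER       18   /* assumption: 1 GiB PUD with 4 KiB pages */

/* Mirrors the #ifdef quoted earlier in the thread, as a runtime switch. */
unsigned int max_folio_order(int arch_has_gigantic_page)
{
	return arch_has_gigantic_page ? PUD_ORDER : MAX_PAGE_ORDER;
}

/* Order of a huge page of the given size in bytes: log2(size) - PAGE_SHIFT. */
unsigned int hugepage_order(unsigned long long size)
{
	unsigned int shift = 0;

	while ((1ULL << shift) < size)
		shift++;
	return shift - PAGE_SHIFT;
}

/* Non-zero when the modelled check "order > MAX_FOLIO_ORDER" would trigger. */
int would_warn(unsigned long long size, int arch_has_gigantic_page)
{
	return hugepage_order(size) > max_folio_order(arch_has_gigantic_page);
}

/* Print the outcome for one huge page size under both settings. */
void report(unsigned long long size)
{
	printf("%llu MiB: order %u, warn without gigantic=%d, with gigantic=%d\n",
	       size >> 20, hugepage_order(size),
	       would_warn(size, 0), would_warn(size, 1));
}
```

Calling report() for 64 MiB and 256 MiB from a small main() shows, under these assumptions, that boot-time sizes above MAX_PAGE_ORDER trip the check whenever ARCH_HAS_GIGANTIC_PAGE is not selected.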
>
> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
> hugetlb config that allows for *gigantic pages*.
>
We probably can, but I'd really like to understand the history and how
we ended up in the situation we are in now, because blind fixes often
lead to more problems.
If I follow things correctly, I see a helper gigantic_page_supported()
added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
allocation at runtime").
Then commit 461a7184320a ("mm/hugetlb: introduce
ARCH_HAS_GIGANTIC_PAGE") was added to wrap gigantic_page_supported().
Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
regardless of the configuration") changed gigantic_page_supported() to
gigantic_page_runtime_supported().
So where are we now?
Christophe