From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Shuah Khan <skhan@linuxfoundation.org>, Mike Rapoport <rppt@kernel.org>
Cc: akpm@linux-foundation.org, maddy@linux.ibm.com,
mpe@ellerman.id.au, npiggin@gmail.com,
christophe.leroy@csgroup.eu, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, surenb@google.com,
mhocko@suse.com, masahiroy@kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH] Revert "mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb"
Date: Thu, 4 Dec 2025 20:38:56 +0100
Message-ID: <8a5b59b9-112d-44cf-b81e-6f79f59bb999@kernel.org>
In-Reply-To: <78af7da4-d213-42c6-8ca6-c2bdca81f233@linuxfoundation.org>
On 12/4/25 18:03, Shuah Khan wrote:
> On 12/3/25 23:35, Mike Rapoport wrote:
>> On Thu, Dec 04, 2025 at 07:17:06AM +0100, David Hildenbrand (Red Hat) wrote:
>>> Hi,
>>>
>>> On 12/4/25 03:33, Shuah Khan wrote:
>>>> This reverts commit 39231e8d6ba7f794b566fd91ebd88c0834a23b98.
>>>
>>> That was supposed to fix powerpc handling though. So I think we have to
>>> understand what is happening here.
>
> This patch changes include/linux/mm.h and mm/Kconfig in addition to
> arch/powerpc/Kconfig and arch/powerpc/platforms/Kconfig.cputype.
>
> With this patch, HAVE_GIGANTIC_FOLIOS is enabled in the x86_64 config.
>
> The following mm/Kconfig change isn't arch-specific. That makes it
> not powerpc-specific, and it ends up enabled on x86_64:
Yes, and as the patch explains, that's expected. See below.
>
> +#
> +# We can end up creating gigantic folio.
> +#
> +config HAVE_GIGANTIC_FOLIOS
> + def_bool (HUGETLB_PAGE && ARCH_HAS_GIGANTIC_PAGE) || \
> + (ZONE_DEVICE && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
> +
>
> The following change in include/linux/mm.h is also generic
> and applies to x86_64:
>
> -#if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
> +#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
>
> Is this not intended on all architectures?
All expected. See below.
>
>>>
>>>>
>>>> Enabling HAVE_GIGANTIC_FOLIOS broke the kernel build and git clone on two
>>>> systems. git fetch-pack fails when cloning large repos, and make hangs
>>>> or errors out in Makefile.build with Error 139. These failures are
>>>> random, with git clone failing after fetching 1% of the objects and
>>>> make hanging while compiling random files.
>>>
>>> On which architecture do we see these issues and with which kernel configs?
>>> Can you share one?
>
> Config attached.
Okay, let's walk this through. The config has:
CONFIG_HAVE_GIGANTIC_FOLIOS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_ZONE_DEVICE=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_VMEMMAP=y
In the old code:
#if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
/*
* We don't expect any folios that exceed buddy sizes (and consequently
* memory sections).
*/
#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
/*
* Only pages within a single memory section are guaranteed to be
* contiguous. By limiting folios to a single memory section, all folio
* pages are guaranteed to be contiguous.
*/
#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
#else
/*
* There is no real limit on the folio size. We limit them to the maximum we
* currently expect (e.g., hugetlb, dax).
*/
#define MAX_FOLIO_ORDER PUD_ORDER
#endif
We would get MAX_FOLIO_ORDER = PUD_ORDER = 18
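For reference, here is the rough arithmetic behind that value (assuming
x86_64 with 4 KiB base pages, i.e. PAGE_SHIFT == 12):

	PUD size  = 1 GiB = 2^30 bytes
	PUD_ORDER = 30 - PAGE_SHIFT = 30 - 12 = 18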
In the new code we will get:
#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
/*
* We don't expect any folios that exceed buddy sizes (and consequently
* memory sections).
*/
#define MAX_FOLIO_ORDER MAX_PAGE_ORDER
#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
/*
* Only pages within a single memory section are guaranteed to be
* contiguous. By limiting folios to a single memory section, all folio
* pages are guaranteed to be contiguous.
*/
#define MAX_FOLIO_ORDER PFN_SECTION_SHIFT
#elif defined(CONFIG_HUGETLB_PAGE)
/*
* There is no real limit on the folio size. We limit them to the maximum we
* currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
* no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
*/
#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
#else
/*
* Without hugetlb, gigantic folios that are bigger than a single PUD are
* currently impossible.
*/
#define MAX_FOLIO_ORDER PUD_ORDER
#endif
MAX_FOLIO_ORDER = get_order(SZ_16G) = 22
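Same arithmetic (again assuming 4 KiB base pages):

	SZ_16G          = 16 GiB = 2^34 bytes
	get_order(2^34) = 34 - PAGE_SHIFT = 34 - 12 = 22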
That's expected and okay (raising the maximum we expect), as we only want to set a
rough upper cap on the maximum folio size.
As I raised earlier, note that MAX_FOLIO_ORDER is only used to:
* trigger warnings (safety checks) if we observe an unexpectedly large
  folio, as in the sketch below;
* detect possible folio corruption from an unexpected folio size when
  dumping a folio.
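To illustrate the first kind of user, a minimal sketch (the helper name and
its placement are made up for illustration; VM_WARN_ON_ONCE() and
folio_order() are the real primitives such checks build on):

	/* Sketch only, not the exact mainline code. */
	static inline void folio_order_sanity_check(const struct folio *folio)
	{
		/* Warn once if a folio exceeds the largest order we expect. */
		VM_WARN_ON_ONCE(folio_order(folio) > MAX_FOLIO_ORDER);
	}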
>
>>>
>>>>
>>>> Below is one of the git clone failures:
>>>>
>>>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux_6.19
>>>> Cloning into 'linux_6.19'...
>>>> remote: Enumerating objects: 11173575, done.
>>>> remote: Counting objects: 100% (785/785), done.
>>>> remote: Compressing objects: 100% (373/373), done.
>>>> remote: Total 11173575 (delta 534), reused 505 (delta 411), pack-reused 11172790 (from 1)
>>>> Receiving objects: 100% (11173575/11173575), 3.00 GiB | 7.08 MiB/s, done.
>>>> Resolving deltas: 100% (9195212/9195212), done.
>>>> fatal: did not receive expected object 0002003e951b5057c16de5a39140abcbf6e44e50
>>>> fatal: fetch-pack: invalid index-pack output
>>>
>>> If I had to guess, these symptoms match what we saw between commit
>>> adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio")
>>> and commit 5bebe8de1926 ("mm/huge_memory: Fix initialization of huge zero folio").
>>>
>>> 5bebe8de1926 went into v6.18-rc7.
>>>
>>> Just to be sure: are you certain the issue reproduces with a
>>> v6.18-rc7 or even v6.18 kernel that contains 5bebe8de1926?
>>>
>>> Bisecting might give you wrong results, as the problems of adfb6609c680 do not
>>> reproduce reliably.
>>
>> I can confirm that bisecting gives odd results between v6.18-rc5 and
>> v6.18-rc6. I was seeing failures in some tests, bisected a few times and
>> got a bunch of bogus commits including 3470715e5c22 ("MAINTAINERS: update
>> David Hildenbrand's email address") :)
>
> I am sure this patch is the cause of the problems I have seen on my two
> systems. Reverting this commit solved the issues, since this commit does
> impact all architectures that enable HAVE_GIGANTIC_FOLIOS when the
> conditions are right.
>
>>
>> And 5bebe8de1926 actually solved the issue for me.
>
> Were you seeing the problems I reported without 5bebe8de1926?
> Is 5bebe8de1926 in 6.18?
We were seeing all kinds of different segmentation faults and corruptions.
In my case, every time I tried to log in, something would segfault. For
others, compilers stopped working or they got different random segfaults.
Imagine you think you have a shared zero page, but after every reboot it is
filled with garbage data instead. Not good when your app assumes it
contains 0s.
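As a user-space illustration (a hypothetical minimal program, not one of
the actual failing workloads; with THP enabled, read faults on untouched
anonymous memory can be backed by the huge zero folio):

	#include <assert.h>
	#include <stddef.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;	/* 2 MiB, a typical PMD/THP size */

		/* Untouched anonymous memory must read back as all zeros. */
		unsigned char *p = mmap(NULL, len, PROT_READ,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;

		for (size_t i = 0; i < len; i++)
			assert(p[i] == 0); /* garbage here means zero-folio corruption */
		return 0;
	}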
>
> I can try this commit with 39231e8d6ba7f794b566fd91ebd88c0834a23b98
> and see what happens on my system.
Yes, please. I cannot yet make sense of how MAX_FOLIO_ORDER would make any
difference. Unless, that is, you were actually seeing one of the warnings
based on MAX_FOLIO_ORDER / MAX_FOLIO_NR_PAGES. But I guess none showed up
in dmesg?
--
Cheers
David