* Re: Linux 6.18 amdgpu build error
From: Shuah Khan @ 2025-12-04  2:34 UTC
To: Linus Torvalds, akpm, david
Cc: Alexander Deucher, Linux Kernel Mailing List, amd-gfx, dri-devel,
	Guenter Roeck, Linux Memory Management List, Shuah Khan

On 12/3/25 18:06, Guenter Roeck wrote:
> On 12/3/25 14:16, Shuah Khan wrote:
>>
>> CONFIG_RANDSTRUCT is disabled and so are the GCC_PLUGINS in my config.
>
> I guess that would have been too easy...
>
>> I am also seeing issues with cloning kernel.org repos on my system after
>> a recent update:
>>
>> remote: Enumerating objects: 11177736, done.
>> remote: Counting objects: 100% (1231/1231), done.
>> remote: Compressing objects: 100% (624/624), done.
>> remote: Total 11177736 (delta 855), reused 781 (delta 606), pack-reused 11176505 (from 1)
>> Receiving objects: 100% (11177736/11177736), 3.01 GiB | 7.10 MiB/s, done.
>> Resolving deltas: 100% (9198323/9198323), done.
>> fatal: did not receive expected object 0002003e951b5057c16de5a39140abcbf6e44e50
>> fatal: fetch-pack: invalid index-pack output
>>
>

Linus, Andrew, and David,

Finally figured this out. I narrowed it down to the HAVE_GIGANTIC_FOLIOS
support that went into Linux 6.18-rc6 in this commit:

From 39231e8d6ba7f794b566fd91ebd88c0834a23b98 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
Date: Fri, 14 Nov 2025 22:49:20 +0100
Subject: [PATCH] mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb

This appears to be a larger change than the powerpc scope. It broke my
workflow completely. I sent a revert so this doesn't cause problems for
others.

I can reproduce this problem on two systems - with this commit,
git fetch-pack fails when cloning large repos, and make hangs
or errors out of Makefile.build with Error: 139.

These failures are random, with git clone failing after fetching 1% of
the objects and make hanging while compiling random files.

These failures are random and confusing, sending me down the path of
looking at the tool chain.

Without this commit, I can clone and build kernels on the two systems
where I was seeing problems.

thanks,
-- Shuah

* Re: Linux 6.18 amdgpu build error
From: David Hildenbrand (Red Hat) @ 2025-12-04  6:05 UTC
To: Shuah Khan, Linus Torvalds, akpm
Cc: Alexander Deucher, Linux Kernel Mailing List, amd-gfx, dri-devel,
	Guenter Roeck, Linux Memory Management List

On 12/4/25 03:34, Shuah Khan wrote:
> On 12/3/25 18:06, Guenter Roeck wrote:
>> On 12/3/25 14:16, Shuah Khan wrote:
>
>>>
>>> CONFIG_RANDSTRUCT is disabled and so are the GCC_PLUGINS in my config.
>>
>> I guess that would have been too easy...
>>
>>> I am also seeing issues with cloning kernel.org repos on my system after
>>> a recent update:
>>>
>>> remote: Enumerating objects: 11177736, done.
>>> remote: Counting objects: 100% (1231/1231), done.
>>> remote: Compressing objects: 100% (624/624), done.
>>> remote: Total 11177736 (delta 855), reused 781 (delta 606), pack-reused 11176505 (from 1)
>>> Receiving objects: 100% (11177736/11177736), 3.01 GiB | 7.10 MiB/s, done.
>>> Resolving deltas: 100% (9198323/9198323), done.
>>> fatal: did not receive expected object 0002003e951b5057c16de5a39140abcbf6e44e50
>>> fatal: fetch-pack: invalid index-pack output
>>>
>>
>
> Linus, Andrew, and David,
>
> Finally figured this out. I narrowed it down to the HAVE_GIGANTIC_FOLIOS
> support that went into Linux 6.18-rc6 in this commit:
>
> From 39231e8d6ba7f794b566fd91ebd88c0834a23b98 Mon Sep 17 00:00:00 2001
> From: "David Hildenbrand (Red Hat)" <david@kernel.org>
> Date: Fri, 14 Nov 2025 22:49:20 +0100
> Subject: [PATCH] mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
>

Unsuspected and confusing :(

Let me take a look and reply on the revert.

-- 
Cheers

David

* Re: Linux 6.18 amdgpu build error
From: Shuah Khan @ 2025-12-04 17:40 UTC
To: David Hildenbrand (Red Hat), Linus Torvalds, akpm
Cc: Alexander Deucher, Linux Kernel Mailing List, amd-gfx, dri-devel,
	Guenter Roeck, Linux Memory Management List, Shuah Khan

On 12/3/25 23:05, David Hildenbrand (Red Hat) wrote:
> On 12/4/25 03:34, Shuah Khan wrote:
>> On 12/3/25 18:06, Guenter Roeck wrote:
>>> On 12/3/25 14:16, Shuah Khan wrote:
>>
>>>>
>>>> CONFIG_RANDSTRUCT is disabled and so are the GCC_PLUGINS in my config.
>>>
>>> I guess that would have been too easy...
>>>
>>>> I am also seeing issues with cloning kernel.org repos on my system after
>>>> a recent update:
>>>>
>>>> remote: Enumerating objects: 11177736, done.
>>>> remote: Counting objects: 100% (1231/1231), done.
>>>> remote: Compressing objects: 100% (624/624), done.
>>>> remote: Total 11177736 (delta 855), reused 781 (delta 606), pack-reused 11176505 (from 1)
>>>> Receiving objects: 100% (11177736/11177736), 3.01 GiB | 7.10 MiB/s, done.
>>>> Resolving deltas: 100% (9198323/9198323), done.
>>>> fatal: did not receive expected object 0002003e951b5057c16de5a39140abcbf6e44e50
>>>> fatal: fetch-pack: invalid index-pack output
>>>>
>>>
>>
>> Linus, Andrew, and David,
>>
>> Finally figured this out. I narrowed it down to the HAVE_GIGANTIC_FOLIOS
>> support that went into Linux 6.18-rc6 in this commit:
>>
>> From 39231e8d6ba7f794b566fd91ebd88c0834a23b98 Mon Sep 17 00:00:00 2001
>> From: "David Hildenbrand (Red Hat)" <david@kernel.org>
>> Date: Fri, 14 Nov 2025 22:49:20 +0100
>> Subject: [PATCH] mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
>>
>
> Unsuspected and confusing :(

This commit has impact on all architectures, not a narrow scoped
powerpc only thing - it enables HAVE_GIGANTIC_FOLIOS on x86_64
and changes the common code that determines MAX_FOLIO_ORDER in
include/linux/mm.h

>
> Let me take a look and reply on the revert.
>

Sounds good. Reverting or finding a fix is good with me. It definitely
impacted two of my systems, and the problem was introduced in Linux 6.18-rc6
and is in Linux 6.18.

thanks,
-- Shuah

* Re: Linux 6.18 amdgpu build error
From: Linus Torvalds @ 2025-12-04 19:36 UTC
To: Shuah Khan
Cc: David Hildenbrand (Red Hat), akpm, Alexander Deucher,
	Linux Kernel Mailing List, amd-gfx, dri-devel, Guenter Roeck,
	Linux Memory Management List

On Thu, 4 Dec 2025 at 09:40, Shuah Khan <skhan@linuxfoundation.org> wrote:
>
> This commit has impact on all architectures, not a narrow scoped
> powerpc only thing - it enables HAVE_GIGANTIC_FOLIOS on x86_64
> and changes the common code that determines MAX_FOLIO_ORDER in
> include/linux/mm.h

So I suspect your bisection might not have worked out, and there might
be two different things going on.

In particular, hugepages were broken in 6.18-rc6 due to commit
adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero
folio").

That was then fixed for rc7 (and obviously final 6.18) by commit
5bebe8de19264 ("mm/huge_memory: Fix initialization of huge zero
folio"), but the breakage up until that time was a bit random.

End result: if you ever ended up bisecting into that broken range
between those two commits, you would get failures on some loads (but
not reliably), and your bisection would end up pointing to some random
thing.

But as mentioned, that particular problem would have been fixed in rc7
and in final 6.18, so any issues you saw with the final build would
have been due to something else.

Can I ask you to try to re-do the bisection, but with that commit
5bebe8de19264 applied by hand - if it wasn't already there - every
time you build a kernel that has adfb6609c680?

That way the bisection wouldn't be affected by that other known bug.

             Linus

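The re-bisection requested above (apply 5bebe8de19264 by hand whenever the tree under test contains adfb6609c680 but not the fix) could be sketched as a small helper run at each bisect step before building. This is a minimal sketch and was not posted in the thread: the names `git_is_ancestor`, `needs_fix`, and `prepare_tree` are hypothetical, and it assumes it is run from the top of a kernel git checkout that has both commits available and that the fix cherry-picks cleanly at the commit being tested. It uses only `git merge-base --is-ancestor` and `git cherry-pick --no-commit`.

```python
import subprocess
from typing import Callable

# Commit IDs as quoted in the thread.
BUGGY = "adfb6609c680"   # "mm/huge_memory: initialise the tags of the huge zero folio"
FIX = "5bebe8de19264"    # "mm/huge_memory: Fix initialization of huge zero folio"


def git_is_ancestor(commit: str, rev: str = "HEAD") -> bool:
    """Return True if `commit` is reachable from `rev`.

    `git merge-base --is-ancestor` exits 0 for "yes" and 1 for "no";
    any other exit status is treated as an error.
    """
    result = subprocess.run(["git", "merge-base", "--is-ancestor", commit, rev])
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError(f"git merge-base failed for {commit} (exit {result.returncode})")


def needs_fix(is_ancestor: Callable[[str], bool] = git_is_ancestor) -> bool:
    """The fix is needed only when the buggy commit is present and the fix is not."""
    return is_ancestor(BUGGY) and not is_ancestor(FIX)


def prepare_tree(is_ancestor: Callable[[str], bool] = git_is_ancestor,
                 run=subprocess.run) -> bool:
    """Apply the fix to the working tree without committing, if it is needed.

    Returns True when the fix was applied, False when the tree was left alone.
    """
    if not needs_fix(is_ancestor):
        return False
    run(["git", "cherry-pick", "--no-commit", FIX], check=True)
    return True


if __name__ == "__main__":
    applied = prepare_tree()
    print("fix applied by hand" if applied else "tree left unchanged")
```

After building and testing a step, the temporary change would need to be dropped with `git reset --hard` before marking the step with `git bisect good` or `git bisect bad`, so that the uncommitted fix does not carry over into the next checkout.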
* Re: Linux 6.18 amdgpu build error
From: David Hildenbrand (Red Hat) @ 2025-12-04 19:45 UTC
To: Linus Torvalds, Shuah Khan
Cc: akpm, Alexander Deucher, Linux Kernel Mailing List, amd-gfx,
	dri-devel, Guenter Roeck, Linux Memory Management List

On 12/4/25 20:36, Linus Torvalds wrote:
> On Thu, 4 Dec 2025 at 09:40, Shuah Khan <skhan@linuxfoundation.org> wrote:
>>
>> This commit has impact on all architectures, not a narrow scoped
>> powerpc only thing - it enables HAVE_GIGANTIC_FOLIOS on x86_64
>> and changes the common code that determines MAX_FOLIO_ORDER in
>> include/linux/mm.h
>
> So I suspect your bisection might not have worked out, and there might
> be two different things going on.
>
> In particular, hugepages were broken in 6.18-rc6 due to commit
> adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero
> folio").
>
> That was then fixed for rc7 (and obviously final 6.18) by commit
> 5bebe8de19264 ("mm/huge_memory: Fix initialization of huge zero
> folio"), but the breakage up until that time was a bit random.
>
> End result: if you ever ended up bisecting into that broken range
> between those two commits, you would get failures on some loads (but
> not reliably), and your bisection would end up pointing to some random
> thing.
>
> But as mentioned, that particular problem would have been fixed in rc7
> and in final 6.18, so any issues you saw with the final build would
> have been due to something else.
>
> Can I ask you to try to re-do the bisection, but with that commit
> 5bebe8de19264 applied by hand - if it wasn't already there - every
> time you build a kernel that has adfb6609c680?

Right, that's what I also proposed in [1].

I cannot make sense of how 39231e8d6ba could possibly trigger it given
that it only affects the value of MAX_FOLIO_ORDER --- which is primarily
used for safety checks and snapshot_page(), nothing that could explain
changed application behavior, really.

But while Shuah is retesting, I'll go have yet another look.

[1] https://lore.kernel.org/all/78af7da4-d213-42c6-8ca6-c2bdca81f233@linuxfoundation.org/

-- 
Cheers

David

* Re: Linux 6.18 amdgpu build error
From: Shuah Khan @ 2025-12-04 23:20 UTC
To: David Hildenbrand (Red Hat), Linus Torvalds
Cc: akpm, Alexander Deucher, Linux Kernel Mailing List, amd-gfx,
	dri-devel, Guenter Roeck, Linux Memory Management List, Shuah Khan

On 12/4/25 12:45, David Hildenbrand (Red Hat) wrote:
> On 12/4/25 20:36, Linus Torvalds wrote:
>> On Thu, 4 Dec 2025 at 09:40, Shuah Khan <skhan@linuxfoundation.org> wrote:
>>>
>>> This commit has impact on all architectures, not a narrow scoped
>>> powerpc only thing - it enables HAVE_GIGANTIC_FOLIOS on x86_64
>>> and changes the common code that determines MAX_FOLIO_ORDER in
>>> include/linux/mm.h
>>
>> So I suspect your bisection might not have worked out, and there might
>> be two different things going on.
>>
>> In particular, hugepages were broken in 6.18-rc6 due to commit
>> adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero
>> folio").
>>
>> That was then fixed for rc7 (and obviously final 6.18) by commit
>> 5bebe8de19264 ("mm/huge_memory: Fix initialization of huge zero
>> folio"), but the breakage up until that time was a bit random.
>>

Both my systems were running rc6 - I was stuck in a state where
I was able to rebase to rc7 and then 6.18, but could never build
either one.

>> End result: if you ever ended up bisecting into that broken range
>> between those two commits, you would get failures on some loads (but
>> not reliably), and your bisection would end up pointing to some random
>> thing.
>>
>> But as mentioned, that particular problem would have been fixed in rc7
>> and in final 6.18, so any issues you saw with the final build would
>> have been due to something else.
>>
>> Can I ask you to try to re-do the bisection, but with that commit
>> 5bebe8de19264 applied by hand - if it wasn't already there - every
>> time you build a kernel that has adfb6609c680?

When I suspected rc6 to be the problem, I booted rc5 and compiled
6.18 after reverting 39231e8d6ba based on config file changes
between rc5 and rc6.

>
> Right, that's what I also proposed in [1].
>
> I cannot make sense of how 39231e8d6ba could possibly trigger it given that it only affects the value of MAX_FOLIO_ORDER --- which is primarily used for safety checks and snapshot_page(), nothing that could explain changed application behavior, really.
>
> But while Shuah is retesting, I'll go have yet another look.

I retested on both systems on 6.18, making sure I have 5bebe8de19264
and 39231e8d6ba in there.

I cloned linux_next and built it on both. I didn't see any problems
on 6.18.

Having said that, it might make sense to hold off on including
39231e8d6ba in 6.18 so there is more time to test beyond 2 rc
cycles. That is for you all to decide.

thanks,
-- Shuah

* Re: Linux 6.18 amdgpu build error
From: Linus Torvalds @ 2025-12-04 23:23 UTC
To: Shuah Khan
Cc: David Hildenbrand (Red Hat), akpm, Alexander Deucher,
	Linux Kernel Mailing List, amd-gfx, dri-devel, Guenter Roeck,
	Linux Memory Management List

On Thu, 4 Dec 2025 at 15:20, Shuah Khan <skhan@linuxfoundation.org> wrote:
>
> I didn't see any problems on 6.18.

Ahh. So it might be just that buggy commit adfb6609c680 then, and the
fix already being in rc7 (and final).

           Linus

* Re: Linux 6.18 amdgpu build error
From: Shuah Khan @ 2025-12-04 23:28 UTC
To: Linus Torvalds
Cc: David Hildenbrand (Red Hat), akpm, Alexander Deucher,
	Linux Kernel Mailing List, amd-gfx, dri-devel, Guenter Roeck,
	Linux Memory Management List, Shuah Khan

On 12/4/25 16:23, Linus Torvalds wrote:
> On Thu, 4 Dec 2025 at 15:20, Shuah Khan <skhan@linuxfoundation.org> wrote:
>>
>> I didn't see any problems on 6.18.
>
> Ahh. So it might be just that buggy commit adfb6609c680 then, and the
> fix already being in rc7 (and final).
>

Yes - correct.

thanks,
-- Shuah