On 10/12/23 07:53, Mike Kravetz wrote:
> On 10/11/23 17:03, Nathan Chancellor wrote:
> > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > On 10/09/23 15:56, Usama Arif wrote:
> >
> > I suspect the crash that our continuous integration spotted [1] is the
> > same issue that Konrad is seeing, as I have bisected that failure to
> > bfb41d6b2fe1 in next-20231009. However, neither the first half of your
> > diff (since the second half does not apply at bfb41d6b2fe1) nor the
> > original patch in this thread resolves the issue though, so maybe it is
> > entirely different from Konrad's?
> >
> > For what it's worth, this issue is only visible for me when building for
> > arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
> > CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
> > making it seem like it could be something with uninitialized memory... I
> > have not been able to reproduce it with GCC, which could also mean
> > something.
>
> Thank you Nathan! That is very helpful.
>
> I will use this information to try and recreate. If I can recreate, I
> should be able to get to root cause.

I could easily recreate the issue using the provided instructions. First
thing I did was add a few printk's to check/verify state. The beginning
of gather_bootmem_prealloc looked like this:

static void __init gather_bootmem_prealloc(void)
{
	LIST_HEAD(folio_list);
	struct huge_bootmem_page *m;
	struct hstate *h, *prev_h = NULL;

	if (list_empty(&huge_boot_pages))
		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");

	list_for_each_entry(m, &huge_boot_pages, list) {
		struct page *page = virt_to_page(m);
		struct folio *folio = (void *)page;

		printk("gather_bootmem_prealloc: loop entry m %lx\n",
			(unsigned long)m);

The STRANGE thing is that the printk after testing for list_empty would
print, then we would enter the 'list_for_each_entry()' loop as if the list
was not empty. This is the cause of the addressing exception.
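
For anyone less familiar with the list helpers, here is a minimal
user-space sketch of why entering the loop body on an empty list is
fatal. It is not the kernel code: list_head, container_of and
list_add_tail are simplified stand-ins for <linux/list.h>, while
struct boot_page, count_entries(), first_entry_unchecked() and demo()
are hypothetical names made up for illustration. It assumes the list
member sits at offset 0 of the containing struct, as it does in this
stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head whose next pointer refers back to itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for struct huge_bootmem_page. */
struct boot_page {
	struct list_head list;
	int id;
};

/* Walk the list with the usual "stop when we are back at the head" test. */
static int count_entries(struct list_head *head)
{
	int n = 0;

	for (struct list_head *pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* The cursor value a loop body would see if that test were skipped. */
static struct boot_page *first_entry_unchecked(struct list_head *head)
{
	return container_of(head->next, struct boot_page, list);
}

/* Exercises the empty and non-empty cases; returns 0 if all checks hold. */
static int demo(void)
{
	struct list_head head = LIST_HEAD_INIT(head);
	struct boot_page p = { .id = 1 };

	/* Empty list: a correct walk runs zero iterations ... */
	assert(list_empty(&head));
	assert(count_entries(&head) == 0);
	/* ... and an unchecked cursor aliases the head, not a real entry. */
	assert((void *)first_entry_unchecked(&head) == (void *)&head);

	/* Non-empty list: the cursor is the entry that was added. */
	list_add_tail(&p.list, &head);
	assert(!list_empty(&head));
	assert(count_entries(&head) == 1);
	assert(first_entry_unchecked(&head) == &p);
	return 0;
}
```

Calling demo() from a test harness covers both cases. The point of the
sketch is only that the loop's termination test and list_empty() compare
the same two pointers, so the two should never disagree for the same
list head.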
m pointed to the list head as opposed to an entry on the list.

I have attached disassembly of gather_bootmem_prealloc with
INIT_STACK_NONE and INIT_STACK_ALL_ZERO. The disassembly listings are for
code without the printks.

This is the first time I have looked at arm assembly, so I may be missing
something. However, in the INIT_STACK_NONE case it looks like we get the
address of huge_boot_pages into a register but do not use it to determine
if we should execute the loop. Code generated with INIT_STACK_ALL_ZERO
seems to show code checking the list before entering the loop.

Can someone with more arm assembly experience take a quick look? Since
huge_boot_pages is a global variable rather than on the stack, I can't see
how INIT_STACK_ALL_ZERO/INIT_STACK_NONE could make a difference.
-- 
Mike Kravetz