Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.cmpxchg.org/linux-mmotm.git master

commit 27e2ce5dba4c30db031744c8140675d03d2ae7aa
Author:     Pavel Tatashin
AuthorDate: Thu May 3 23:02:17 2018 +0000
Commit:     Johannes Weiner
CommitDate: Thu May 3 23:02:17 2018 +0000

    mm: access to uninitialized struct page

    The following two bugs were reported by Fengguang Wu:

    kernel reboot-without-warning in early-boot stage, last printk:
    early console in setup code

    http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com

    And, also:
    [per_cpu_ptr_to_phys] PANIC: early exception 0x0d
    IP 10:ffffffffa892f15f error 0 cr2 0xffff88001fbff000

    http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com

    Both of the problems are due to accessing an uninitialized struct page
    from trap_init().  We must first do mm_init() in order to initialize
    allocated struct pages, and then we can access fields of any struct page
    that belongs to memory that's been allocated.

    Below is an explanation of the root cause.

    The issue arises in this stack:

    start_kernel()
     trap_init()
      setup_cpu_entry_areas()
       setup_cpu_entry_area(cpu)
        get_cpu_gdt_paddr(cpu)
         per_cpu_ptr_to_phys(addr)
          pcpu_addr_to_page(addr)
           virt_to_page(addr)
            pfn_to_page(__pa(addr) >> PAGE_SHIFT)

    The returned "struct page" is sometimes uninitialized, and thus fails
    later when used.  It is only "sometimes" because it depends on KASLR.

    When boot is failing we have this when pfn_to_page() is called:
    kaslr: 0x000000000d600000
    addr: ffffffff83e0d000
    pa: 1040d000
    pfn: 1040d
    page: ffff88001f113340
    page->flags ffffffffffffffff <- Uninitialized!

    When boot is successful:
    kaslr: 0x000000000a800000
    addr: ffffffff83e0d000
    pa: d60d000
    pfn: d60d
    page: ffff88001f05b340
    page->flags 280000000000 <- Initialized!
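The last step of the stack above and the two dumps can be replayed in user space. This is only a sketch, not kernel code: pa_to_pfn(), flags_look_uninitialized() and replay_reported_boots() are hypothetical helper names, PAGE_SHIFT is assumed to be 12 (4 KiB pages), and the all-ones flags value is taken from the failing dump as the sign of a struct page that was never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define UNINIT_FLAGS UINT64_MAX       /* all ones, as seen in the failing dump */

/* Models the "__pa(addr) >> PAGE_SHIFT" step: physical address to PFN. */
uint64_t pa_to_pfn(uint64_t pa)
{
    return pa >> PAGE_SHIFT;
}

/* A struct page whose flags are still all ones was never initialized. */
bool flags_look_uninitialized(uint64_t flags)
{
    return flags == UNINIT_FLAGS;
}

/* Replays the pa, pfn and page->flags values quoted in the two dumps. */
void replay_reported_boots(void)
{
    /* failing boot */
    assert(pa_to_pfn(0x1040d000ULL) == 0x1040dULL);
    assert(flags_look_uninitialized(0xffffffffffffffffULL));

    /* successful boot */
    assert(pa_to_pfn(0xd60d000ULL) == 0xd60dULL);
    assert(!flags_look_uninitialized(0x280000000000ULL));
}
```

Calling replay_reported_boots() from a test harness passes silently when the arithmetic matches the values in the report.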

    Here are the physical addresses that BIOS provided to us:
    e820: BIOS-provided physical RAM map:
    BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
    BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
    BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
    BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
    BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

    In both cases, working and non-working, the real physical address is
    the same:

    pa - kaslr = 0x2E0D000

    The only thing that is different is the PFN.

    We initialize struct pages in four places:

    1. Early in boot a small set of struct pages is initialized to fill the
       first section, and lower zones.
    2. During mm_init() we initialize "struct pages" for all the memory that
       is allocated, i.e. reserved in memblock.
    3. Using on-demand logic when pages are allocated after the mm_init()
       call.
    4. After smp_init(), when the rest of the free deferred pages are
       initialized.

    The above path happens before deferred memory is initialized, and thus
    it must be covered either by 1, 2 or 3.

    So, let's check what PFNs are initialized after (1).

    memmap_init_zone() is called for pfn ranges: 1 - 1000, and 1000 - 1ffe0,
    but it quits after reaching pfn 0x10000, as it leaves the rest to be
    initialized as deferred pages.

    In the working scenario the pfn (d60d) ended up being below 0x10000, but
    in the failing scenario (1040d) it is above.  Hence, we must initialize
    this page in (2).  But trap_init() is called before mm_init().

    The bug was introduced by "mm: initialize pages on demand during boot"
    because we lowered the number of pages that are initialized in step (1).
    But it could still happen before that change, because the number of
    initialized pages was a guess.
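
The arithmetic in this explanation can be checked with a few lines of C. This is a sketch under stated assumptions: image_offset() and covered_by_early_init() are hypothetical names, and the limit 0x10000 is the PFN at which the text says memmap_init_zone() stops in step (1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PFN at which step (1) stops, per the explanation above (assumption). */
#define EARLY_INIT_PFN_LIMIT 0x10000ULL

/* "pa - kaslr": the address with the KASLR offset taken back out. */
uint64_t image_offset(uint64_t pa, uint64_t kaslr)
{
    assert(pa >= kaslr);   /* the offset can only move the address upward */
    return pa - kaslr;
}

/* True when step (1) has already initialized the struct page for pfn. */
bool covered_by_early_init(uint64_t pfn)
{
    return pfn < EARLY_INIT_PFN_LIMIT;
}
```

With the values from the report, image_offset(0x1040d000, 0x0d600000) and image_offset(0xd60d000, 0x0a800000) both give 0x2e0d000, while covered_by_early_init() is true for pfn 0xd60d and false for pfn 0x1040d, which is the difference between the working and the failing boot.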

    The current fix moves trap_init() to be called after mm_init(), but as
    an alternative, we could increase pgdat->static_init_pgcnt:

    In free_area_init_node we can increase:
           pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                                            pgdat->node_spanned_pages);

    Instead of one PAGES_PER_SECTION, set several, so the text is covered
    for all KASLR offsets.  But, this would still be guessing.  Therefore, I
    prefer the current fix.

    Link: http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
    Fixes: c9e97a1997fb ("mm: initialize pages on demand during boot")
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Steven Sistare
    Cc: Daniel Jordan
    Cc: Thomas Gleixner
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Steven Rostedt (VMware)
    Cc: Fengguang Wu
    Cc: Dennis Zhou
    Signed-off-by: Andrew Morton

7a0e68e17b  mm: sections are not offlined during memory hotremove
27e2ce5dba  mm: access to uninitialized struct page
f26843c180  pci: test for unexpectedly disabled bridges

+-------------------------------------------------+------------+------------+------------+
|                                                 | 7a0e68e17b | 27e2ce5dba | mmotm/v4.1 |
+-------------------------------------------------+------------+------------+------------+
| boot_successes                                  | 29         | 0          | 0          |
| boot_failures                                   | 11         | 15         | 21         |
| BUG:soft_lockup-CPU##stuck_for#s                | 11         |            |            |
| RIP:ftrace_likely_update                        | 2          |            |            |
| Kernel_panic-not_syncing:softlockup:hung_tasks  | 11         |            |            |
| RIP:__local_bh_enable_ip                        | 4          |            |            |
| RIP:__local_bh_disable_ip                       | 4          |            |            |
| RIP:_raw_spin_lock_bh                           | 1          |            |            |
| BUG:kernel_reboot-without-warning_in_test_stage | 0          | 15         | 21         |
+-------------------------------------------------+------------+------------+------------+

[    7.926963] VIA Graphics Integration Chipset framebuffer 2.4 initializing
[    7.927638] vmlfb: initializing
[    7.927964] no IO addresses supplied
[    7.928403] usbcore: registered new interface driver udlfb
[    7.928829] usbcore: registered new interface driver smscufx
BUG: kernel reboot-without-warning in test stage

                                                          # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start fde0dff646849fa89e3a4719076b95cf79b4bf8b 6da6c0db5316275015e8cc2959f12a17584aeb64 --
git bisect  bad fd90f2f335c4e4c7ba2b0abbaf4d39f9abf4d036  # 04:49  B      0    11   24   0  Merge 'superna9999/amlogic/v4.17/ao-cec-monitor' into devel-catchup-201805060049
git bisect  bad 74e54669e872ec6e10d357dfbb7b3fed586a4076  # 05:05  B      0     6   19   0  Merge 'pinchartl-media/drm/du/next' into devel-catchup-201805060049
git bisect  bad cb455c1cfb0892769f12297c0d960fef2cdaf86b  # 05:19  B      0    11   24   0  Merge 'yhuang/fix_thp_swap' into devel-catchup-201805060049
git bisect good 248940b0466555830976ee7b03ac83e1b6f19248  # 05:48  G     11     0    2   8  0day base guard for 'devel-catchup-201805060049'
git bisect  bad 47038efa1e57bd47df8bf3a5ca43fcc650e6034d  # 06:26  B      0     4   17   0  mm/ksm: remove unused page_referenced_ksm declaration
git bisect  bad 3e86b1e8f205968b70c55d4272acef28b7d404e5  # 06:59  B      0    11   24   0  mm, memcontrol: implement memory.swap.events
git bisect  bad 9e57af3cb2757349d0a02e1e05b298651503d2d2  # 07:18  B      0    11   26   2  scripts/faddr2line: fix error when addr2line output contains discriminator
git bisect good 9d98b109b501913c8414739830995d36615de37e  # 07:39  G     11     0    1   1  z3fold: fix reclaim lock-ups
git bisect  bad 57780f28c60a16270830fd95e6171121699eb87c  # 07:48  B      0     2   15   0  mm: don't show nr_indirectly_reclaimable in /proc/vmstat
git bisect good 7a0e68e17b8aa41aa33e8c80015e36d47dde390a  # 08:11  G     11     0    4   4  mm: sections are not offlined during memory hotremove
git bisect  bad 27e2ce5dba4c30db031744c8140675d03d2ae7aa  # 08:25  B      0     3   16   0  mm: access to uninitialized struct page
# first bad commit: [27e2ce5dba4c30db031744c8140675d03d2ae7aa] mm: access to uninitialized struct page
git bisect good 7a0e68e17b8aa41aa33e8c80015e36d47dde390a  # 08:37  G     32     0    7  11  mm: sections are not offlined during memory hotremove
# extra tests with debug options
git bisect  bad 27e2ce5dba4c30db031744c8140675d03d2ae7aa  # 08:53  B      0    11   24   0  mm: access to uninitialized struct page
# extra tests on HEAD of linux-devel/devel-catchup-201805060049
git bisect  bad fde0dff646849fa89e3a4719076b95cf79b4bf8b  # 08:53  B      0    29   61   0  0day head guard for 'devel-catchup-201805060049'
# extra tests on tree/branch mmotm/master
git bisect  bad f26843c1807bca843e87bfc9e9036ede686260f8  # 09:19  B      0    11   28   4  pci: test for unexpectedly disabled bridges
# extra tests with first bad commit reverted
git bisect good 203504d2048112f180c55b79f01a81b97c1d2c7a  # 09:39  G     10     0    3   3  Revert "mm: access to uninitialized struct page"

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation