From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-io0-f199.google.com (mail-io0-f199.google.com [209.85.223.199])
        by kanga.kvack.org (Postfix) with ESMTP id CDA056B0003
        for ; Mon, 16 Jul 2018 22:59:05 -0400 (EDT)
Received: by mail-io0-f199.google.com with SMTP id u23-v6so35596015iol.22
        for ; Mon, 16 Jul 2018 19:59:05 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
        by mx.google.com with SMTPS id n130-v6sor14507ita.97.2018.07.16.19.59.04
        for (Google Transport Security); Mon, 16 Jul 2018 19:59:04 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <20180713164804.fc2c27ccbac4c02ca2c8b984@linux-foundation.org>
From: Ard Biesheuvel
Date: Tue, 17 Jul 2018 10:59:03 +0800
Message-ID:
Subject: Re: Instability in current -git tree
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Andrew Morton, Thomas Gleixner, Matthew Wilcox, Ingo Molnar,
        Jens Axboe, Greg Kroah-Hartman, David Miller, Al Viro, Dave Airlie,
        Tejun Heo, Ted Ts'o, Mike Snitzer, linux-mm, Daniel Vacek,
        Pavel Tatashin, Mel Gorman

On 14 July 2018 at 08:20, Linus Torvalds wrote:
> On Fri, Jul 13, 2018 at 4:51 PM Linus Torvalds wrote:
>>
>> I'm building a "replace VM_BUG_ON() with proper printk's instead" right now.
>
> Ok, the machine now stays up, and I get messages like
>
>   Removed VM_BUG_ON()!
>    pfn c2400 - c25ff
>    zone DMA32 DMA
>    zone pfn 1000 1
>
>   Removed VM_BUG_ON()!
>    pfn c0a00 - c0bff
>    zone DMA32 DMA
>    zone pfn 1000 1
>
>   Removed VM_BUG_ON()!
>    pfn c2200 - c23ff
>    zone DMA DMA32
>    zone pfn 1 1000
>
> instead.
>
> That's from
>
> +               printk("Removed VM_BUG_ON()!\n");
> +               printk(" pfn %lx - %lx\n", page_to_pfn(start_page),
> page_to_pfn(end_page));
> +               printk(" zone %s %s\n", page_zone(start_page)->name,
> page_zone(end_page)->name);
> +               printk(" zone pfn %lx %lx\n",
> page_zone(start_page)->zone_start_pfn,
> page_zone(end_page)->zone_start_pfn);
>
> inside an if() statement that replaced that VM_BUG_ON().
>
> WTF? That's just odd.
>
> But everything seems to work fine, and now it doesn't crash.
>
> But there's something really odd going on wrt page_zone() and/or
> page_to_pfn().
>
> page_to_pfn() implies this is just regular memory in the 3GB area. It
> is likely related to this:
>
>   BIOS-e820: [mem 0x00000000c0b33000-0x00000000c226cfff] reserved
>   BIOS-e820: [mem 0x00000000c226d000-0x00000000c227efff] ACPI data
>   BIOS-e820: [mem 0x00000000c227f000-0x00000000c2439fff] usable
>   BIOS-e820: [mem 0x00000000c243a000-0x00000000c2a61fff] ACPI NVS
>   BIOS-e820: [mem 0x00000000c2a62000-0x00000000c32fefff] reserved
>   BIOS-e820: [mem 0x00000000c32ff000-0x00000000c32fffff] usable
>   BIOS-e820: [mem 0x00000000c3300000-0x00000000c7ffffff] reserved
>
> I dunno. It's a bit odd. I'm not sure I understand that VM_BUG_ON().
> Adding Ard (who worked on the memblock_next_valid_pfn() thing not that
> long ago) and must have hit this same BUG_ON() because he modified it
> not that long ago.
>
> Ard, I triggered the VM_BUG_ON() in mm/page_alloc.c:2016, with a call
> trace of
>
>   RIP: move_freepages_block()
>   Call Trace:
>     steal_suitable_fallback
>     get_page_from_freelist
>     ...
>
> just for some context.
>

Pavel's fix for this issue in commit e181ae0c5db9 is causing boot
problems on i686 for me. Is anyone else seeing the same? I get no
output whatsoever when booting an i386_defconfig kernel under qemu/kvm
(without EFI).