Date: Thu, 28 Jan 2021 10:45:49 +0800
From: Baoquan He
To: Mike Rapoport
Cc: Łukasz Majczak, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Radosław Biernacki, Marcin Wojtas,
    Alex Levin, Guenter Roeck, Jesse Barnes, Chris Wilson,
    "Sarvela, Tomi P"
Subject: Re: PROBLEM: Crash after mm: fix initialization of struct page for holes in memory layout
Message-ID: <20210128024549.GA3693@MiWiFi-R3L-srv>

On 01/27/21 at 08:26pm, Mike Rapoport wrote:
> Hi Lukasz,
>
> On Wed, Jan 27, 2021 at 02:15:53PM +0100, Łukasz Majczak wrote:
> > Hi Mike,
> >
> > I have started bisecting your patch and I have figured out that there
> > might be something wrong with clamping - with these lines commented out
> > it started to work.
> > The full log (with logs from the patch below) can be found here:
> > https://gist.github.com/semihalf-majczak-lukasz/3cecbab0ddc59a6c3ce11ddc29645725
> > It's fresh - I haven't analyzed it yet, just sharing in the hope it will help.
>
> Thanks, that helps!
>
> The first page is never considered by the kernel as memory and so
> arch_zone_lowest_possible_pfn[ZONE_DMA] is set to 0x1000. As a result,
> init_unavailable_mem() skips pfn 0 and then __SetPageReserved(page) in
> reserve_bootmem_region() panics because the struct page for pfn 0 remains
> poisoned.

It's a great finding and a quick fix.

Previously I tested my cleanup patches based on Mike's commit
9ebeee59af4cdd4d ("mm: fix initialization of struct page for holes in
memory layout") on a hardware system and didn't hit this crash. But this
crash seems to be always reproducible, so I'm wondering why I didn't
reproduce it.

>
> Can you please try the below patch on top of v5.11-rc5?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 783913e41f65..3ce9ef238dfc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7083,10 +7083,11 @@ void __init free_area_init_memoryless_node(int nid)
>  static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
>  					 int zone, int nid)
>  {
> -	unsigned long pfn, zone_spfn, zone_epfn;
> +	unsigned long pfn, zone_spfn = 0, zone_epfn;
>  	u64 pgcnt = 0;
>  
> -	zone_spfn = arch_zone_lowest_possible_pfn[zone];
> +	if (zone > 0)
> +		zone_spfn = arch_zone_highest_possible_pfn[zone - 1];
>  	zone_epfn = arch_zone_highest_possible_pfn[zone];
>  
>  	spfn = clamp(spfn, zone_spfn, zone_epfn);
>
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index eed54ce26ad1..9f4468c413a1 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7093,9 +7093,11 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
> >  	zone_spfn = arch_zone_lowest_possible_pfn[zone];
> >  	zone_epfn = arch_zone_highest_possible_pfn[zone];
> >  
> > -	spfn = clamp(spfn, zone_spfn, zone_epfn);
> > -	epfn = clamp(epfn, zone_spfn, zone_epfn);
> > -
> > +	//spfn = clamp(spfn, zone_spfn, zone_epfn);
> > +	//epfn = clamp(epfn, zone_spfn, zone_epfn);
> > +	pr_info("LMA DBG: zone_spfn: %llx, zone_epfn %llx\n", zone_spfn, zone_epfn);
> > +	pr_info("LMA DBG: spfn: %llx, epfn %llx\n", spfn, epfn);
> > +	pr_info("LMA DBG: clamp_spfn: %llx, clamp_epfn %llx\n", clamp(spfn, zone_spfn, zone_epfn), clamp(epfn, zone_spfn, zone_epfn));
> >  	for (pfn = spfn; pfn < epfn; pfn++) {
> >  		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> >  			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> >
> > Best regards,
> > Lukasz
> >
> >
> > On Wed, Jan 27, 2021 at 13:15, Łukasz Majczak wrote:
> > >
> > > Unfortunately nothing :( my current kernel command line contains:
> > > console=ttyS0,115200n8 debug earlyprintk=serial loglevel=7
> > >
> > > I was thinking about using earlycon, but it seems to be blocked.
> > > (I think the lack of earlycon might be related to the Chromebook HW
> > > security design. There is an EC controller which is part of the
> > > AP -> serial chain, as kernel messages are considered sensitive from
> > > a security standpoint.)
> > >
> > > Best regards,
> > > Lukasz
> > >
> > > On Wed, Jan 27, 2021 at 12:19, Mike Rapoport wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 11:08:17AM +0100, Łukasz Majczak wrote:
> > > > > Hi Mike,
> > > > >
> > > > > Actually I have a serial console attached (via a servo device), but
> > > > > there is no output :( and also the reboot/crash is very fast/immediate
> > > > > after power on.
> > > >
> > > > If you boot with earlyprintk=serial, are there any messages?
> > > >
> > > > > Best regards
> > > > > Lukasz
> > > > >
> > > > > On Wed, Jan 27, 2021 at 11:05, Mike Rapoport wrote:
> > > > > >
> > > > > > Hi Lukasz,
> > > > > >
> > > > > > On Wed, Jan 27, 2021 at 10:22:29AM +0100, Łukasz Majczak wrote:
> > > > > > > Crash after mm: fix initialization of struct page for holes in memory layout
> > > > > > >
> > > > > > > Hi,
> > > > > > > I was trying to run v5.11-rc5 on my Samsung Chromebook Pro (Caroline),
> > > > > > > but I've noticed it has crashed - unfortunately it seems to happen at
> > > > > > > a very early stage - no output to the console nor to the screen, so I
> > > > > > > have started a bisect (between 5.11-rc4 - which works just fine - and
> > > > > > > 5.11-rc5).
> > > > > > > The bisect result points to:
> > > > > > >
> > > > > > > d3921cb8be29 mm: fix initialization of struct page for holes in memory layout
> > > > > > >
> > > > > > > Reproduction is just to build and load the kernel.
> > > > > > >
> > > > > > > If it helps in any way, I am attaching:
> > > > > > > - /proc/cpuinfo (from a healthy system):
> > > > > > > https://gist.github.com/semihalf-majczak-lukasz/3517867bf39f07377c1a785b64a97066
> > > > > > > - my .config file (for a broken system):
> > > > > > > https://gist.github.com/semihalf-majczak-lukasz/584b329f1bf3e43b53efe8e18b5da33c
> > > > > > >
> > > > > > > If there is anything I could add/do/test to help fix this, please let me know.
> > > > > >
> > > > > > Chris Wilson also reported boot failures on several Chromebooks:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com
> > > > > >
> > > > > > I presume a serial console is not an option, so if you could boot with
> > > > > > earlyprintk=vga and see if there is anything useful printed on the screen,
> > > > > > it would be really helpful.
> > > > > >
> > > > > > > Best regards
> > > > > > > Lukasz
> > > > > >
> > > > > > --
> > > > > > Sincerely yours,
> > > > > > Mike.
> > > >
> > > > --
> > > > Sincerely yours,
> > > > Mike.
>
> -- 
> Sincerely yours,
> Mike.
>