MIME-Version: 1.0
References: <20210127100454.GK196782@linux.ibm.com>
 <20210127111858.GA273567@linux.ibm.com> <20210127182651.GA281042@linux.ibm.com>
In-Reply-To: <20210127182651.GA281042@linux.ibm.com>
From: Łukasz Majczak
Date: Wed, 27 Jan 2021 20:18:34 +0100
Subject: Re: PROBLEM: Crash after mm: fix initialization of struct page for holes in memory layout
To: Mike Rapoport
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Radosław Biernacki, Marcin Wojtas, Alex Levin, Guenter Roeck, Jesse Barnes, Chris Wilson, "Sarvela, Tomi P", Łukasz Bartosik
Content-Type: text/plain; charset="UTF-8"

Hi Mike,

Great! It seems to work well - I have built a vanilla v5.11-rc5 kernel with
your patch and it boots properly.
Full log available here:
https://gist.github.com/semihalf-majczak-lukasz/ea89bf52f6fad7907a18d1870e7ce9bd

Best regards,
Lukasz

On Wed, 27 Jan 2021 at 19:27, Mike Rapoport wrote:
>
> Hi Lukasz,
>
> On Wed, Jan 27, 2021 at 02:15:53PM +0100, Łukasz Majczak wrote:
> > Hi Mike,
> >
> > I have started bisecting your patch and I have figured out that there
> > might be something wrong with clamping - with these lines commented out
> > it started to work.
> > The full log (with logs from below patch) can be found here:
> > https://gist.github.com/semihalf-majczak-lukasz/3cecbab0ddc59a6c3ce11ddc29645725
> > It's fresh - I haven't analyzed it yet, just sharing in the hope it will help.
>
> Thanks, that helps!
>
> The first page is never considered by the kernel as memory and so
> arch_zone_lowest_possible_pfn[ZONE_DMA] is set to 0x1000. As a result,
> init_unavailable_mem() skips pfn 0 and then __SetPageReserved(page) in
> reserve_bootmem_region() panics because the struct page for pfn 0 remains
> poisoned.
>
> Can you please try the below patch on top of v5.11-rc5?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 783913e41f65..3ce9ef238dfc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7083,10 +7083,11 @@ void __init free_area_init_memoryless_node(int nid)
>  static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
>  					  int zone, int nid)
>  {
> -	unsigned long pfn, zone_spfn, zone_epfn;
> +	unsigned long pfn, zone_spfn = 0, zone_epfn;
>  	u64 pgcnt = 0;
>
> -	zone_spfn = arch_zone_lowest_possible_pfn[zone];
> +	if (zone > 0)
> +		zone_spfn = arch_zone_highest_possible_pfn[zone - 1];
>  	zone_epfn = arch_zone_highest_possible_pfn[zone];
>
>  	spfn = clamp(spfn, zone_spfn, zone_epfn);
>
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index eed54ce26ad1..9f4468c413a1 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7093,9 +7093,11 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
> >  	zone_spfn = arch_zone_lowest_possible_pfn[zone];
> >  	zone_epfn = arch_zone_highest_possible_pfn[zone];
> >
> > -	spfn = clamp(spfn, zone_spfn, zone_epfn);
> > -	epfn = clamp(epfn, zone_spfn, zone_epfn);
> > -
> > +	//spfn = clamp(spfn, zone_spfn, zone_epfn);
> > +	//epfn = clamp(epfn, zone_spfn, zone_epfn);
> > +	pr_info("LMA DBG: zone_spfn: %llx, zone_epfn %llx\n", zone_spfn, zone_epfn);
> > +	pr_info("LMA DBG: spfn: %llx, epfn %llx\n", spfn, epfn);
> > +	pr_info("LMA DBG: clamp_spfn: %llx, clamp_epfn %llx\n", clamp(spfn, zone_spfn, zone_epfn), clamp(epfn, zone_spfn, zone_epfn));
> >  	for (pfn = spfn; pfn < epfn; pfn++) {
> >  		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> >  			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> >
> > Best regards,
> > Lukasz
> >
> >
> > On Wed, 27 Jan 2021 at 13:15, Łukasz Majczak wrote:
> > >
> > > Unfortunately nothing :( My current kernel command line contains:
> > > console=ttyS0,115200n8 debug earlyprintk=serial loglevel=7
> > >
> > > I was thinking about using earlycon, but it seems to be blocked.
> > > (I think the lack of earlycon might be related to the Chromebook HW
> > > security design. There is an EC controller which is part of the AP ->
> > > serial chain, as kernel messages are considered sensitive from a
> > > security standpoint.)
> > >
> > > Best regards,
> > > Lukasz
> > >
> > > On Wed, 27 Jan 2021 at 12:19, Mike Rapoport wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 11:08:17AM +0100, Łukasz Majczak wrote:
> > > > > Hi Mike,
> > > > >
> > > > > Actually I have a serial console attached (via servo device), but
> > > > > there is no output :( and also the reboot/crash is very fast/immediate
> > > > > after power on.
> > > >
> > > > If you boot with earlyprintk=serial are there any messages?
> > > >
> > > > > Best regards
> > > > > Lukasz
> > > > >
> > > > > On Wed, 27 Jan 2021 at 11:05, Mike Rapoport wrote:
> > > > > >
> > > > > > Hi Lukasz,
> > > > > >
> > > > > > On Wed, Jan 27, 2021 at 10:22:29AM +0100, Łukasz Majczak wrote:
> > > > > > > Crash after mm: fix initialization of struct page for holes in memory layout
> > > > > > >
> > > > > > > Hi,
> > > > > > > I was trying to run v5.11-rc5 on my Samsung Chromebook Pro (Caroline),
> > > > > > > but I've noticed it crashes - unfortunately it seems to happen at
> > > > > > > a very early stage - no output to the console nor to the screen, so I
> > > > > > > have started a bisect (between 5.11-rc4 - which works just fine - and
> > > > > > > 5.11-rc5).
> > > > > > > The bisect result points to:
> > > > > > >
> > > > > > > d3921cb8be29 mm: fix initialization of struct page for holes in memory layout
> > > > > > >
> > > > > > > Reproduction is just to build and load the kernel.
> > > > > > >
> > > > > > > In case it helps, I am attaching:
> > > > > > > - /proc/cpuinfo (from a healthy system):
> > > > > > > https://gist.github.com/semihalf-majczak-lukasz/3517867bf39f07377c1a785b64a97066
> > > > > > > - my .config file (for a broken system):
> > > > > > > https://gist.github.com/semihalf-majczak-lukasz/584b329f1bf3e43b53efe8e18b5da33c
> > > > > > >
> > > > > > > If there is anything I could add/do/test to help fix this please let me know.
> > > > > >
> > > > > > Chris Wilson also reported boot failures on several Chromebooks:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com
> > > > > >
> > > > > > I presume serial console is not an option, so if you could boot with
> > > > > > earlyprintk=vga and see if there is anything useful printed on the screen
> > > > > > it would be really helpful.
> > > > > >
> > > > > > > Best regards
> > > > > > > Lukasz
> > > > > >
> > > > > > --
> > > > > > Sincerely yours,
> > > > > > Mike.
> > > >
> > > > --
> > > > Sincerely yours,
> > > > Mike.
>
> --
> Sincerely yours,
> Mike.