From mboxrd@z Thu Jan 1 00:00:00 1970
To: Andrea Arcangeli, David Hildenbrand
Cc: Mel Gorman, Andrew Morton, linux-mm@kvack.org, Qian Cai, Michal Hocko, linux-kernel@vger.kernel.org, Mike Rapoport, Baoquan He
References: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw> <20201121194506.13464-1-aarcange@redhat.com> <20201121194506.13464-2-aarcange@redhat.com>
From: Vlastimil Babka
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Message-ID: <1c4c405b-52e0-cf6b-1f82-91a0a1e3dd53@suse.cz>
Date: Wed, 25 Nov 2020 13:08:54 +0100

On 11/25/20 6:34 AM, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote:
>> On 11/21/20 8:45 PM, Andrea Arcangeli wrote:
>> > A corollary issue was fixed in
>> > 39639000-39814fff : Unknown E820 type
>> >
>> > pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:
>> >
>> > 7a17b000-7a216fff : Unknown E820 type
>>
>> It would be nice to also provide a /proc/zoneinfo and how exactly the
>> "zone_spans_pfn" was violated. I assume we end up below zone's
>> start_pfn, but is it true?
>
> Agreed, I was about to grab that info along with all page struct
> around the pfn 0x7a200 and phys address 0x7a216fff.
>
> # grep -A1 E820 /proc/iomem
> 7a17b000-7a216fff : Unknown E820 type
> 7a217000-7bffffff : System RAM
>
> DMA      zone_start_pfn 1        zone_end_pfn() 4096     contiguous 1
> DMA32    zone_start_pfn 4096     zone_end_pfn() 1048576  contiguous 0
> Normal   zone_start_pfn 1048576  zone_end_pfn() 4715392  contiguous 1
> Movable  zone_start_pfn 0        zone_end_pfn() 0        contiguous 0

So the above means that around the "Unknown E820 type" we have:

pfn 499712 - start of pageblock in ZONE_DMA32
pfn 500091 - start of the "Unknown E820 type" range
pfn 500224 - start of another pageblock
pfn 500246 - end of "Unknown E820 type"

So this is indeed not a zone boundary issue, but basically a hole that is
not aligned to a pageblock boundary, which is really unexpected.
We have CONFIG_HOLES_IN_ZONE (that x86 doesn't set) for architectures
that do this, and even that config only affects pfn_valid_within(). Here
pfn_valid() is true, yet the zone/node linkage is unexpected.

> However the real bug seems that reserved pages have a zero zone_id in
> the page->flags when it should have the real zone id/nid. The patch I
> sent earlier to validate highest would only be needed to deal with
> pfn_valid.
>
> Something must have changed more recently than v5.1 that caused the
> zoneid of reserved pages to be wrong, a possible candidate for the
> real would be this change below:
>
> + __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
>
> Even if it may not be it, at the light of how the reserved page
> zoneid/nid initialized went wrong, the above line like it's too flakey
> to stay.
>
> It'd be preferable if the pfn_valid fails and the
> pfn_to_section_nr(pfn) returns an invalid section for the intermediate
> step. Even better memset 0xff over the whole page struct until the
> second stage comes around.
>
> Whenever pfn_valid is true, it's better that the zoneid/nid is correct
> all times, otherwise if the second stage fails we end up in a bug with
> weird side effects.

Yeah, I guess it would be simpler if zoneid/nid was correct for
pfn_valid() pfns within a zone's range, even if they are reserved due
to not being really usable memory.

I don't think we want to introduce CONFIG_HOLES_IN_ZONE to x86. If the
chosen solution is to make this a real hole, the hole should be
extended to MAX_ORDER_NR_PAGES aligned boundaries.

In any case, compaction code can't fix this with better range checks.

> Maybe it's not the above that left a zero zoneid though, I haven't
> tried to bisect it yet to look how the page->flags looked like on a
> older kernel that didn't seem to reproduce this crash, I'm just
> guessing.
>
> Thanks,
> Andrea
>
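
For reference, the pfn numbers listed above can be reproduced with a
short Python sketch. It is only an illustration, not kernel code: the
helper names are made up, and it assumes the usual x86_64 defaults that
are not stated in this thread (4 KiB pages, pageblock_order 9,
MAX_ORDER 11, so MAX_ORDER_NR_PAGES is 1024). It also shows what range
the hole would cover if it were extended to MAX_ORDER_NR_PAGES aligned
boundaries.

```python
# Assumptions (typical x86_64 defaults, not taken from this thread):
PAGE_SHIFT = 12                       # 4 KiB pages
PAGEBLOCK_NR_PAGES = 1 << 9           # pageblock_order = 9 -> 512 pages
MAX_ORDER_NR_PAGES = 1 << (11 - 1)    # MAX_ORDER = 11 -> 1024 pages


def phys_to_pfn(addr):
    """Convert a physical address to its page frame number."""
    return addr >> PAGE_SHIFT


def align_down(pfn, nr):
    """Round pfn down to a multiple of nr (nr must be a power of two)."""
    return pfn & ~(nr - 1)


def align_up(pfn, nr):
    """Round pfn up to a multiple of nr (nr must be a power of two)."""
    return (pfn + nr - 1) & ~(nr - 1)


# "7a17b000-7a216fff : Unknown E820 type" from /proc/iomem
hole_start_pfn = phys_to_pfn(0x7a17b000)   # first pfn of the range
hole_last_pfn = phys_to_pfn(0x7a216fff)    # last pfn of the range

# Pageblock containing the start of the hole, and the following one
pageblock_start = align_down(hole_start_pfn, PAGEBLOCK_NR_PAGES)
next_pageblock = pageblock_start + PAGEBLOCK_NR_PAGES

# Hole extended to MAX_ORDER_NR_PAGES aligned boundaries, [start, end)
ext_start = align_down(hole_start_pfn, MAX_ORDER_NR_PAGES)
ext_end = align_up(hole_last_pfn + 1, MAX_ORDER_NR_PAGES)

print("pageblock start:", pageblock_start)
print("hole start:     ", hole_start_pfn)
print("next pageblock: ", next_pageblock)
print("hole end:       ", hole_last_pfn)
print("extended hole:  ", ext_start, "-", ext_end)
```

The first four values match the list above (499712, 500091, 500224,
500246). Under the stated assumptions the extended hole would start at
pfn 499712 and end before pfn 500736, which is still inside the DMA32
range reported earlier.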