From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v5 1/1] mm: refactor initialization of struct page for holes in memory layout
Date: Fri, 12 Feb 2021 11:42:15 +0100
To: Michal Hocko, Mike Rapoport
Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, Borislav Petkov, Chris Wilson, H. Peter Anvin,
 Ingo Molnar, Linus Torvalds, Łukasz Majczak, Mel Gorman, Mike Rapoport, Qian Cai,
 "Sarvela, Tomi P", Thomas Gleixner, Vlastimil Babka, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, stable@vger.kernel.org, x86@kernel.org
References: <20210208110820.6269-1-rppt@kernel.org>

On 12.02.21 11:33, Michal Hocko wrote:
> On Mon 08-02-21 13:08:20, Mike Rapoport wrote:
>> From: Mike Rapoport
>>
>> There could be struct pages that are not backed by actual physical memory.
>> This can happen when the actual memory bank is not a multiple of
>> SECTION_SIZE or when an architecture does not register memory holes
>> reserved by the firmware as memblock.memory.
>>
>> Such pages are currently initialized using the init_unavailable_mem() function
>> that iterates through PFNs in holes in memblock.memory and, if there is a
>> struct page corresponding to a PFN, sets the fields of this page to
>> default values and marks it as Reserved.
>>
>> init_unavailable_mem() does not take into account the zone and node the page
>> belongs to and sets both the zone and node links in struct page to zero.
>
> IIUC the zone should be associated based on the pfn and architecture
> constraints on zones. The node is then guessed based on the last existing
> range, right?
>
>> On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
>> instance in the configuration below:
>>
>> # grep -A1 E820 /proc/iomem
>> 7a17b000-7a216fff : Unknown E820 type
>> 7a217000-7bffffff : System RAM
>
> I like the description here, though. Thanks, very useful.
>
>> the unset zone link in struct page will trigger
>>
>> VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
>
> I guess you mean set_pfnblock_flags_mask(), right? Is this BUG_ON really
> needed? Maybe we just need to skip over reserved pages?
>
>> because there are pages in both ZONE_DMA32 and ZONE_DMA (with an unset zone link
>> in struct page) in the same pageblock.
>>
>> Moreover, it is possible that the lowest node and zone start is not aligned
>> to the section boundary, for example on x86:
>>
>> [    0.078898] Zone ranges:
>> [    0.078899]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>> ...
>> [    0.078910] Early memory node ranges
>> [    0.078912]   node   0: [mem 0x0000000000001000-0x000000000009cfff]
>> [    0.078913]   node   0: [mem 0x0000000000100000-0x000000003fffffff]
>>
>> and thus with the SPARSEMEM memory model the beginning of the memory map will
>> have struct pages that are not spanned by any node and zone.
>>
>> Update detection of node boundaries in get_pfn_range_for_nid() so that the
>> node range will be expanded to cover the memory map section. Since zone spans
>> are derived from the node span, there will always be a zone that covers the
>> part of the memory map with unavailable pages.
>>
>> Interleave initialization of the unavailable pages with the normal
>> initialization of the memory map, so that zone and node information will be
>> properly set on struct pages that are not backed by actual memory.
>
> I have to digest this, but my first impression is that this is more
> heavyweight than it needs to be. PFN walkers should normally obey the node
> range at least. The first pfn is usually excluded but I haven't seen real

We've seen examples where this is not sufficient. Simple example:

Have your physical memory end within a memory section. Easy via QEMU,
just do a "-m 4000M". The remaining part of the last section has
fake/wrong node/zone info.

Hotplug memory. The node/zone gets resized such that PFN walkers might
stumble over it.

The basic idea is to make sure that any initialized/"online" pfn belongs
to exactly one node/zone and that the node/zone spans that PFN.

> problems with that. The VM_BUG_ON blowing up is really bad but as said
> above we can simply make it less offensive in presence of reserved pages
> as those shouldn't reach that path AFAICS normally.

Andrea tried working around it via PG_reserved pages and it
resulted in quite some ugly code. Andrea also noted that we cannot rely
on any random page walker to do the right thing when it comes to messed
up node/zone info.

--
Thanks,

David / dhildenb
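
To make the failure mode discussed in this thread easier to follow, here is a
small self-contained userspace sketch. It is not kernel code: the zone ranges,
the hole location and all toy_* names are invented for illustration. It models
the invariant David describes (every initialized pfn must be linked to a zone
that actually spans it) and shows how hole pages whose zone link was left at
zero violate it inside a pageblock that also contains real ZONE_DMA32 pages,
which is what the quoted VM_BUG_ON_PAGE(!zone_spans_pfn(...)) check in
set_pfnblock_flags_mask() catches.

/*
 * Self-contained userspace sketch (not kernel code): toy model of the
 * problem discussed above.  Zone ranges, the hole location and the toy_*
 * types are invented for illustration only.
 *
 * Hole pages keep a zero zone link (ZONE_DMA), so a pageblock that also
 * holds real ZONE_DMA32 pages contains pages whose recorded zone does not
 * span their pfn -- the condition the zone_spans_pfn() debug check in the
 * page allocator trips over.
 */
#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGEBLOCK_NR_PAGES 512UL	/* stand-in for pageblock_nr_pages */

enum toy_zone_idx { ZONE_DMA, ZONE_DMA32, NR_TOY_ZONES };

struct toy_zone {
	const char *name;
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
};

/* Simplified zone layout, loosely mirroring the example in the changelog. */
static const struct toy_zone toy_zones[NR_TOY_ZONES] = {
	[ZONE_DMA]   = { "DMA",   0x00001, 0x01000 },
	[ZONE_DMA32] = { "DMA32", 0x01000, 0x80000 },
};

/* Toy struct page: only the zone link and the reserved bit matter here. */
struct toy_page {
	enum toy_zone_idx zone;
	bool reserved;
};

static bool toy_zone_spans_pfn(const struct toy_zone *z, unsigned long pfn)
{
	return pfn >= z->start_pfn && pfn < z->end_pfn;
}

int main(void)
{
	/*
	 * One pageblock inside ZONE_DMA32 that contains a firmware hole
	 * (like the "Unknown E820 type" range above).  The hole pages were
	 * initialized with a zero zone link, i.e. ZONE_DMA.
	 */
	unsigned long block_start = 0x7a000;	/* made-up pfn */
	struct toy_page block[TOY_PAGEBLOCK_NR_PAGES];

	for (unsigned long i = 0; i < TOY_PAGEBLOCK_NR_PAGES; i++) {
		bool in_hole = (i >= 0x100 && i < 0x180);	/* made-up hole */

		block[i].zone = in_hole ? ZONE_DMA : ZONE_DMA32;
		block[i].reserved = in_hole;
	}

	/*
	 * Mimic the spirit of VM_BUG_ON_PAGE(!zone_spans_pfn(...)) in
	 * set_pfnblock_flags_mask(): every page's recorded zone must span
	 * its pfn.  Only the hole pages fail the check.
	 */
	for (unsigned long i = 0; i < TOY_PAGEBLOCK_NR_PAGES; i++) {
		unsigned long pfn = block_start + i;
		const struct toy_zone *z = &toy_zones[block[i].zone];

		if (!toy_zone_spans_pfn(z, pfn))
			printf("pfn %#lx: zone link says %s, which does not span it%s\n",
			       pfn, z->name,
			       block[i].reserved ? " (reserved hole page)" : "");
	}
	return 0;
}

With the approach described in the changelog (expanding the node range in
get_pfn_range_for_nid() and interleaving initialization of the hole pages with
the normal memory map initialization), the hole pages would instead inherit
the surrounding zone/node, so a check of this kind would pass.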