Subject: Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes
From: Anshuman Khandual
To: David Hildenbrand, linux-mm@kvack.org
Cc: Russell King, Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Date: Thu, 11 Mar 2021 09:59:49 +0530
Message-ID: <72902ace-5f00-b484-aa71-e6841fb7d082@arm.com>
In-Reply-To: <745496f5-e099-8780-e42e-f347b55e8476@redhat.com>
References: <1615174073-10520-1-git-send-email-anshuman.khandual@arm.com> <745496f5-e099-8780-e42e-f347b55e8476@redhat.com>

On 3/8/21 2:07 PM, David Hildenbrand wrote:
> On 08.03.21 04:27, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might contain memmap holes caused by memblock areas tagged
>> with MEMBLOCK_NOMAP, which should be skipped while validating a pfn for
>> struct page backing. This scenario can be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES, and then the generic pfn_valid()
>> can be improved to accommodate such platforms. This reduces the overall
>> code footprint and also improves maintainability.
>>
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in turn
>> expanded its scope to new platforms like arc and m68k. Rather, let's
>> restrict the scope of free_unused_memmap() back to the arm and arm64
>> platforms using this new config option, i.e.
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES.
>>
>> While here, also export the symbol memblock_is_map_memory() in order to
>> build drivers that depend on pfn_valid() but do not have the required
>> visibility. After this new config is in place, just drop
>> CONFIG_HAVE_ARCH_PFN_VALID from both arm and arm64 platforms.
>>
>> Cc: Russell King
>> Cc: Catalin Marinas
>> Cc: Will Deacon
>> Cc: Andrew Morton
>> Cc: Mike Rapoport
>> Cc: David Hildenbrand
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> Suggested-by: David Hildenbrand
>> Signed-off-by: Anshuman Khandual
>> ---
>> This applies on 5.12-rc2 along with the arm64 pfn_valid() fix patches [1]
>> and has been lightly tested on the arm64 platform. The idea to represent
>> this unique situation on the arm and arm64 platforms with a config option
>> was proposed by David H during an earlier discussion [2]. This still does
>> not build on the arm platform due to pfn_valid() resolution errors.
>> Nonetheless, I wanted some early feedback on whether the overall approach
>> here is acceptable.
>
> It might make sense to keep the arm variant for now. The arm64 variant is
> where the magic happens and where we missed updates when working on the
> generic variant.

Sure, will drop the changes on arm.

>
> The generic variant really only applies to 64-bit targets where we have
> SPARSEMEM. See x86 as an example.

Okay.

>
> [...]
>
>>   /*
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 47946cec7584..93532994113f 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>   }
>>   #endif
>>
>> +bool memblock_is_map_memory(phys_addr_t addr);
>> +
>>   #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>>   static inline int pfn_valid(unsigned long pfn)
>> +{
>> +	phys_addr_t addr = PFN_PHYS(pfn);
>> +
>> +	/*
>> +	 * Ensure the upper PAGE_SHIFT bits are clear in the
>> +	 * pfn. Else it might lead to false positives when
>> +	 * some of the upper bits are set, but the lower bits
>> +	 * match a valid pfn.
>> +	 */
>> +	if (PHYS_PFN(addr) != pfn)
>> +		return 0;
>
> I think this should be fine for other archs as well.
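
(Side note, purely for illustration: a minimal userspace sketch of the
round-trip check above, assuming a 64-bit phys_addr_t and 4K pages. The
macros here are local stand-ins mimicking the kernel's PFN_PHYS() and
PHYS_PFN(), not the real definitions.)

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PFN_PHYS(pfn)	((uint64_t)(pfn) << PAGE_SHIFT)  /* pfn -> phys */
#define PHYS_PFN(addr)	((uint64_t)(addr) >> PAGE_SHIFT) /* phys -> pfn */

int main(void)
{
	/*
	 * A pfn with any of its top PAGE_SHIFT bits set loses those
	 * bits when shifted into a 64-bit physical address ...
	 */
	uint64_t bogus_pfn = (1ULL << 63) | 0x1234;

	/*
	 * ... so the round trip does not restore the original pfn,
	 * which is exactly the case the new check rejects.
	 */
	printf("%d\n", PHYS_PFN(PFN_PHYS(bogus_pfn)) == bogus_pfn); /* 0 */
	return 0;
}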

>> +
>> +#ifdef CONFIG_SPARSEMEM
>
> Why do we need the ifdef now? If that's to cover the arm case, then please
> consider the arm64 case only for now.

Yes, it is not needed.

>
>>   {
>>   	struct mem_section *ms;
>>
>> @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>>   	 * Traditionally early sections always returned pfn_valid() for
>>   	 * the entire section-sized span.
>>   	 */
>> -	return early_section(ms) || pfn_section_valid(ms, pfn);
>> +	if (early_section(ms))
>> +		return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
>> +			memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> +
>> +	return pfn_section_valid(ms, pfn);
>> +}
>> +#endif
>> +	return 1;
>>   }
>>   #endif
>>
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 24c045b24b95..0ec20f661b3f 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>>   config ARCH_KEEP_MEMBLOCK
>>   	bool
>>
>> +config HAVE_EARLY_SECTION_MEMMAP_HOLES
>> +	depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
>> +	def_bool n
>> +	help
>> +	  Early sections on certain platforms might have portions which are
>> +	  not backed with a struct page mapping, as their memblock entries
>> +	  are marked with MEMBLOCK_NOMAP. When selected, this option enables
>> +	  specific handling for those memory sections in certain situations
>> +	  such as pfn_valid().
>> +
>>   # Keep arch NUMA mapping infrastructure post-init.
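
(For context, a rough sketch of how such a memmap hole arises in the
first place. This is a hypothetical early-boot snippet with a made-up
base address and size; memblock_add() and memblock_mark_nomap() are the
actual memblock interfaces involved.)

	/*
	 * Firmware hands over a range that the kernel must track but
	 * never map, e.g. EFI runtime services regions on arm64.
	 */
	memblock_add(0x80000000ULL, SZ_16M);
	memblock_mark_nomap(0x80000000ULL, SZ_16M);

	/*
	 * The range stays in memblock.memory, so the section covering
	 * it is still an early section, yet memblock_is_map_memory()
	 * now returns false for pfns inside it - this is the hole that
	 * the new config option teaches the generic pfn_valid() about.
	 */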

>>   config NUMA_KEEP_MEMINFO
>>   	bool
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index afaefa8fc6ab..d9fa2e62ab7a 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>>   		return false;
>>   	return !memblock_is_nomap(&memblock.memory.regions[i]);
>>   }
>> +EXPORT_SYMBOL(memblock_is_map_memory);
>>
>>   int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>>   			unsigned long *start_pfn, unsigned long *end_pfn)
>> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>>   	unsigned long start, end, prev_end = 0;
>>   	int i;
>>
>> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
>> +	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>>   	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>   		return;
>>
>
> With
>
> commit 1f90a3477df3ff1a91e064af554cdc887c8f9e5e
> Author: Dan Williams
> Date:   Thu Feb 25 17:17:05 2021 -0800
>
>     mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
>
> (still in -next I think)

It has already been merged into mainline.

>
> You'll also have to take care of pfn_to_online_page().
>

Would something like this work?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ba51a8bdaeb..19812deb807f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -309,6 +309,11 @@ struct page *pfn_to_online_page(unsigned long pfn)
 	 * Save some code text when online_section() +
 	 * pfn_section_valid() are sufficient.
 	 */
+	if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES)) {
+		if (early_section(ms) && !memblock_is_map_memory(PFN_PHYS(pfn)))
+			return NULL;
+	}
+
 	if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))
 		return NULL;
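
(To spell out the intended effect, a hypothetical caller with the check
above in place; the error value and variable names are just for
illustration.)

	struct page *page = pfn_to_online_page(pfn);

	/*
	 * A pfn that sits in an early section but under a
	 * MEMBLOCK_NOMAP hole now comes back as NULL rather than as a
	 * pointer to a struct page with uninitialized contents.
	 */
	if (!page)
		return -ENXIO;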