Date: Fri, 20 Dec 2024 18:30:00 +0000
From: Catalin Marinas
To: Zhenhua Huang
Cc: will@kernel.org, ardb@kernel.org, ryan.roberts@arm.com,
	mark.rutland@arm.com, joey.gouly@arm.com, dave.hansen@linux.intel.com,
	akpm@linux-foundation.org, chenfeiyang@loongson.cn, chenhuacai@kernel.org,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Anshuman Khandual
Subject: Re: [PATCH v2 1/2] arm64: mm: vmemmap populate to page level if not section aligned
References: <20241209094227.1529977-1-quic_zhenhuah@quicinc.com>
	<20241209094227.1529977-2-quic_zhenhuah@quicinc.com>
In-Reply-To: <20241209094227.1529977-2-quic_zhenhuah@quicinc.com>

On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the vmemmap
> to populate at the PMD section level. However, if start or end is not
> aligned to a section boundary, such as when a subsection is hot-added,
> populating the entire section is wasteful.
> For instance, if only one subsection is hot-added, the entire section's
> struct page metadata will still be populated. In such cases, it is more
> effective to populate at page granularity.

OK, so from the vmemmap perspective, we waste up to 2MB of memory that
has been allocated even if a 2MB hot-plugged subsection required only
32KB of struct page. I don't mind this much really. I hope all those
subsections are not scattered around to amplify this waste.

> This change also addresses mismatch issues during vmemmap_free(): when
> pmd_sect() is true, the entire PMD section is cleared, even if another
> subsection in it is still in use. For example, pagemap1 and pagemap2 are
> part of a single PMD entry and they are hot-added sequentially. If
> pagemap1 is then removed, vmemmap_free() will clear the entire PMD
> entry, freeing the struct page metadata for the whole section, even
> though pagemap2 is still active.

I think that's the bigger issue: we can't unplug a subsection on its
own. Looking at unmap_hotplug_pmd_range(), it frees a 2MB vmemmap
section, but that section may hold the struct pages for the equivalent
of 128MB of memory. So any struct page accesses for the other
subsections will fault.

> Fixes: c1cc1552616d ("arm64: MMU initialisation")

I wouldn't add a Fixes tag pointing at the first commit adding arm64
support; we did not even have memory hotplug at the time (it was added
later in 5.7 by commit bbd6ec605c0f ("arm64/mm: Enable memory hot
remove")). IIUC, this hasn't been a problem until commit ba72b4c8cf60
("mm/sparsemem: support sub-section hotplug"). That commit broke some
arm64 assumptions.
> Signed-off-by: Zhenhua Huang
> ---
>  arch/arm64/mm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e2739b69e11b..fd59ee44960e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  {
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>
> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
> +	    !IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
> +	    !IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>  		return vmemmap_populate_basepages(start, end, node, altmap);
>  	else
>  		return vmemmap_populate_hugepages(start, end, node, altmap);

An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
nuking the whole vmemmap pmd section if it's not empty. I'm not sure how
easy that is, or whether we have the necessary information there (I
haven't looked in detail).

A potential issue: can we hotplug 128MB of RAM and only unplug 2MB? If
that's possible, the problem isn't solved by this patch.

-- 
Catalin