From: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Date: Mon, 17 Feb 2025 18:34:16 +0800
Subject: Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
To: David Hildenbrand
In-Reply-To: <8c1578ed-cfef-4fba-a334-ebf5eac26d60@redhat.com>
References: <20250217092907.3474806-1-quic_zhenhuah@quicinc.com> <8c1578ed-cfef-4fba-a334-ebf5eac26d60@redhat.com>

On 2025/2/17 17:44, David Hildenbrand wrote:
> On 17.02.25 10:29, Zhenhua Huang wrote:
>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
>> to 27, making one section 128M. The related page struct which vmemmap
>> points to is 2M then.
>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>> vmemmap to populate at the PMD section level, which was suitable
>> initially since the hotplug granularity was always one section (128M).
>> However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
>> hotplug") introduced a 2M (SUBSECTION_SIZE) hotplug granularity, which
>> disrupted the existing arm64 assumptions.
>>
>> The first problem is that if start or end is not aligned to a section
>> boundary, such as when a subsection is hot added, populating the entire
>> section is wasteful.
>>
>> The next problem is that if we hotplug something that spans part of a
>> 128 MiB section (subsections; let's call it memblock1), then hotplug
>> something that spans another part of the same 128 MiB section
>> (subsections; let's call it memblock2), and subsequently unplug
>> memblock1, vmemmap_free() will clear the entire PMD entry, which also
>> backs memblock2 even though memblock2 is still active.
>>
>> Assuming hotplug/unplug sizes are guaranteed to be symmetric, fix this
>> similarly to x86-64: populate at the page level if start/end is not
>> aligned to a section boundary.
>>
>> Signed-off-by: Zhenhua Huang
>> ---
>>   arch/arm64/mm/mmu.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index b4df5bc5b1b8..eec1666da368 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>   {
>>       WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>
>> -    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> +    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>> +        (end - start < PAGES_PER_SECTION * sizeof(struct page)))
>>           return vmemmap_populate_basepages(start, end, node, altmap);
>>       else
>>           return vmemmap_populate_hugepages(start, end, node, altmap);
>
> Yes, this does mimic what x86 does.
> That handling does look weird, because it doesn't care about any
> address alignments, only about the size, which is odd.
>
> I wonder if we could do better and move this handling into
> vmemmap_populate_hugepages(), where we already have a fallback to
> vmemmap_populate_basepages().

Hi David,

I had the same doubt initially. After going through the code, I noticed
that for vmemmap_populate(), the arguments "start" and "end" passed down
should already be within one section.

early path:
for_each_present_section_nr
	__populate_section_memmap
	..
		vmemmap_populate()

hotplug path:
__add_pages
	section_activate
		vmemmap_populate()

Therefore, focusing only on the size seems OK to me, and the fallback
solution below appears unnecessary?

BTW, I have a few more doubts about the original code below, but they
are not bugs, so I have not raised them. Please correct me if I'm wrong.

> Something like:
>
> One thing that confuses me is the "altmap" handling in x86-64 code: in
> particular why it is ignored in some cases. So that might need a bit of
> thought / double-checking.
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df3036..57542313c0000 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1546,10 +1546,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	VM_BUG_ON(!PAGE_ALIGNED(start));
>  	VM_BUG_ON(!PAGE_ALIGNED(end));
>
> -	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> -		err = vmemmap_populate_basepages(start, end, node, NULL);
> -	else if (boot_cpu_has(X86_FEATURE_PSE))
> +	if (boot_cpu_has(X86_FEATURE_PSE))
>  		err = vmemmap_populate_hugepages(start, end, node, altmap);
> +	else
> +		err = vmemmap_populate_basepages(start, end, node, NULL);
>  	else if (altmap) {
>  		pr_err_once("%s: no cpu support for altmap allocations\n",
>  				__func__);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 3287ebadd167d..8b217265b25b1 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -300,6 +300,10 @@ int __weak __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
>  	return 0;
>  }
>
> +/*
> + * Try to populate PMDs, but fallback to populating base pages when ranges
> + * would only partially cover a PMD.
> + */
>  int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>  					 int node, struct vmem_altmap *altmap)
>  {
> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>  	for (addr = start; addr < end; addr = next) {

This for loop appears to be redundant for arm64 as well; as mentioned
above, a single call to pmd_addr_end() should suffice.
>  		next = pmd_addr_end(addr, end);
>
> +		if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next, PMD_SIZE))
> +			goto fallback;
> +
>  		pgd = vmemmap_pgd_populate(addr, node);
>  		if (!pgd)
>  			return -ENOMEM;
> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>  			}
>  		} else if (vmemmap_check_pmd(pmd, node, addr, next))
>  			continue;
> +fallback:
>  		if (vmemmap_populate_basepages(addr, next, node, altmap))
>  			return -ENOMEM;

It seems we have no chance to call populate_basepages here?

>  	}
>