From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Feb 2025 11:07:28 +0800
Subject: Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if
 not section aligned
From: Zhenhua Huang <quic_zhenhuah@quicinc.com>
To: David Hildenbrand
References: <20250217092907.3474806-1-quic_zhenhuah@quicinc.com>
 <8c1578ed-cfef-4fba-a334-ebf5eac26d60@redhat.com>
 <871c0dae-c419-4ac2-9472-6901aab90dcf@redhat.com>
In-Reply-To: <871c0dae-c419-4ac2-9472-6901aab90dcf@redhat.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
On 2025/2/17 22:30, David Hildenbrand wrote:
> On 17.02.25 11:34, Zhenhua Huang wrote:
>> On 2025/2/17 17:44, David Hildenbrand wrote:
>>> On 17.02.25 10:29, Zhenhua Huang wrote:
>>>> On the arm64 platform with the 4K base page config, SECTION_SIZE_BITS
>>>> is set to 27, making one section 128M. The struct pages which vmemmap
>>>> points to for one section then occupy 2M.
>>>> Commit c1cc1552616d ("arm64: MMU initialisation") optimized the
>>>> vmemmap to populate at the PMD section level, which was suitable
>>>> initially since the hotplug granule was always one section (128M).
>>>> However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
>>>> hotplug") introduced a 2M (SUBSECTION_SIZE) hotplug granule, which
>>>> disrupted the existing arm64 assumptions.
>>>>
>>>> The first problem is that if start or end is not aligned to a section
>>>> boundary, such as when a subsection is hot added, populating the
>>>> entire section is wasteful.
>>>>
>>>> The next problem is that if we hotplug something that spans part of a
>>>> 128 MiB section (subsections; let's call it memblock1), then hotplug
>>>> something that spans another part of the same 128 MiB section
>>>> (subsections; let's call it memblock2), and subsequently unplug
>>>> memblock1, vmemmap_free() will clear the entire PMD entry which also
>>>> backs memblock2, even though memblock2 is still active.
>>>>
>>>> Assuming hotplug/unplug sizes are guaranteed to be symmetric, apply a
>>>> fix similar to x86-64: populate at the page level if start/end is not
>>>> aligned with a section boundary.
>>>>
>>>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>>>> ---
>>>>   arch/arm64/mm/mmu.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index b4df5bc5b1b8..eec1666da368 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>   {
>>>>       WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>>
>>>> -    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>>>> +    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>>>> +        (end - start < PAGES_PER_SECTION * sizeof(struct page)))
>>>>           return vmemmap_populate_basepages(start, end, node, altmap);
>>>>       else
>>>>           return vmemmap_populate_hugepages(start, end, node, altmap);
>>>
>>> Yes, this does mimic what x86 does. That handling does look weird,
>>> because it doesn't care about any address alignments, only about the
>>> size, which is odd.
>>>
>>> I wonder if we could do better and move this handling into
>>> vmemmap_populate_hugepages(), where we already have a fallback to
>>> vmemmap_populate_basepages().
>>
>> Hi David,
>>
>> I had the same doubt initially.
>> After going through the code, I noticed that for vmemmap_populate(),
>> the arguments "start" and "end" passed down should already be within
>> one section.
>>
>> early path:
>> for_each_present_section_nr
>>     __populate_section_memmap
>>         ..
>>         vmemmap_populate()
>>
>> hotplug path:
>> __add_pages
>>     section_activate
>>         vmemmap_populate()
>>
>> Therefore, focusing only on the size seems OK to me, and the fallback
>> solution below appears unnecessary?
>
> Ah, in that case it is fine. Might make sense to document/enforce that
> somehow for the time being ...

Shall I document this and add a WARN_ON in case the size exceeds one
section? Like:
WARN_ON(end - start > PAGES_PER_SECTION * sizeof(struct page))

Since vmemmap_populate() is implemented per architecture, the change
should apply to other architectures as well. However, I have no setup to
test it on... Therefore, may I implement it only for arm64 for now?

Additionally, from the previous discussion, the change is worth
backporting (apologies for forgetting to CC the stable list in this
version). Keeping it arm64-only should also simplify backporting. WDYT?

>
>>> +/*
>>> + * Try to populate PMDs, but fallback to populating base pages when ranges
>>> + * would only partially cover a PMD.
>>> + */
>>>    int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>>>                                             int node, struct vmem_altmap *altmap)
>>>    {
>>> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>>>           for (addr = start; addr < end; addr = next) {
>>
>> This for loop appears to be redundant for arm64 as well; as mentioned
>> above, a single call to pmd_addr_end() should suffice.
>
> Right, that was what was confusing me in the first place.
>
>>>                   next = pmd_addr_end(addr, end);
>>>
>>> +               if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next, PMD_SIZE))
>>> +                       goto fallback;
>>> +
>>>                   pgd = vmemmap_pgd_populate(addr, node);
>>>                   if (!pgd)
>>>                           return -ENOMEM;
>>> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
>>>                           }
>>>                   } else if (vmemmap_check_pmd(pmd, node, addr, next))
>>>                           continue;
>>> +fallback:
>>>                   if (vmemmap_populate_basepages(addr, next, node, altmap))
>>>                           return -ENOMEM;
>>
>> It seems we have no chance to call vmemmap_populate_basepages() here?
>
> Can you elaborate?

It's invoked within vmemmap_populate_hugepages(), which is called by
vmemmap_populate(). This implies that we are always performing a
whole-section hotplug?
However, since it's common code used by other architectures like x86,
RISC-V and LoongArch, it is still necessary to review the code for those
architectures as well. At the very least, it's not a BUG :)
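
Coming back to the WARN_ON question above, here is a minimal sketch of
how the documented assumption could look on arm64 (untested; the comment
wording and the WARN_ON placement are only a suggestion, not the final
patch):

int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
			       struct vmem_altmap *altmap)
{
	/*
	 * Both the early path (__populate_section_memmap) and the
	 * hotplug path (section_activate) pass down a range within a
	 * single section, so (end - start) never exceeds one section's
	 * worth of struct pages. Warn if that assumption ever breaks.
	 */
	WARN_ON(end - start > PAGES_PER_SECTION * sizeof(struct page));
	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));

	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
	    (end - start < PAGES_PER_SECTION * sizeof(struct page)))
		return vmemmap_populate_basepages(start, end, node, altmap);
	else
		return vmemmap_populate_hugepages(start, end, node, altmap);
}

For scale: with 4K pages, SECTION_SIZE_BITS = 27 gives PAGES_PER_SECTION =
32768, so assuming the usual 64-byte struct page the bound works out to
32768 * 64 = 2M, the same 2M figure as in the commit message.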