From: Zhenhua Huang
Date: Fri, 14 Feb 2025 17:46:45 +0800
Subject: Re: [PATCH v6] arm64: mm: Populate vmemmap/linear at the page level for hotplugged sections
To: David Hildenbrand, Catalin Marinas
References: <20250213075703.1270713-1-quic_zhenhuah@quicinc.com> <9bc91fe3-c590-48e2-b29f-736d0b056c34@redhat.com>

On 2025/2/14 2:20, David Hildenbrand wrote:
> On 13.02.25 18:56, Catalin Marinas wrote:
>> On Thu, Feb 13, 2025 at 05:16:37PM +0100, David Hildenbrand wrote:
>>> On 13.02.25 16:49, Catalin Marinas wrote:
>>>> On Thu, Feb 13, 2025 at 01:59:25PM +0100, David Hildenbrand wrote:
>>>>> On 13.02.25 08:57, Zhenhua Huang wrote:
>>>>>>
>>>>>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
>>>>>> to 27, making one section 128M. The struct page array that vmemmap
>>>>>> maps for one section is then 2M.
>>>>>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>>>>>> vmemmap to populate at the PMD section level, which was suitable
>>>>>> initially since the hotplug granule was always one section (128M).
>>>>>> However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
>>>>>> hotplug") introduced a 2M (SUBSECTION_SIZE) hotplug granule, which
>>>>>> broke the existing arm64 assumptions.
>>>>>>
>>>>>> Consider the vmemmap_free -> unmap_hotplug_pmd_range path: when
>>>>>> pmd_sect() is true, the entire PMD section is cleared, even if
>>>>>> another subsection is still in use. For example, page_struct_map1 and
>>>>>> page_struct_map2 are part of a single PMD entry and are hot-added
>>>>>> sequentially. When page_struct_map1 is then removed, vmemmap_free()
>>>>>> clears the entire PMD entry, freeing the struct page map for the whole
>>>>>> section, even though page_struct_map2 is still active. A similar problem
>>>>>> exists with the linear mapping: for 16K base pages (PMD size = 32M) or
>>>>>> 64K base pages (PMD size = 512M), the block mappings exceed
>>>>>> SUBSECTION_SIZE. Tearing down the entire PMD mapping will leave other
>>>>>> subsections unmapped in the linear mapping.
>>>>>>
>>>>>> To address the issue, we need to prevent PMD/PUD/CONT mappings for both
>>>>>> linear and vmemmap for non-boot sections if the corresponding size on
>>>>>> the given base page exceeds SUBSECTION_SIZE (2MB now).
>>>>>>
>>>>>> Cc: # v5.4+
>>>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>>> Reviewed-by: Catalin Marinas
>>>>>> Signed-off-by: Zhenhua Huang
>>>>>
>>>>> Just so I understand correctly: for ordinary memory-sections-size hotplug
>>>>> (NVDIMM, virtio-mem), we still get a large mapping where possible?
>>>>
>>>> Up to 2MB blocks only since that's the SUBSECTION_SIZE value. The
>>>> vmemmap mapping is also limited to PAGE_SIZE mappings (we could use
>>>> contiguous mappings for vmemmap but it's not wired up; I don't think
>>>> it's worth the hassle).
>>>
>>> But that's messed up, no?
>>>
>>> If someone hotplugs a memory section, they have to hotunplug a memory
>>> section, not parts of it.
>>>
>>> That's why x86 does in vmemmap_populate():
>>>
>>> if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>>>     err = vmemmap_populate_basepages(start, end, node, NULL);
>>> else if (boot_cpu_has(X86_FEATURE_PSE))
>>>     err = vmemmap_populate_hugepages(start, end, node, altmap);
>>> ...
>>>
>>> Maybe I'm missing something. Most importantly, why the weird subsection
>>> stuff is supposed to degrade ordinary hotplug of dimms/virtio-mem etc.
>>
>> I think that's based on the discussion for a previous version assuming
>> that the hotplug/unplug sizes are not guaranteed to be symmetric:
>>
>> https://lore.kernel.org/lkml/a720aaa5-a75e-481e-b396-a5f2b50ed362@quicinc.com/
>>
>
> If that's not the case, we can indeed ignore the SUBSECTION_SIZE
> altogether and just rely on the start/end of the hotplugged region.
>
> All cases I know about hotunplug system RAM in the same granularity they
> hotplugged (virtio-mem, dax/kmem, dimm, dlpar), and if they wouldn't,
> they wouldn't operate on sub-section sizes either way.
>
> Regarding dax/pmem, I also recall that it happens always in the same
> granularity. If not, it should be fixed: this weird subsection hotplug
> should not make all other hotplug users suffer (e.g., no vmemmap PMD).
>
> What can likely happen (dax/pmem) is that we hotplug something that
> spans part of a 128 MiB section (subsections), to then hotplug something
> that spans another part of a 128 MiB section (subsections).
> Hotunplugging either should not hotunplug part of the other
> device (e.g., rip out the vmemmap PMD).
>
> I think this was expressed with:
>
> "However, if start or end is not aligned to a section boundary, such as
> when a subsection is hot added, populating the entire section is
> wasteful." -- which is what we should focus on.
>
> I thought x86-64 would handle that case; it would surprise me if
> handling between both archs would have to differ in that regard: with 4k
> arm64 we have the same section/subsection sizes as on x86-64.
>

Thanks David and Catalin.

If I follow your discussion correctly, hotplug/unplug sizes are guaranteed
to be symmetric? In that case it should be straightforward to populate with
base pages if (end - start < PAGES_PER_SECTION * sizeof(struct page)), as
x86 does?

I will write a patch and verify it. Please correct me if my understanding
is wrong.
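
For illustration only, the size check discussed above can be modelled in
plain userspace C. This is a sketch under stated assumptions, not the actual
arm64 patch: SECTION_SIZE_BITS = 27 and the 4K page size come from this
thread, the 64-byte struct page size is an assumption (it is the value that
gives the 2M-per-section figure quoted above), and
vmemmap_wants_basepages() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* From the thread: arm64 with 4K base pages, 128M memory sections. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  27
#define PAGES_PER_SECTION  (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Assumption: sizeof(struct page) == 64, matching the 2M figure above. */
#define STRUCT_PAGE_SIZE   64UL

/* Bytes of vmemmap needed to describe one full section. */
#define VMEMMAP_BYTES_PER_SECTION  (PAGES_PER_SECTION * STRUCT_PAGE_SIZE)

/*
 * Hypothetical helper mirroring the x86 check quoted in this thread.
 * [start, end) is the vmemmap virtual address range being populated.
 * Returns true when the range covers less than a full section, i.e.
 * when base-page mappings should be used instead of a PMD block mapping.
 */
static bool vmemmap_wants_basepages(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return (end - start) < VMEMMAP_BYTES_PER_SECTION;
}
```

With these values a full 128M section needs 32768 * 64 = 2M of vmemmap, so
the helper returns false and a block mapping remains possible. A single 2M
subsection needs only 512 * 64 = 32K, so the helper returns true and the
range would be populated with base pages.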