From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <10a970c0-e8f2-41c5-a68f-3ac4ae88d9d0@quicinc.com>
Date: Thu, 6 Feb 2025 16:48:35 +0800
Subject: Re: [PATCH v5] arm64: mm: Populate vmemmap/linear at the page level for hotplugged sections
From: Zhenhua Huang
References: <20250109093824.452925-1-quic_zhenhuah@quicinc.com> <9ae36424-2cb6-491d-8ac2-95bfe39828a2@quicinc.com>
In-Reply-To: <9ae36424-2cb6-491d-8ac2-95bfe39828a2@quicinc.com>
Sender: owner-linux-mm@kvack.org

Dear Catalin and Anshuman,

Given that it's a genuine bug affecting stability, we need to address it :).
Could you please let me know if you have any additional concerns or thoughts?

On 2025/1/15 10:13, Zhenhua Huang wrote:
> Gentle reminder if you happened to miss it :)
>
> On 2025/1/9 17:38, Zhenhua Huang wrote:
>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
>> to 27, making one section 128M. The struct page array that vmemmap maps
>> for one section is then 2M.
>> Commit c1cc1552616d ("arm64: MMU initialisation") populates the
>> vmemmap at the PMD section level, which was suitable initially since
>> the hotplug granule was always one section (128M). However,
>> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> introduced a 2M (SUBSECTION_SIZE) hotplug granule, which broke the
>> existing arm64 assumptions.
>>
>> Consider the vmemmap_free -> unmap_hotplug_pmd_range path: when
>> pmd_sect() is true, the entire PMD section is cleared, even if other
>> subsections mapped by it are still in use. For example, page_struct_map1
>> and page_struct_map2 are part of a single PMD entry and they are
>> hot-added sequentially. When page_struct_map1 is then removed,
>> vmemmap_free() clears the entire PMD entry, freeing the struct page map
>> for the whole section, even though page_struct_map2 is still active.
>> A similar problem exists with the linear mapping: for 16K base pages
>> (PMD size = 32M) or 64K base pages (PMD size = 512M), the block mappings
>> exceed SUBSECTION_SIZE. Tearing down the entire PMD mapping will
>> likewise leave other subsections unmapped in the linear mapping.
>>
>> To address the issue, we need to prevent PMD/PUD/CONT mappings for both
>> linear and vmemmap for non-boot sections if the corresponding size on
>> the given base page exceeds SUBSECTION_SIZE (2MB now).
>>
>> Cc: stable@vger.kernel.org # v5.4+
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Zhenhua Huang
>> ---
>> Hi Catalin and Anshuman,
>> I have addressed the comments so far, please help review.
>> One outstanding point which is not finalized is in vmemmap_populate():
>> how to judge a hotplugged section. Currently I am using system_state,
>> discussion:
>> https://lore.kernel.org/linux-mm/1515dae4-cb53-4645-8c72-d33b27ede7eb@quicinc.com/
>>   arch/arm64/mm/mmu.c | 46 ++++++++++++++++++++++++++++++++++++---------
>>   1 file changed, 37 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index e2739b69e11b..8718d6e454c5 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -42,9 +42,13 @@
>>   #include
>>   #include
>> -#define NO_BLOCK_MAPPINGS    BIT(0)
>> -#define NO_CONT_MAPPINGS    BIT(1)
>> -#define NO_EXEC_MAPPINGS    BIT(2)    /* assumes FEAT_HPDS is not used */
>> +#define NO_PMD_BLOCK_MAPPINGS    BIT(0)
>> +#define NO_PUD_BLOCK_MAPPINGS    BIT(1)  /* Hotplug case: do not want block mapping for PUD */
>> +#define NO_BLOCK_MAPPINGS    (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
>> +#define NO_PTE_CONT_MAPPINGS    BIT(2)
>> +#define NO_PMD_CONT_MAPPINGS    BIT(3)  /* Hotplug case: do not want cont mapping for PMD */
>> +#define NO_CONT_MAPPINGS    (NO_PTE_CONT_MAPPINGS | NO_PMD_CONT_MAPPINGS)
>> +#define NO_EXEC_MAPPINGS    BIT(4)    /* assumes FEAT_HPDS is not used */
>>   u64 kimage_voffset __ro_after_init;
>>   EXPORT_SYMBOL(kimage_voffset);
>> @@ -224,7 +228,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>>           /* use a contiguous mapping if the range is suitably aligned */
>>           if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
>> -            (flags & NO_CONT_MAPPINGS) == 0)
>> +            (flags & NO_PTE_CONT_MAPPINGS) == 0)
>>               __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>>           init_pte(ptep, addr, next, phys, __prot);
>> @@ -254,7 +258,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
>>           /* try section mapping first */
>>           if (((addr | next | phys) & ~PMD_MASK) == 0 &&
>> -            (flags & NO_BLOCK_MAPPINGS) == 0) {
>> +            (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
>>               pmd_set_huge(pmdp, phys, prot);
>>               /*
>> @@ -311,7 +315,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
>>           /* use a contiguous mapping if the range is suitably aligned */
>>           if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
>> -            (flags & NO_CONT_MAPPINGS) == 0)
>> +            (flags & NO_PMD_CONT_MAPPINGS) == 0)
>>               __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>>           init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
>> @@ -358,8 +362,8 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
>>            * For 4K granule only, attempt to put down a 1GB block
>>            */
>>           if (pud_sect_supported() &&
>> -           ((addr | next | phys) & ~PUD_MASK) == 0 &&
>> -            (flags & NO_BLOCK_MAPPINGS) == 0) {
>> +            ((addr | next | phys) & ~PUD_MASK) == 0 &&
>> +            (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
>>               pud_set_huge(pudp, phys, prot);
>>               /*
>> @@ -1177,7 +1181,13 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>   {
>>       WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>> -    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> +    /*
>> +     * Hotplugged section does not support hugepages as
>> +     * PMD_SIZE (hence PUD_SIZE) section mapping covers
>> +     * struct page range that exceeds a SUBSECTION_SIZE
>> +     * i.e. 2MB - for all available base page sizes.
>> +     */
>> +    if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || system_state != SYSTEM_BOOTING)
>>           return vmemmap_populate_basepages(start, end, node, altmap);
>>       else
>>           return vmemmap_populate_hugepages(start, end, node, altmap);
>> @@ -1339,9 +1349,27 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>               struct mhp_params *params)
>>   {
>>       int ret, flags = NO_EXEC_MAPPINGS;
>> +    unsigned long start_pfn = PFN_DOWN(start);
>> +    struct mem_section *ms = __pfn_to_section(start_pfn);
>>       VM_BUG_ON(!mhp_range_allowed(start, size, true));
>> +    /* should not be invoked by early section */
>> +    WARN_ON(early_section(ms));
>> +
>> +    /*
>> +     * Disallow BLOCK/CONT mappings if the corresponding size exceeds
>> +     * SUBSECTION_SIZE which now is 2MB.
>> +     *
>> +     * PUD_BLOCK or PMD_CONT should consistently exceed SUBSECTION_SIZE
>> +     * across all variable page size configurations, so add them directly
>> +     */
>> +    flags |= NO_PUD_BLOCK_MAPPINGS | NO_PMD_CONT_MAPPINGS;
>> +    if (SUBSECTION_SHIFT < PMD_SHIFT)
>> +        flags |= NO_PMD_BLOCK_MAPPINGS;
>> +    if (SUBSECTION_SHIFT < CONT_PTE_SHIFT)
>> +        flags |= NO_PTE_CONT_MAPPINGS;
>> +
>>       if (can_set_direct_map())
>>           flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>
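
For anyone following the sizes in the patch description, here is a rough userspace sketch (not kernel code) of the arithmetic and of the flag selection added to arch_add_memory(). The flag values, SECTION_SIZE_BITS and SUBSECTION_SHIFT are taken from the quoted patch. Everything else is an assumption made for illustration: the helper names pmd_shift(), vmemmap_bytes() and hotplug_flags() are hypothetical, sizeof(struct page) is assumed to be 64 bytes, and the CONT_PTE shift is passed in by the caller rather than read from a real kernel config.

```c
#include <assert.h>

/* Flag layout copied from the quoted patch. */
#define BIT(n)                 (1u << (n))
#define NO_PMD_BLOCK_MAPPINGS  BIT(0)
#define NO_PUD_BLOCK_MAPPINGS  BIT(1)
#define NO_PTE_CONT_MAPPINGS   BIT(2)
#define NO_PMD_CONT_MAPPINGS   BIT(3)
#define NO_EXEC_MAPPINGS       BIT(4)

#define SECTION_SIZE_BITS  27u   /* 128M section with 4K pages (patch text) */
#define SUBSECTION_SHIFT   21u   /* 2M hotplug granule (patch text) */
#define STRUCT_PAGE_SIZE   64UL  /* ASSUMPTION: sizeof(struct page) == 64 */

/*
 * A page table is one page of 8-byte entries, so each level resolves
 * (page_shift - 3) address bits. The PMD level sits one level above
 * the PTE level. Hypothetical helper, not a kernel function.
 */
static inline unsigned int pmd_shift(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 16);
	return page_shift + (page_shift - 3);
}

/* Bytes of struct page array needed to describe 2^region_shift bytes. */
static inline unsigned long vmemmap_bytes(unsigned int region_shift,
					  unsigned int page_shift)
{
	return (1UL << (region_shift - page_shift)) * STRUCT_PAGE_SIZE;
}

/*
 * Mirrors the flag selection the patch adds to arch_add_memory().
 * cont_pte_shift is supplied by the caller (ASSUMPTION: it depends on
 * the kernel configuration).
 */
static inline unsigned int hotplug_flags(unsigned int page_shift,
					 unsigned int cont_pte_shift)
{
	unsigned int flags = NO_EXEC_MAPPINGS;

	/* PUD blocks and PMD cont ranges always exceed 2M. */
	flags |= NO_PUD_BLOCK_MAPPINGS | NO_PMD_CONT_MAPPINGS;
	if (SUBSECTION_SHIFT < pmd_shift(page_shift))
		flags |= NO_PMD_BLOCK_MAPPINGS;
	if (SUBSECTION_SHIFT < cont_pte_shift)
		flags |= NO_PTE_CONT_MAPPINGS;
	return flags;
}
```

Under these assumptions, vmemmap_bytes(SECTION_SIZE_BITS, 12) is 2M, exactly one PMD block with 4K pages, while vmemmap_bytes(SUBSECTION_SHIFT, 12) is only 32K. So a single PMD-mapped vmemmap block would back 64 subsections, which is why clearing it on removal of one subsection is a problem. For the linear map, hotplug_flags(12, 16) leaves PMD block mappings allowed with 4K pages, whereas hotplug_flags(14, 21) and hotplug_flags(16, 21) additionally set NO_PMD_BLOCK_MAPPINGS, matching the 32M and 512M PMD sizes mentioned in the description. The second arguments (16 and 21) are my understanding of typical CONT_PTE shifts and should be checked against the actual configuration.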