From: Alexandre Ghiti <alexghiti@rivosinc.com>
Date: Tue, 21 May 2024 15:19:37 +0200
Subject: Re: [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
 linux-riscv@lists.infradead.org, Oscar Salvador, Björn Töpel,
 Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes, Santosh Mamila,
 Sivakumar Munnangi, Sunil V L, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, virtualization@lists.linux-foundation.org
In-Reply-To: <20240521114830.841660-6-bjorn@kernel.org>
References: <20240521114830.841660-1-bjorn@kernel.org> <20240521114830.841660-6-bjorn@kernel.org>
On Tue, May 21, 2024 at 1:49 PM Björn Töpel wrote:
>
> From: Björn Töpel
>
> For an architecture to support memory hotplugging, a couple of
> callbacks need to be implemented:
>
> arch_add_memory()
> This callback is responsible for adding the physical memory into the
> direct map, and for calling into the generic memory hotplugging code
> via __add_pages(), which adds the corresponding struct page entries
> and updates the vmemmap mapping.
>
> arch_remove_memory()
> This is the inverse of the callback above.
>
> vmemmap_free()
> This function tears down the vmemmap mappings (if
> CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
> backing vmemmap pages. Note that for persistent memory, an
> alternative allocator for the backing pages can be used: the
> vmem_altmap. This means that when the backing pages are cleared,
> extra care is needed so that the correct deallocation method is
> used.
>
> arch_get_mappable_range()
> This function returns the PA range that the direct map can map.
> Used by the MHP internals for sanity checks.
>
> The page table unmap/teardown functions are heavily based on code
> from the x86 tree. The same remove_pgd_mapping() function is used in
> both vmemmap_free() and arch_remove_memory(), but in the latter
> function the backing pages are not removed.
>
> Signed-off-by: Björn Töpel
> ---
>  arch/riscv/mm/init.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 261 insertions(+)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 6f72b0b2b854..6693b742bf2f 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1493,3 +1493,264 @@ void __init pgtable_cache_init(void)
>  	}
>  }
>  #endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static void __meminit free_pagetable(struct page *page, int order)
> +{
> +	unsigned int nr_pages = 1 << order;
> +
> +	/*
> +	 * vmemmap/direct page tables can be reserved, if added at
> +	 * boot.
> +	 */
> +	if (PageReserved(page)) {
> +		__ClearPageReserved(page);

What's the difference between __ClearPageReserved() and
ClearPageReserved()? Because it seems like free_reserved_page() calls
the latter already, so why would you need to call __ClearPageReserved()
on the first page?
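
For reference, this is roughly what free_reserved_page() boils down to
(paraphrased from memory against include/linux/mm.h, so check your
tree for the exact shape):

	static inline void free_reserved_page(struct page *page)
	{
		ClearPageReserved(page);	/* atomic clear of PG_reserved */
		init_page_count(page);
		__free_page(page);
		adjust_managed_page_count(page, 1);
	}

and __ClearPageReserved() is just the non-atomic __clear_bit() variant
generated by the page-flags macros, so the extra call on the head page
looks redundant to me.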
> +		while (nr_pages--)
> +			free_reserved_page(page++);
> +		return;
> +	}
> +
> +	free_pages((unsigned long)page_address(page), order);
> +}
> +
> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
> +{
> +	pte_t *pte;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		pte = pte_start + i;
> +		if (!pte_none(*pte))
> +			return;
> +	}
> +
> +	free_pagetable(pmd_page(*pmd), 0);
> +	pmd_clear(pmd);
> +}
> +
> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
> +{
> +	pmd_t *pmd;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		pmd = pmd_start + i;
> +		if (!pmd_none(*pmd))
> +			return;
> +	}
> +
> +	free_pagetable(pud_page(*pud), 0);
> +	pud_clear(pud);
> +}
> +
> +static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
> +{
> +	pud_t *pud;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PUD; i++) {
> +		pud = pud_start + i;
> +		if (!pud_none(*pud))
> +			return;
> +	}
> +
> +	free_pagetable(p4d_page(*p4d), 0);
> +	p4d_clear(p4d);
> +}
> +
> +static void __meminit free_vmemmap_storage(struct page *page, size_t size,
> +					   struct vmem_altmap *altmap)
> +{
> +	if (altmap)
> +		vmem_altmap_free(altmap, size >> PAGE_SHIFT);
> +	else
> +		free_pagetable(page, get_order(size));
> +}
> +
> +static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
> +					 bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> +	unsigned long next;
> +	pte_t *ptep, pte;
> +
> +	for (; addr < end; addr = next) {
> +		next = (addr + PAGE_SIZE) & PAGE_MASK;

Nit: use ALIGN() instead (see the sketch after remove_pud_mapping()
below).

> +		if (next > end)
> +			next = end;
> +
> +		ptep = pte_base + pte_index(addr);
> +		pte = READ_ONCE(*ptep);

Nit: Use ptep_get()

> +
> +		if (!pte_present(*ptep))
> +			continue;
> +
> +		pte_clear(&init_mm, addr, ptep);
> +		if (is_vmemmap)
> +			free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
> +	}
> +}
> +
> +static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
> +					 bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> +	unsigned long next;
> +	pte_t *pte_base;
> +	pmd_t *pmdp, pmd;
> +
> +	for (; addr < end; addr = next) {
> +		next = pmd_addr_end(addr, end);
> +		pmdp = pmd_base + pmd_index(addr);
> +		pmd = READ_ONCE(*pmdp);

Nit: Use pmdp_get()

> +
> +		if (!pmd_present(pmd))
> +			continue;
> +
> +		if (pmd_leaf(pmd)) {
> +			pmd_clear(pmdp);
> +			if (is_vmemmap)
> +				free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
> +			continue;
> +		}
> +
> +		pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
> +		remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
> +		free_pte_table(pte_base, pmdp);
> +	}
> +}
> +
> +static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
> +					 bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> +	unsigned long next;
> +	pud_t *pudp, pud;
> +	pmd_t *pmd_base;
> +
> +	for (; addr < end; addr = next) {
> +		next = pud_addr_end(addr, end);
> +		pudp = pud_base + pud_index(addr);
> +		pud = READ_ONCE(*pudp);

Nit: Use pudp_get()

> +
> +		if (!pud_present(pud))
> +			continue;
> +
> +		if (pud_leaf(pud)) {
> +			if (pgtable_l4_enabled) {
> +				pud_clear(pudp);
> +				if (is_vmemmap)
> +					free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
> +			}
> +			continue;
> +		}
> +
> +		pmd_base = pmd_offset(pudp, 0);
> +		remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
> +
> +		if (pgtable_l4_enabled)
> +			free_pmd_table(pmd_base, pudp);
> +	}
> +}
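
To make the nits above concrete, here is roughly what I have in mind
for the remove_pte_mapping() loop; an untested sketch only, using the
generic accessors:

	for (; addr < end; addr = next) {
		/* ALIGN() expresses the same round-up as the open-coded mask */
		next = ALIGN(addr + 1, PAGE_SIZE);
		if (next > end)
			next = end;

		ptep = pte_base + pte_index(addr);
		/* ptep_get() instead of a raw READ_ONCE(*ptep) */
		pte = ptep_get(ptep);

		if (!pte_present(pte))
			continue;

		pte_clear(&init_mm, addr, ptep);
		if (is_vmemmap)
			free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
	}

and likewise pmdp_get()/pudp_get()/p4dp_get() at the outer levels.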
> +static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
> +					 bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> +	unsigned long next;
> +	p4d_t *p4dp, p4d;
> +	pud_t *pud_base;
> +
> +	for (; addr < end; addr = next) {
> +		next = p4d_addr_end(addr, end);
> +		p4dp = p4d_base + p4d_index(addr);
> +		p4d = READ_ONCE(*p4dp);

Nit: Use p4dp_get()

> +
> +		if (!p4d_present(p4d))
> +			continue;
> +
> +		if (p4d_leaf(p4d)) {
> +			if (pgtable_l5_enabled) {
> +				p4d_clear(p4dp);
> +				if (is_vmemmap)
> +					free_vmemmap_storage(p4d_page(p4d), P4D_SIZE, altmap);
> +			}
> +			continue;
> +		}
> +
> +		pud_base = pud_offset(p4dp, 0);
> +		remove_pud_mapping(pud_base, addr, next, is_vmemmap, altmap);
> +
> +		if (pgtable_l5_enabled)
> +			free_pud_table(pud_base, p4dp);
> +	}
> +}
> +
> +static void __meminit remove_pgd_mapping(unsigned long va, unsigned long end, bool is_vmemmap,
> +					 struct vmem_altmap *altmap)
> +{
> +	unsigned long addr, next;
> +	p4d_t *p4d_base;
> +	pgd_t *pgd;
> +
> +	for (addr = va; addr < end; addr = next) {
> +		next = pgd_addr_end(addr, end);
> +		pgd = pgd_offset_k(addr);
> +
> +		if (!pgd_present(*pgd))
> +			continue;
> +
> +		if (pgd_leaf(*pgd))
> +			continue;
> +
> +		p4d_base = p4d_offset(pgd, 0);
> +		remove_p4d_mapping(p4d_base, addr, next, is_vmemmap, altmap);
> +	}
> +
> +	flush_tlb_all();
> +}
> +
> +static void __meminit remove_linear_mapping(phys_addr_t start, u64 size)
> +{
> +	unsigned long va = (unsigned long)__va(start);
> +	unsigned long end = (unsigned long)__va(start + size);
> +
> +	remove_pgd_mapping(va, end, false, NULL);
> +}
> +
> +struct range arch_get_mappable_range(void)
> +{
> +	struct range mhp_range;
> +
> +	mhp_range.start = __pa(PAGE_OFFSET);
> +	mhp_range.end = __pa(PAGE_END - 1);
> +	return mhp_range;
> +}
> +
> +int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
> +{
> +	int ret = 0;
> +
> +	create_linear_mapping_range(start, start + size, 0, &params->pgprot);
> +	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, params);
> +	if (ret) {
> +		remove_linear_mapping(start, size);
> +		goto out;
> +	}
> +
> +	max_pfn = PFN_UP(start + size);
> +	max_low_pfn = max_pfn;
> +
> + out:
> +	flush_tlb_all();
> +	return ret;
> +}
> +
> +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +{
> +	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
> +	remove_linear_mapping(start, size);
> +	flush_tlb_all();
> +}
> +
> +void __ref vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap)
> +{
> +	remove_pgd_mapping(start, end, true, altmap);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> --
> 2.40.1
>
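
P.S. Regarding the ALIGN() nit: the two forms are exactly equivalent,
which a quick userspace check confirms (the PAGE_* and ALIGN() macros
are re-declared locally here to mirror the kernel's definitions,
assuming a 4 KiB page size):

	#include <assert.h>
	#include <stdio.h>

	#define PAGE_SIZE	4096UL
	#define PAGE_MASK	(~(PAGE_SIZE - 1))
	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long addr;

		/* Both expressions round addr up to the next page boundary. */
		for (addr = 0; addr < 4 * PAGE_SIZE; addr++)
			assert(((addr + PAGE_SIZE) & PAGE_MASK) ==
			       ALIGN(addr + 1, PAGE_SIZE));
		puts("equivalent");
		return 0;
	}

so it is purely a readability nit, no behavioral change.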