From: Björn Töpel
To: Alexandre Ghiti
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
    linux-riscv@lists.infradead.org, Oscar Salvador, Björn Töpel,
    Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
    Santosh Mamila, Sivakumar Munnangi, Sunil V L,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
In-Reply-To:
References: <20240521114830.841660-1-bjorn@kernel.org> <20240521114830.841660-6-bjorn@kernel.org>
Date: Tue, 21 May 2024 16:18:46 +0200
Message-ID: <87jzjndz95.fsf@all.your.base.are.belong.to.us>

Alexandre Ghiti writes:

> On Tue, May 21, 2024 at 1:49 PM Björn Töpel wrote:
>>
>> From: Björn Töpel
>>
>> For an architecture to support memory hotplugging, a couple of
>> callbacks need to be implemented:
>>
>> arch_add_memory()
>> This callback is responsible for adding the physical memory into the
>> direct map, and calling into the memory hotplugging generic code via
>> __add_pages() that adds the corresponding struct page entries, and
>> updates the vmemmap mapping.
>>
>> arch_remove_memory()
>> This is the inverse of the callback above.
>>
>> vmemmap_free()
>> This function tears down the vmemmap mappings (if
>> CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
>> backing vmemmap pages. Note that for persistent memory, an
>> alternative allocator for the backing pages can be used; the
>> vmem_altmap.
>> This means that when the backing pages are cleared, extra care is
>> needed so that the correct deallocation method is used.
>>
>> arch_get_mappable_range()
>> This function returns the PA range that the direct map can map.
>> Used by the MHP internals for sanity checks.
>>
>> The page table unmap/teardown functions are heavily based on code from
>> the x86 tree. The same remove_pgd_mapping() function is used in both
>> vmemmap_free() and arch_remove_memory(), but in the latter function
>> the backing pages are not removed.
>>
>> Signed-off-by: Björn Töpel
>> ---
>>  arch/riscv/mm/init.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 261 insertions(+)
>>
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 6f72b0b2b854..6693b742bf2f 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -1493,3 +1493,264 @@ void __init pgtable_cache_init(void)
>>          }
>>  }
>>  #endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +static void __meminit free_pagetable(struct page *page, int order)
>> +{
>> +        unsigned int nr_pages = 1 << order;
>> +
>> +        /*
>> +         * vmemmap/direct page tables can be reserved, if added at
>> +         * boot.
>> +         */
>> +        if (PageReserved(page)) {
>> +                __ClearPageReserved(page);
>
> What's the difference between __ClearPageReserved() and
> ClearPageReserved()? Because it seems like free_reserved_page() calls
> the latter already, so why would you need to call
> __ClearPageReserved() on the first page?

Indeed! x86 copy pasta (which uses bootmem info page that RV doesn't).

>> +                while (nr_pages--)
>> +                        free_reserved_page(page++);
>> +                return;
>> +        }
>> +
>> +        free_pages((unsigned long)page_address(page), order);
>> +}
>> +
>> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
>> +{
>> +        pte_t *pte;
>> +        int i;
>> +
>> +        for (i = 0; i < PTRS_PER_PTE; i++) {
>> +                pte = pte_start + i;
>> +                if (!pte_none(*pte))
>> +                        return;
>> +        }
>> +
>> +        free_pagetable(pmd_page(*pmd), 0);
>> +        pmd_clear(pmd);
>> +}
>> +
>> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
>> +{
>> +        pmd_t *pmd;
>> +        int i;
>> +
>> +        for (i = 0; i < PTRS_PER_PMD; i++) {
>> +                pmd = pmd_start + i;
>> +                if (!pmd_none(*pmd))
>> +                        return;
>> +        }
>> +
>> +        free_pagetable(pud_page(*pud), 0);
>> +        pud_clear(pud);
>> +}
>> +
>> +static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
>> +{
>> +        pud_t *pud;
>> +        int i;
>> +
>> +        for (i = 0; i < PTRS_PER_PUD; i++) {
>> +                pud = pud_start + i;
>> +                if (!pud_none(*pud))
>> +                        return;
>> +        }
>> +
>> +        free_pagetable(p4d_page(*p4d), 0);
>> +        p4d_clear(p4d);
>> +}
>> +
>> +static void __meminit free_vmemmap_storage(struct page *page, size_t size,
>> +                                           struct vmem_altmap *altmap)
>> +{
>> +        if (altmap)
>> +                vmem_altmap_free(altmap, size >> PAGE_SHIFT);
>> +        else
>> +                free_pagetable(page, get_order(size));
>> +}
>> +
>> +static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
>> +                                         bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> +        unsigned long next;
>> +        pte_t *ptep, pte;
>> +
>> +        for (; addr < end; addr = next) {
>> +                next = (addr + PAGE_SIZE) & PAGE_MASK;
>
> Nit: use ALIGN() instead.
>
>> +                if (next > end)
>> +                        next = end;
>> +
>> +                ptep = pte_base + pte_index(addr);
>> +                pte = READ_ONCE(*ptep);
>
> Nit: Use ptep_get()
>
>> +
>> +                if (!pte_present(*ptep))
>> +                        continue;
>> +
>> +                pte_clear(&init_mm, addr, ptep);
>> +                if (is_vmemmap)
>> +                        free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
>> +        }
>> +}
>> +
>> +static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
>> +                                         bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> +        unsigned long next;
>> +        pte_t *pte_base;
>> +        pmd_t *pmdp, pmd;
>> +
>> +        for (; addr < end; addr = next) {
>> +                next = pmd_addr_end(addr, end);
>> +                pmdp = pmd_base + pmd_index(addr);
>> +                pmd = READ_ONCE(*pmdp);
>
> Nit: Use pmdp_get()
>
>> +
>> +                if (!pmd_present(pmd))
>> +                        continue;
>> +
>> +                if (pmd_leaf(pmd)) {
>> +                        pmd_clear(pmdp);
>> +                        if (is_vmemmap)
>> +                                free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
>> +                        continue;
>> +                }
>> +
>> +                pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
>> +                remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
>> +                free_pte_table(pte_base, pmdp);
>> +        }
>> +}
>> +
>> +static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
>> +                                         bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> +        unsigned long next;
>> +        pud_t *pudp, pud;
>> +        pmd_t *pmd_base;
>> +
>> +        for (; addr < end; addr = next) {
>> +                next = pud_addr_end(addr, end);
>> +                pudp = pud_base + pud_index(addr);
>> +                pud = READ_ONCE(*pudp);
>
> Nit: Use pudp_get()
>
>> +
>> +                if (!pud_present(pud))
>> +                        continue;
>> +
>> +                if (pud_leaf(pud)) {
>> +                        if (pgtable_l4_enabled) {
>> +                                pud_clear(pudp);
>> +                                if (is_vmemmap)
>> +                                        free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
>> +                        }
>> +                        continue;
>> +                }
>> +
>> +                pmd_base = pmd_offset(pudp, 0);
>> +                remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
>> +
>> +                if (pgtable_l4_enabled)
>> +                        free_pmd_table(pmd_base, pudp);
>> +        }
>> +}
>> +
>> +static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
>> +                                         bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> +        unsigned long next;
>> +        p4d_t *p4dp, p4d;
>> +        pud_t *pud_base;
>> +
>> +        for (; addr < end; addr = next) {
>> +                next = p4d_addr_end(addr, end);
>> +                p4dp = p4d_base + p4d_index(addr);
>> +                p4d = READ_ONCE(*p4dp);
>
> Nit: Use p4dp_get()

...and I'll make sure to address these nits as well.

Thanks!
Björn
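
For readers skimming the thread, a minimal sketch of what the suggested
helpers look like when applied to the loop head of remove_pte_mapping()
from the patch above. This is only an illustration of the review nits,
not the actual follow-up revision; ALIGN() (<linux/align.h>) and
ptep_get() (<linux/pgtable.h>) are existing generic kernel helpers,
free_vmemmap_storage() is the helper introduced by the patch itself, and
the snippet assumes the surrounding context of arch/riscv/mm/init.c.

/*
 * Illustrative only: remove_pte_mapping() with the reviewer's nits
 * applied. ALIGN(addr + 1, PAGE_SIZE) computes the same "next page
 * boundary" as the open-coded (addr + PAGE_SIZE) & PAGE_MASK, and
 * ptep_get() replaces the bare READ_ONCE(*ptep) dereference.
 */
static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr,
                                         unsigned long end, bool is_vmemmap,
                                         struct vmem_altmap *altmap)
{
        unsigned long next;
        pte_t *ptep, pte;

        for (; addr < end; addr = next) {
                /* Advance one page at a time, clamped to the end of the range. */
                next = ALIGN(addr + 1, PAGE_SIZE);
                if (next > end)
                        next = end;

                ptep = pte_base + pte_index(addr);
                pte = ptep_get(ptep);           /* was: READ_ONCE(*ptep) */

                if (!pte_present(pte))
                        continue;

                pte_clear(&init_mm, addr, ptep);
                if (is_vmemmap)
                        free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
        }
}

The same accessor pattern (pmdp_get(), pudp_get(), p4dp_get()) carries
over to the pmd/pud/p4d walkers quoted above.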