Date: Thu, 22 Jun 2023 08:13:00 -0700 (PDT)
From: Palmer Dabbelt
To: bjorn@kernel.org
CC: Paul Walmsley, aou@eecs.berkeley.edu, linux-riscv@lists.infradead.org, Bjorn Topel, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux@rivosinc.com, alexghiti@rivosinc.com, joro@8bytes.org
Subject: Re: [PATCH] riscv: mm: Pre-allocate PGD entries vmalloc/modules area
In-Reply-To: <20230529180023.289904-1-bjorn@kernel.org>
On Mon, 29 May 2023 11:00:23 PDT (-0700), bjorn@kernel.org wrote:
> From: Björn Töpel
>
> The RISC-V port requires that kernel PGD entries be synchronized
> between MMs.
> This is done via the vmalloc_fault() function, which
> simply copies the PGD entries from init_mm to the faulting one.
>
> Historically, faulting in PGD entries has been a source of both bugs
> [1] and poor performance.
>
> One way to get rid of vmalloc faults is to pre-allocate the PGD
> entries. Pre-allocating the entries potentially wastes 64 * 4K (65 on
> SV39). The pre-allocation function is pulled from Jörg Rödel's x86
> work, with the addition of 3-level page tables (PMD allocations).
>
> The pmd_alloc() function needs the ptlock cache to be initialized
> (when split page locks are enabled), so the pre-allocation is done in
> a RISC-V specific pgtable_cache_init() implementation.
>
> Pre-allocate the kernel PGD entries for the vmalloc/modules area, but
> only for 64b platforms.
>
> Link: https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/ # [1]
> Signed-off-by: Björn Töpel
> ---
>  arch/riscv/mm/fault.c | 20 +++------------
>  arch/riscv/mm/init.c  | 58 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+), 16 deletions(-)
>
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 8685f85a7474..6b0b5e517e12 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -230,32 +230,20 @@ void handle_page_fault(struct pt_regs *regs)
>  		return;
>
>  	/*
> -	 * Fault-in kernel-space virtual memory on-demand.
> -	 * The 'reference' page table is init_mm.pgd.
> +	 * Fault-in kernel-space virtual memory on-demand, for 32-bit
> +	 * architectures. The 'reference' page table is init_mm.pgd.

That wording seems a little odd to me: I think English allows for these
"add something after the comma to change the meaning of a sentence"
things, but they're kind of complicated. Maybe it's easier to just flip
the order? That said, it's very early, so maybe it's fine...

>  	 *
>  	 * NOTE! We MUST NOT take any locks for this case.
We may > * be in an interrupt or a critical region, and should > * only copy the information from the master page table, > * nothing more. > */ > - if (unlikely((addr >= VMALLOC_START) && (addr < VMALLOC_END))) { > + if (!IS_ENABLED(CONFIG_64BIT) && > + unlikely(addr >= VMALLOC_START && addr < VMALLOC_END)) { > vmalloc_fault(regs, code, addr); > return; > } > > -#ifdef CONFIG_64BIT > - /* > - * Modules in 64bit kernels lie in their own virtual region which is not > - * in the vmalloc region, but dealing with page faults in this region > - * or the vmalloc region amounts to doing the same thing: checking that > - * the mapping exists in init_mm.pgd and updating user page table, so > - * just use vmalloc_fault. > - */ > - if (unlikely(addr >= MODULES_VADDR && addr < MODULES_END)) { > - vmalloc_fault(regs, code, addr); > - return; > - } > -#endif > /* Enable interrupts if they were enabled in the parent context. */ > if (!regs_irqs_disabled(regs)) > local_irq_enable(); > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 747e5b1ef02d..38bd4dd95276 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -1363,3 +1363,61 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, > return vmemmap_populate_basepages(start, end, node, NULL); > } > #endif > + > +#ifdef CONFIG_64BIT > +/* > + * Pre-allocates page-table pages for a specific area in the kernel > + * page-table. Only the level which needs to be synchronized between > + * all page-tables is allocated because the synchronization can be > + * expensive. 
> + */
> +static void __init preallocate_pgd_pages_range(unsigned long start, unsigned long end,
> +					       const char *area)
> +{
> +	unsigned long addr;
> +	const char *lvl;
> +
> +	for (addr = start; addr < end && addr >= start; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
> +		pgd_t *pgd = pgd_offset_k(addr);
> +		p4d_t *p4d;
> +		pud_t *pud;
> +		pmd_t *pmd;
> +
> +		lvl = "p4d";
> +		p4d = p4d_alloc(&init_mm, pgd, addr);
> +		if (!p4d)
> +			goto failed;
> +
> +		if (pgtable_l5_enabled)
> +			continue;
> +
> +		lvl = "pud";
> +		pud = pud_alloc(&init_mm, p4d, addr);
> +		if (!pud)
> +			goto failed;
> +
> +		if (pgtable_l4_enabled)
> +			continue;
> +
> +		lvl = "pmd";
> +		pmd = pmd_alloc(&init_mm, pud, addr);
> +		if (!pmd)
> +			goto failed;
> +	}
> +	return;
> +
> +failed:
> +	/*
> +	 * The pages have to be there now or they will be missing in
> +	 * process page-tables later.
> +	 */
> +	panic("Failed to pre-allocate %s pages for %s area\n", lvl, area);
> +}
> +
> +void __init pgtable_cache_init(void)
> +{
> +	preallocate_pgd_pages_range(VMALLOC_START, VMALLOC_END, "vmalloc");
> +	if (IS_ENABLED(CONFIG_MODULES))
> +		preallocate_pgd_pages_range(MODULES_VADDR, MODULES_END, "bpf/modules");
> +}
> +#endif
>
> base-commit: ac9a78681b921877518763ba0e89202254349d1b

Reviewed-by: Palmer Dabbelt

aside from the build issue, which seems pretty straightforward. I'm
going to drop this from patchwork.