From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yu Zhao <yuzhao@google.com>
Date: Tue, 20 Jun 2023 02:00:46 -0600
Subject: Re: [PATCH mm-unstable v2 06/10] kvm/powerpc: make radix page tables RCU safe
To: Nicholas Piggin
Cc: Andrew Morton, Paolo Bonzini, Alistair Popple, Anup Patel, Ben Gardon,
 Borislav Petkov, Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin", Ingo Molnar,
 James Morse, "Jason A. Donenfeld", Jason Gunthorpe, Jonathan Corbet,
 Marc Zyngier, Masami Hiramatsu, Michael Ellerman, Michael Larabel,
 Mike Rapoport, Oliver Upton, Paul Mackerras, Peter Xu, Sean Christopherson,
 Steven Rostedt, Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com
References: <20230526234435.662652-1-yuzhao@google.com> <20230526234435.662652-7-yuzhao@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin wrote:
>
> On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > KVM page tables are currently not RCU safe against remapping, i.e.,
> > kvmppc_unmap_free_pmd_entry_table() et al. The previous
>
> Minor nit but the "page table" is not RCU-safe against something. It
> is RCU-freed, and therefore some algorithm that accesses it can have
> the existence guarantee provided by RCU (usually there still needs
> to be more to it).
>
> > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > that operation.
> >
> > However, the new mmu_notifier_ops member test_clear_young() provides
> > a fast path that does not take kvm->mmu_lock. To implement
> > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > be freed by RCU.
>
> Short version: clear the referenced bit using RCU instead of MMU lock
> to protect against page table freeing, and there is no problem with
> clearing the bit in a table that has been freed.
>
> Seems reasonable.

Thanks. All above points taken.

> > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > hence not a concern.
>
> Not sure if you really need to make the distinction about why the page
> table is freed, we might free them via unmapping. The point is just
> anything that frees them while there can be concurrent access, right?

Correct.

> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > index 461307b89c3a..3b65b3b11041 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> >  {
> >  	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> >
> > -	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > +	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > +					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
> >  	if (!kvm_pte_cache)
> >  		return -ENOMEM;
> >
> >  	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> >
> > -	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > +	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > +					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> >  	if (!kvm_pmd_cache) {
> >  		kmem_cache_destroy(kvm_pte_cache);
> >  		return -ENOMEM;
>
> KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> (for some reason), which are not RCU freed. I think you need them too?

We don't. The use of the arch/powerpc allocator for PUD tables seems
appropriate to me because, unlike PMD/PTE tables, we never free PUD
tables during the lifetime of a VM:
* We don't free PUD/PMD/PTE tables when they become empty, i.e., not
  mapping any pages but still attached. (We could in theory, as
  x86/aarch64 do.)
* We have to free PMD/PTE tables when we replace them with 1GB/2MB
  pages. (Otherwise we'd lose track of detached tables.)
And we currently don't support huge pages at P4D level, so we never detach and free PUD tables.
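
For anyone following along, here is a rough sketch of the read-side
pattern the SLAB_TYPESAFE_BY_RCU change enables. It is illustrative
only, not code from this series: kvmppc_radix_lockless_lookup() and
kvmppc_radix_clear_young_here() are made-up names standing in for
whatever lookup and PTE-update helpers the real
kvm_arch_test_clear_young() path would use. The real pieces are
kmem_cache_create() with SLAB_TYPESAFE_BY_RCU (from the hunk above)
plus an RCU read-side critical section, which together let the aging
walk touch PMD/PTE tables without taking kvm->mmu_lock.

#include <linux/kvm_host.h>
#include <linux/pgtable.h>
#include <linux/rcupdate.h>

/*
 * Illustrative sketch, not code from this series. Because kvm_pte_cache
 * and kvm_pmd_cache are SLAB_TYPESAFE_BY_RCU, a table page reached
 * through a (possibly stale) pointer remains a page table of the same
 * level for as long as we stay inside rcu_read_lock(). The worst case
 * is clearing a referenced bit in a table that was concurrently freed
 * and reused, which only costs accuracy, not safety.
 */
static bool kvm_test_clear_young_fast(struct kvm *kvm, unsigned long gpa)
{
	bool young = false;
	pte_t *ptep;

	rcu_read_lock();

	/* Hypothetical lockless walk down to the leaf PTE for @gpa. */
	ptep = kvmppc_radix_lockless_lookup(kvm, gpa);
	if (ptep && pte_young(READ_ONCE(*ptep))) {
		young = true;
		/* Hypothetical helper that clears _PAGE_ACCESSED. */
		kvmppc_radix_clear_young_here(kvm, gpa, ptep);
	}

	rcu_read_unlock();

	return young;
}

Note that SLAB_TYPESAFE_BY_RCU is weaker (and cheaper) than freeing
each table via call_rcu(): objects may be reused for new tables right
away, and only the backing slab pages wait for a grace period. That is
exactly enough for a walker that merely clears referenced bits.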