From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACD10C12002 for ; Wed, 21 Jul 2021 14:58:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3242C61246 for ; Wed, 21 Jul 2021 14:58:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3242C61246 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AFB866B0089; Wed, 21 Jul 2021 10:58:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AAD1C6B008A; Wed, 21 Jul 2021 10:58:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 99ABF6B008C; Wed, 21 Jul 2021 10:58:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0172.hostedemail.com [216.40.44.172]) by kanga.kvack.org (Postfix) with ESMTP id 7A5EE6B0089 for ; Wed, 21 Jul 2021 10:58:36 -0400 (EDT) Received: from smtpin39.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2A3918248047 for ; Wed, 21 Jul 2021 14:58:36 +0000 (UTC) X-FDA: 78386901432.39.A6FE58E Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf16.hostedemail.com (Postfix) with ESMTP id B738BF0017E3 for ; Wed, 21 Jul 2021 14:58:35 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id BBCC06100C; Wed, 21 Jul 2021 14:58:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1626879514; bh=5qyI/dHQJICF0PI2wKZT99Bdl/k2f3fmU1dVHwc/8s0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=hIb2Qs9AMnjjTxASv+1K/W04SwruC8bJ3jE/kKRcHTiVA/VFjxNybv99GVYp5KJU+ Y9u1tG5xl86/3dBZd1IFXJx/eB/TxK16lWKrYe/jfUd7h9f7i4ljjV8yJb2AUF8m6G LADuVspziD0GkzprmshJpVafgcrM2WoDclvJdF+NK7RXhekmCNf9jPrq0p1WnPMIGa vhSY6VVHZB/R6uitg1ZBSrNiVmFtFbrUTJXFnXDAGBe2bCnbj5i1728wtIOCXad7ge GZYrChF0IGZadDFcqJAeoC7y30hxdZNFReq3dRRpHfMULW6Qa3ftNd3EgPIhn0YGz/ 8aK3eBOhBjhZw== Date: Wed, 21 Jul 2021 15:58:29 +0100 From: Will Deacon To: Sean Christopherson Cc: Alexandru Elisei , Marc Zyngier , linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, linux-mm@kvack.org, Matthew Wilcox , Paolo Bonzini , Quentin Perret , James Morse , Suzuki K Poulose , kernel-team@android.com Subject: Re: [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size Message-ID: <20210721145828.GA11003@willie-the-truck> References: <20210717095541.1486210-1-maz@kernel.org> <20210717095541.1486210-2-maz@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: B738BF0017E3 X-Stat-Signature: 6ty38z3joqtu1m7jhp67seo8z3t1tpgo Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=hIb2Qs9A; spf=pass (imf16.hostedemail.com: domain of will@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=will@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-HE-Tag: 1626879515-312075 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hey Sean, On Tue, Jul 20, 2021 at 08:33:46PM +0000, Sean Christopherson wrote: > On Tue, Jul 20, 2021, Alexandru Elisei wrote: > > I just can't figure out why having the mmap lock is not needed to walk the > > userspace page tables. Any hints? Or am I not seeing where it's taken? > > Disclaimer: I'm not super familiar with arm64's page tables, but the relevant KVM > functionality is common across x86 and arm64. No need for the disclaimer, there are so many moving parts here that I don't think it's possible to be familiar with them all! Thanks for taking the time to write it up so clearly. > KVM arm64 (and x86) unconditionally registers a mmu_notifier for the mm_struct > associated with the VM, and disallows calling ioctls from a different process, > i.e. walking the page tables during KVM_RUN is guaranteed to use the mm for which > KVM registered the mmu_notifier. As part of registration, the mmu_notifier > does mmgrab() and doesn't do mmdrop() until it's unregistered. That ensures the > mm_struct itself is live. > > For the page tables liveliness, KVM implements mmu_notifier_ops.release, which is > invoked at the beginning of exit_mmap(), before the page tables are freed. In > its implementation, KVM takes mmu_lock and zaps all its shadow page tables, a.k.a. > the stage2 tables in KVM arm64. The flow in question, get_user_mapping_size(), > also runs under mmu_lock, and so effectively blocks exit_mmap() and thus is > guaranteed to run with live userspace tables. Unless I missed a case, exit_mmap() only runs when mm_struct::mm_users drops to zero, right? The vCPU tasks should hold references to that afaict, so I don't think it should be possible for exit_mmap() to run while there are vCPUs running with the corresponding page-table. > Looking at the arm64 code, one thing I'm not clear on is whether arm64 correctly > handles the case where exit_mmap() wins the race. The invalidate_range hooks will > still be called, so userspace page tables aren't a problem, but > kvm_arch_flush_shadow_all() -> kvm_free_stage2_pgd() nullifies mmu->pgt without > any additional notifications that I see. x86 deals with this by ensuring its > top-level TDP entry (stage2 equivalent) is valid while the page fault handler is > running. But the fact that x86 handles this race has me worried. What am I missing? I agree that, if the race can occur, we don't appear to handle it in the arm64 backend. Cheers, Will