Date: Thu, 11 Apr 2024 10:28:35 -0700
From: David Matlack
To: James Houghton
Cc: Andrew Morton, Paolo Bonzini, Yu Zhao, Marc Zyngier, Oliver Upton,
	Sean Christopherson, Jonathan Corbet, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	"H. Peter Anvin", Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Shaoqin Huang, Gavin Shan, Ricardo Koller,
	Raghavendra Rao Ananta, Ryan Roberts, David Rientjes,
	Axel Rasmussen, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging
References: <20240401232946.1837665-1-jthoughton@google.com>
	<20240401232946.1837665-6-jthoughton@google.com>

On 2024-04-11 10:08 AM, David Matlack wrote:
> On 2024-04-01 11:29 PM, James Houghton wrote:
> > Only handle the TDP MMU case for now. In other cases, if a bitmap was
> > not provided, fallback to the slowpath that takes mmu_lock, or, if a
> > bitmap was provided, inform the caller that the bitmap is unreliable.
> >
> > Suggested-by: Yu Zhao
> > Signed-off-by: James Houghton
> > ---
> >  arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++
> >  arch/x86/kvm/mmu/mmu.c          | 16 ++++++++++++++--
> >  arch/x86/kvm/mmu/tdp_mmu.c      | 10 +++++++++-
> >  3 files changed, 37 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 3b58e2306621..c30918d0887e 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2324,4 +2324,18 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
> >   */
> >  #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
> >
> > +#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age
> > +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *mn)
> > +{
> > +	/*
> > +	 * Indicate that we support bitmap-based aging when using the TDP MMU
> > +	 * and the accessed bit is available in the TDP page tables.
> > +	 *
> > +	 * We have no other preparatory work to do here, so we do not need to
> > +	 * redefine kvm_arch_finish_bitmap_age().
> > +	 */
> > +	return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled
> > +					 && shadow_accessed_mask;
> > +}
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 992e651540e8..fae1a75750bb 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1674,8 +1674,14 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >  	bool young = false;
> >
> > -	if (kvm_memslots_have_rmaps(kvm))
> > +	if (kvm_memslots_have_rmaps(kvm)) {
> > +		if (range->lockless) {
> > +			kvm_age_set_unreliable(range);
> > +			return false;
> > +		}
>
> If a VM has TDP MMU enabled, supports A/D bits, and is using nested
> virtualization, MGLRU will effectively be blind to all accesses made by
> the VM.
>
> kvm_arch_prepare_bitmap_age() will return true indicating that the
> bitmap is supported. But then kvm_age_gfn() and kvm_test_age_gfn() will
> return false immediately and indicate the bitmap is unreliable because a
> shadow root is allocated. The notifier will then return
> MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE.
>
> Looking at the callers, MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE is never
> consumed or used. So I think MGLRU will assume all memory is
> unaccessed?
>
> One way to improve the situation would be to re-order the TDP MMU
> function first and return young instead of false, so that way MGLRU at
> least has visibility into accesses made by L1 (and L2 if EPT is disabled
> in L2). But that still means MGLRU is blind to accesses made by L2.
>
> What about grabbing the mmu_lock if there's a shadow root allocated and
> get rid of MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE altogether?
>
> 	if (kvm_memslots_have_rmaps(kvm)) {
> 		write_lock(&kvm->mmu_lock);
> 		young |= kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> 		write_unlock(&kvm->mmu_lock);
> 	}
>
> The TDP MMU walk would still be lockless. KVM only has to take the
> mmu_lock to collect accesses made by L2.
>
> kvm_age_rmap() and kvm_test_age_rmap() will need to become bitmap-aware
> as well, but that seems relatively simple with the helper functions.

Wait, even simpler, just check kvm_memslots_have_rmaps() in
kvm_arch_prepare_bitmap_age() and skip the shadow MMU when processing a
bitmap request.

i.e.

static inline bool kvm_arch_prepare_bitmap_age(struct kvm *kvm, struct mmu_notifier *mn)
{
	/*
	 * Indicate that we support bitmap-based aging when using the TDP MMU
	 * and the accessed bit is available in the TDP page tables.
	 *
	 * We have no other preparatory work to do here, so we do not need to
	 * redefine kvm_arch_finish_bitmap_age().
	 */
	return IS_ENABLED(CONFIG_X86_64)
		&& tdp_mmu_enabled
		&& shadow_accessed_mask
		&& !kvm_memslots_have_rmaps(kvm);
}

bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young = false;

	if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);

	return young;
}

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young = false;

	if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);

	return young;
}

Sure this could race with the creation of a shadow root but so can the
non-bitmap code.
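
For illustration only, the control flow of the suggestion above can be
modelled in user space. This is a rough sketch, not kernel code: struct
vm_model and the model_*() functions are hypothetical names, and plain
booleans stand in for the real page-table and rmap state. It assumes the
caller only passes a bitmap when the prepare step returned true.

```c
#include <assert.h>	/* for callers that want to assert on the model */
#include <stdbool.h>

/* Hypothetical stand-in for VM state; not a real KVM structure. */
struct vm_model {
	bool tdp_mmu_enabled;	/* models tdp_mmu_enabled */
	bool accessed_bit;	/* models shadow_accessed_mask != 0 */
	bool have_rmaps;	/* models kvm_memslots_have_rmaps() */
	bool tdp_young;		/* page was accessed via a TDP MMU mapping */
	bool shadow_young;	/* page was accessed via a shadow MMU mapping */
};

/* Models the suggested kvm_arch_prepare_bitmap_age(): no bitmap if rmaps exist. */
bool model_prepare_bitmap_age(const struct vm_model *vm)
{
	return vm->tdp_mmu_enabled && vm->accessed_bit && !vm->have_rmaps;
}

/* Models the suggested kvm_age_gfn(): bitmap requests skip the shadow MMU. */
bool model_age_gfn(const struct vm_model *vm, bool bitmap)
{
	bool young = false;

	if (!bitmap && vm->have_rmaps)
		young = vm->shadow_young;

	if (vm->tdp_mmu_enabled)
		young |= vm->tdp_young;

	return young;
}

/* Models a caller that uses the bitmap path only when prepare allows it. */
bool model_notifier_age(const struct vm_model *vm)
{
	bool bitmap = model_prepare_bitmap_age(vm);

	return model_age_gfn(vm, bitmap);
}
```

In this model, a VM with rmaps allocated (the nested case) never gets a
bitmap request, so it falls back to the non-bitmap path and accesses
made through shadow MMU mappings are still reported.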