From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AC37C4345F for ; Fri, 19 Apr 2024 20:48:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E0BA46B0087; Fri, 19 Apr 2024 16:48:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DBB8F6B0088; Fri, 19 Apr 2024 16:48:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C5B916B0089; Fri, 19 Apr 2024 16:48:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id A7E2F6B0087 for ; Fri, 19 Apr 2024 16:48:31 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 6924E161593 for ; Fri, 19 Apr 2024 20:48:31 +0000 (UTC) X-FDA: 82027469622.20.01B1F6E Received: from mail-qt1-f171.google.com (mail-qt1-f171.google.com [209.85.160.171]) by imf28.hostedemail.com (Postfix) with ESMTP id BCAE6C001D for ; Fri, 19 Apr 2024 20:48:29 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="kpX4f/PI"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf28.hostedemail.com: domain of jthoughton@google.com designates 209.85.160.171 as permitted sender) smtp.mailfrom=jthoughton@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1713559709; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jdwBN6tcT1rkPndt3s/q21EflrWzoZs9NJ2XL0c5/ME=; b=5yqBRqkoBvRGK4u2VCnXW97sp50rQhND/RVSKydmAozMleE05VTj6hTNbpzsqDWvjIuOSa l51FXl59H7ZnT6Z9HhGGHMGGPPzXfAA5EEm+0i69uxG+jaSSrgZ0URf9O1tN6k2J5IdPGe XlXmdmuDooFkudCYIP/XDP1fUooOb2E= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="kpX4f/PI"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf28.hostedemail.com: domain of jthoughton@google.com designates 209.85.160.171 as permitted sender) smtp.mailfrom=jthoughton@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1713559709; a=rsa-sha256; cv=none; b=IINZXbQNqAuu3fHZ3ZimS4CTp9XKkCcZUG2SsAmYCi/M0KWMifyewDbTRf9SI93LckRcQU +79jKRbYi0x42uJIll/PXN5oXwb9VC9TJjUcAMrgUixzN6I8LC2z/KbUaS0YOdfFLBKKlG mcx0iUNIUhcez730ZB2fh/4eCTHiTM0= Received: by mail-qt1-f171.google.com with SMTP id d75a77b69052e-434b5abbb0dso87381cf.0 for ; Fri, 19 Apr 2024 13:48:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1713559709; x=1714164509; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jdwBN6tcT1rkPndt3s/q21EflrWzoZs9NJ2XL0c5/ME=; b=kpX4f/PI65Nre0B8zNcBm9mAmsUPt78oAZ4mELxEu4Pd3AB68RI02pKFDu/MvHE+UL uybkyamfyxIL+hGv4zpLQ6ma6kQbS1TujogvqAn6spVNumNEhHhR1PkPUr0vmmPOXvXM FQz0pTRQbrpe+6tHX5dFMOpferQc9GGzOYdE6XR00mYmYCunbaTvXgo9gorhDuyg+/6b Qk0yFv+Z88MvCQttHwmslc7juWn8MCArPRyQLEn9L1FEPu7x44Z2RzG3yqmeCDLFIfAF RagUMJ0pxJC2k/ehx8BxE0XIpOi6MZscSBhWE5PdcuAYSKcDEJlVzbj4SjMd0iqghMSy 6Ueg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713559709; x=1714164509; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jdwBN6tcT1rkPndt3s/q21EflrWzoZs9NJ2XL0c5/ME=; b=wLV7jyQgGXxpJjySJivkp8hRKnuDQI+P2iFfPEytKIAmaYF7UHV7Z5MT3gdtDdaRU+ lto2pGCJ15Jvu1Kr+KzBLtYPuNBOiE/134BiDIZlkCGkiCEQH0I/yM1m8FZPpiyNtfnC dyZCV4BQRUqyezAqJs1+SVGbSua2Zzp0dVPBJ5uVLpB314IA0wD8oR1mNasEIchaRZXi qiOT6L38VhtarmojymxM/rscH2h11fT9X1wG84J4cSyDFa2cG4N+HHXJBHKlA7IvwkVW eYENAgqQcTpXYFQlw8NohrLN/BbuD7YMPSSh/AkOgO+e7wVmhso1JeqO0w66ZPu8wleo 1iJg== X-Forwarded-Encrypted: i=1; AJvYcCXwybLDoCi4FxsQp3JHLP+glz3+nWdbGNvIhmzZgJ9EVk8o7IFsjmyzkvkrXuYFz+UZQsOboMnMRRZc2B5dJVW5zdQ= X-Gm-Message-State: AOJu0Yxad36JQrN+6BjEWLdWc0DiA7LwSlZNvX+jH68WFcw8of1oyHtr W7AO+w+bnoYTVxJJSzPJRhTA+9fUKF9Tgw0sofTDDLCocqGfcshcqtGTs6R17fquK3EHEad9tPV 0iawpiS+V0E+e0lCZ6GvrYH4IPcIL/Ugsfexu X-Google-Smtp-Source: AGHT+IHnP1X/uJdeFnAFt5SWKpANerf/z+ZZ3d5hKj8mNm8Ve8tSTI9oyLDn94kRt0shtjukZWu2UUMD2xkTzRuS724= X-Received: by 2002:ac8:5309:0:b0:437:b985:8523 with SMTP id t9-20020ac85309000000b00437b9858523mr8583qtn.25.1713559708789; Fri, 19 Apr 2024 13:48:28 -0700 (PDT) MIME-Version: 1.0 References: <20240401232946.1837665-1-jthoughton@google.com> <20240401232946.1837665-6-jthoughton@google.com> In-Reply-To: From: James Houghton Date: Fri, 19 Apr 2024 13:47:52 -0700 Message-ID: Subject: Re: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging To: David Matlack Cc: Andrew Morton , Paolo Bonzini , Yu Zhao , Marc Zyngier , Oliver Upton , Sean Christopherson , Jonathan Corbet , James Morse , Suzuki K Poulose , Zenghui Yu , Catalin Marinas , Will Deacon , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Shaoqin Huang , Gavin Shan , Ricardo Koller , Raghavendra Rao Ananta , Ryan Roberts , David Rientjes , Axel Rasmussen , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: BCAE6C001D X-Rspam-User: X-Stat-Signature: o5u8853ki7inkq13o1dx9tuprpwzwm6n X-HE-Tag: 1713559709-203122 X-HE-Meta: U2FsdGVkX196C4H7OOiGuqRc4tMChLT3Trgf44MZX+jbdKW1CBCTG2+SmBK69QYJwayMyqFywj1X64qqloBQgsesEk7VFdOlgnqAXbnwUU3F0/rGqfZCPkJ/6fPkxB5sRM7k99MtTxUPaMrHzNgBIYa+RUTKGjY1IcwPtN/cd4vdol0Lj72S9y+WKkMFTzIdhTtAUgCJS8768PbZOoVLIMNClq24RRtGC3y62miGc7tYICE5ficYQheDdsWdMuxsbxwTDhhRrJiesVOt5MuGI+sv2hIm36mAVEYldAFvMuqAW+NLDEGBTifsrwXOR0Tlt0HIK8h8NmMEoD9wMLw3nPdsHzANQd1T9lefwO/fiMipWokEpe4BVjnNwqpilqfvfUjkcvFbVOa2yPFDe6mslGV7SPEhEFYa8soItHTHn/WJXggc+WKJeuJOA5SP2LYDepNr8g9l6EzIaRpI1pWReWo/ITHhpWZRZdsFlCSLAW5HCpCZe1EjL8KjtWG67rlQCnlSpvkBL2UbALe4QiN6fjUSk/5t0sIyOHludJhDqMbRKY5GxaTmPuk99wfDhbpjoc82Qw0/JnV8bafFwudPJ3GVLdWYT1LG6L/J8h9ZsIxsQFP5PMIPZrknVX8/G2tJ9LcFu39KUtdBkVbx8l4DUVc2gT8L7dccUed63Bft1ChTZB9aSuy6+u9sUfSlBT3cbXxYN4IO8aA4zoQqX5NDfq+Ye6eX+Eai70wYVqa8sbH6tvrkE1zooe7n1HsoXqkwCu7SeB3870Li0GZRdM/Jpvw9/58CvJ2e+7ouO9lK6CdCYHBSZKgD5uzp4MJzzUlfeH5GI4if7dArE4WLOKS3dZphZ4vVtZRl0ZPMmEUfSdigVtVopoklY6ev0Ekqv3yD8gGR88niAKHClzgsadZXZXz98jHB9nhA/dKOth6nXuMMZ5HZNMti1nhrjYBYKNAcAQ39ynnAyyvXpeC+sdc 1WsPsXGv H+0NI X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Apr 11, 2024 at 10:28=E2=80=AFAM David Matlack wrote: > > On 2024-04-11 10:08 AM, David Matlack wrote: > > On 2024-04-01 11:29 PM, James Houghton wrote: > > > Only handle the TDP MMU case for now. In other cases, if a bitmap was > > > not provided, fallback to the slowpath that takes mmu_lock, or, if a > > > bitmap was provided, inform the caller that the bitmap is unreliable. > > > > > > Suggested-by: Yu Zhao > > > Signed-off-by: James Houghton > > > --- > > > arch/x86/include/asm/kvm_host.h | 14 ++++++++++++++ > > > arch/x86/kvm/mmu/mmu.c | 16 ++++++++++++++-- > > > arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++++++- > > > 3 files changed, 37 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/k= vm_host.h > > > index 3b58e2306621..c30918d0887e 100644 > > > --- a/arch/x86/include/asm/kvm_host.h > > > +++ b/arch/x86/include/asm/kvm_host.h > > > @@ -2324,4 +2324,18 @@ int memslot_rmap_alloc(struct kvm_memory_slot = *slot, unsigned long npages); > > > */ > > > #define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1) > > > > > > +#define kvm_arch_prepare_bitmap_age kvm_arch_prepare_bitmap_age > > > +static inline bool kvm_arch_prepare_bitmap_age(struct mmu_notifier *= mn) > > > +{ > > > + /* > > > + * Indicate that we support bitmap-based aging when using the TDP= MMU > > > + * and the accessed bit is available in the TDP page tables. > > > + * > > > + * We have no other preparatory work to do here, so we do not nee= d to > > > + * redefine kvm_arch_finish_bitmap_age(). > > > + */ > > > + return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled > > > + && shadow_accessed_mask; > > > +} > > > + > > > #endif /* _ASM_X86_KVM_HOST_H */ > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index 992e651540e8..fae1a75750bb 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -1674,8 +1674,14 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_g= fn_range *range) > > > { > > > bool young =3D false; > > > > > > - if (kvm_memslots_have_rmaps(kvm)) > > > + if (kvm_memslots_have_rmaps(kvm)) { > > > + if (range->lockless) { > > > + kvm_age_set_unreliable(range); > > > + return false; > > > + } > > > > If a VM has TDP MMU enabled, supports A/D bits, and is using nested > > virtualization, MGLRU will effectively be blind to all accesses made by > > the VM. > > > > kvm_arch_prepare_bitmap_age() will return true indicating that the > > bitmap is supported. But then kvm_age_gfn() and kvm_test_age_gfn() will > > return false immediately and indicate the bitmap is unreliable because = a > > shadow root is allocate. The notfier will then return > > MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE. > > > > Looking at the callers, MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE is never > > consumed or used. So I think MGLRU will assume all memory is > > unaccessed? > > > > One way to improve the situation would be to re-order the TDP MMU > > function first and return young instead of false, so that way MGLRU at > > least has visibility into accesses made by L1 (and L2 if EPT is disable > > in L2). But that still means MGLRU is blind to accesses made by L2. > > > > What about grabbing the mmu_lock if there's a shadow root allocated and > > get rid of MMU_NOTIFIER_YOUNG_BITMAP_UNRELIABLE altogether? > > > > if (kvm_memslots_have_rmaps(kvm)) { > > write_lock(&kvm->mmu_lock); > > young |=3D kvm_handle_gfn_range(kvm, range, kvm_age_rmap)= ; > > write_unlock(&kvm->mmu_lock); > > } > > > > The TDP MMU walk would still be lockless. KVM only has to take the > > mmu_lock to collect accesses made by L2. > > > > kvm_age_rmap() and kvm_test_age_rmap() will need to become bitmap-aware > > as well, but that seems relatively simple with the helper functions. > > Wait, even simpler, just check kvm_memslots_have_rmaps() in > kvm_arch_prepare_bitmap_age() and skip the shadow MMU when processing a > bitmap request. > > i.e. > > static inline bool kvm_arch_prepare_bitmap_age(struct kvm *kvm, struct mm= u_notifier *mn) > { > /* > * Indicate that we support bitmap-based aging when using the TDP= MMU > * and the accessed bit is available in the TDP page tables. > * > * We have no other preparatory work to do here, so we do not nee= d to > * redefine kvm_arch_finish_bitmap_age(). > */ > return IS_ENABLED(CONFIG_X86_64) > && tdp_mmu_enabled > && shadow_accessed_mask > && !kvm_memslots_have_rmaps(kvm); > } > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > { > bool young =3D false; > > if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm)) > young =3D kvm_handle_gfn_range(kvm, range, kvm_age_rmap); > > if (tdp_mmu_enabled) > young |=3D kvm_tdp_mmu_age_gfn_range(kvm, range); > > return young; > } > > bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) > { > bool young =3D false; > > if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm)) > young =3D kvm_handle_gfn_range(kvm, range, kvm_test_age_r= map); > > if (tdp_mmu_enabled) > young |=3D kvm_tdp_mmu_test_age_gfn(kvm, range); > > return young; Yeah I think this is the right thing to do. Given your other suggestions (on patch 3), I think this will look something like this -- let me know if I've misunderstood something: bool check_rmap =3D !bitmap && kvm_memslot_have_rmaps(kvm); if (check_rmap) KVM_MMU_LOCK(kvm); rcu_read_lock(); // perhaps only do this when we don't take the MMU lock? if (check_rmap) kvm_handle_gfn_range(/* ... */ kvm_test_age_rmap) if (tdp_mmu_enabled) kvm_tdp_mmu_test_age_gfn() // modified to be RCU-safe rcu_read_unlock(); if (check_rmap) KVM_MMU_UNLOCK(kvm); > } > > Sure this could race with the creation of a shadow root but so can the > non-bitmap code.