Date: Thu, 23 Sep 2021 19:39:17 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Brijesh Singh <brijesh.singh@amd.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-crypto@vger.kernel.org,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky, "H. Peter Anvin",
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, "Kirill A. Shutemov", Andi Kleen, tony.luck@intel.com,
	marcorr@google.com, sathyanarayanan.kuppuswamy@linux.intel.com
Subject: Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
References: <20210820155918.7518-1-brijesh.singh@amd.com>
 <20210820155918.7518-22-brijesh.singh@amd.com>
User-Agent: Mutt/2.0.7 (2021-05-04)

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> 
> On 9/22/21 1:55 PM, Dr. David Alan Gilbert wrote:
> > * Brijesh Singh (brijesh.singh@amd.com) wrote:
> >> Implement a workaround for an SNP erratum where the CPU will incorrectly
> >> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> >> RMP entry of a VMCB, VMSA or AVIC backing page.
> >>
> >> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> >> backing pages as "in-use" in the RMP after a successful VMRUN. This is
> >> done for _all_ VMs, not just SNP-Active VMs.
> > Can you explain what 'globally enabled' means?
> 
> This means that SNP is enabled in the host via SYSCFG_MSR.Snp=1. Once it's
> enabled, RMP checks are enforced.
> 
> 
> > Or more specifically, can we trip this bug on public hardware that has
> > the SNP enabled in the bios, but no SNP init in the host OS?
> 
> Enabling the SNP support on the host is a 3-step process:
> 
> step1 (bios): reserve memory for the RMP table.
> 
> step2 (host): initialize the RMP table memory, set the SYSCFG msr to
> enable the SNP feature
> 
> step3 (host): call SNP_INIT to initialize the SNP firmware (this is
> needed only if you ever plan to launch an SNP guest from this host).
> 
> "SNP globally enabled" means steps 1 and 2. The RMP checks are
> enforced as soon as step 2 is completed.

So I think that means we don't need to backport this to older kernels
that don't know about SNP but might run on SNP enabled hardware (1), since
those kernels won't do step 2.

Dave

> thanks
> 
> >
> > Dave
> >
> >> If the hypervisor accesses an in-use page through a writable translation,
> >> the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
> >> in-use page is 2mb aligned and software accesses any part of the associated
> >> 2mb region with a hugepage, the CPU will incorrectly treat the entire 2mb
> >> region as in-use and signal a spurious RMP violation #PF.
> >>
> >> The recommended workaround is to not use a hugepage for the VMCB, VMSA or
> >> AVIC backing page. Add a generic allocator that will ensure that the page
> >> returned is not a hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
> >> is enabled.
> >>
> >> Co-developed-by: Marc Orr
> >> Signed-off-by: Marc Orr
> >> Signed-off-by: Brijesh Singh
> >> ---
> >>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >>  arch/x86/include/asm/kvm_host.h    |  1 +
> >>  arch/x86/kvm/lapic.c               |  5 ++++-
> >>  arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
> >>  arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
> >>  arch/x86/kvm/svm/svm.h             |  1 +
> >>  6 files changed, 56 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> >> index a12a4987154e..36a9c23a4b27 100644
> >> --- a/arch/x86/include/asm/kvm-x86-ops.h
> >> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> >> @@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
> >>  KVM_X86_OP_NULL(migrate_timers)
> >>  KVM_X86_OP(msr_filter_changed)
> >>  KVM_X86_OP_NULL(complete_emulated_msr)
> >> +KVM_X86_OP(alloc_apic_backing_page)
> >>  
> >>  #undef KVM_X86_OP
> >>  #undef KVM_X86_OP_NULL
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 974cbfb1eefe..5ad6255ff5d5 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
> >>  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> >>  
> >>  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> >> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> >>  };
> >>  
> >>  struct kvm_x86_nested_ops {
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index ba5a27879f1d..05b45747b20b 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
> >>  
> >>  	vcpu->arch.apic = apic;
> >>  
> >> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >> +	if (kvm_x86_ops.alloc_apic_backing_page)
> >> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> >> +	else
> >> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >>  	if (!apic->regs) {
> >>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
> >>  		       vcpu->vcpu_id);
> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> >> index 1644da5fc93f..8771b878193f 100644
> >> --- a/arch/x86/kvm/svm/sev.c
> >> +++ b/arch/x86/kvm/svm/sev.c
> >> @@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> >>  		break;
> >>  	}
> >>  }
> >> +
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +	unsigned long pfn;
> >> +	struct page *p;
> >> +
> >> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> >> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +
> >> +	/*
> >> +	 * Allocate an SNP safe page to workaround the SNP erratum where
> >> +	 * the CPU will incorrectly signal an RMP violation #PF if a
> >> +	 * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> >> +	 * or AVIC backing page. The recommended workaround is to not use
> >> +	 * the hugepage.
> >> +	 *
> >> +	 * Allocate one extra page, use a page which is not 2mb aligned
> >> +	 * and free the other.
> >> +	 */
> >> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> >> +	if (!p)
> >> +		return NULL;
> >> +
> >> +	split_page(p, 1);
> >> +
> >> +	pfn = page_to_pfn(p);
> >> +	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
> >> +		pfn++;
> >> +		__free_page(p);
> >> +	} else {
> >> +		__free_page(pfn_to_page(pfn + 1));
> >> +	}
> >> +
> >> +	return pfn_to_page(pfn);
> >> +}
> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >> index 25773bf72158..058eea8353c9 100644
> >> --- a/arch/x86/kvm/svm/svm.c
> >> +++ b/arch/x86/kvm/svm/svm.c
> >> @@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>  	svm = to_svm(vcpu);
> >>  
> >>  	err = -ENOMEM;
> >> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +	vmcb01_page = snp_safe_alloc_page(vcpu);
> >>  	if (!vmcb01_page)
> >>  		goto out;
> >>  
> >> @@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>  	/*
> >>  	 * SEV-ES guests require a separate VMSA page used to contain
> >>  	 * the encrypted register state of the guest.
> >>  	 */
> >> -	vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +	vmsa_page = snp_safe_alloc_page(vcpu);
> >>  	if (!vmsa_page)
> >>  		goto error_free_vmcb_page;
> >>  
> >> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
> >>  	return 0;
> >>  }
> >>  
> >> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +	struct page *page = snp_safe_alloc_page(vcpu);
> >> +
> >> +	if (!page)
> >> +		return NULL;
> >> +
> >> +	return page_address(page);
> >> +}
> >> +
> >>  static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>  	.hardware_unsetup = svm_hardware_teardown,
> >>  	.hardware_enable = svm_hardware_enable,
> >> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>  	.complete_emulated_msr = svm_complete_emulated_msr,
> >>  
> >>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> >> +
> >> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
> >>  };
> >>  
> >>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> >> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> >> index d1f1512a4b47..e40800e9c998 100644
> >> --- a/arch/x86/kvm/svm/svm.h
> >> +++ b/arch/x86/kvm/svm/svm.h
> >> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
> >>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> >>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
> >>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
> >>  
> >>  /* vmenter.S */
> >>  
> >> -- 
> >> 2.17.1
> >>
> >>
> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK