From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 427D7C4167B for ; Tue, 7 Nov 2023 16:32:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CD9BA8D0049; Tue, 7 Nov 2023 11:32:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C8BBD8D0001; Tue, 7 Nov 2023 11:32:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B2B758D0049; Tue, 7 Nov 2023 11:32:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 9F19B8D0001 for ; Tue, 7 Nov 2023 11:32:39 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6E0CC4024F for ; Tue, 7 Nov 2023 16:32:39 +0000 (UTC) X-FDA: 81431701638.18.7A13604 Received: from mail.alien8.de (mail.alien8.de [65.109.113.108]) by imf02.hostedemail.com (Postfix) with ESMTP id EDFEA8001C for ; Tue, 7 Nov 2023 16:32:36 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=alien8.de header.s=alien8 header.b=Ir2P9eUe; dmarc=pass (policy=none) header.from=alien8.de; spf=pass (imf02.hostedemail.com: domain of bp@alien8.de designates 65.109.113.108 as permitted sender) smtp.mailfrom=bp@alien8.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1699374757; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/66xR9WwT2nRxnZsZiVrrkdqU3jUBlpqlG2vSrMefUg=; b=4r6XrlEM239d88ZsifnkxX8NKRLGwqzUuAlQzdZ2CCwVTETW45TpLOxj2aJN5TFq5T0L88 Sl9D8kX4pZgsNOroT8BSWUoZ5LQPEYXaxu6Es2Y6+CActMttetTT39Q+hEYXWXNkmKMx92 vfW6Btjpg5K5xv4K8/AGfvj3lb/nNk4= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=alien8.de header.s=alien8 header.b=Ir2P9eUe; dmarc=pass (policy=none) header.from=alien8.de; spf=pass (imf02.hostedemail.com: domain of bp@alien8.de designates 65.109.113.108 as permitted sender) smtp.mailfrom=bp@alien8.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1699374757; a=rsa-sha256; cv=none; b=Vtapv7fP+Iszd4j1J5jKKpwbO9qx9ofGrX+L3lbF1398xt+dBi+cLJdSYFqIsUKD7MBwDx CK8tn41wuDVNutAZ0RbFpMHI9YiyIWGqZfyz6x98Oom4cxvzTDcyLL+eOlx4JMNCHdBn44 wMV+XwKbUvOhP4ZvALCb6/7fc05SR+8= Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.alien8.de (SuperMail on ZX Spectrum 128k) with ESMTP id C61F840E01A4; Tue, 7 Nov 2023 16:32:32 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at mail.alien8.de Received: from mail.alien8.de ([127.0.0.1]) by localhost (mail.alien8.de [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id g1-qSrUy7Zvx; Tue, 7 Nov 2023 16:32:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alien8.de; s=alien8; t=1699374750; bh=/66xR9WwT2nRxnZsZiVrrkdqU3jUBlpqlG2vSrMefUg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Ir2P9eUeMj8FlnblX4WEHf2sRZaTnOn4y3N4kA74rLKXhsZEFIYfRicDarY5Wy8y4 lL0smUZvdgqddTRDmeHAp5aZk7Q8Udtan+Fr7JVZ7ekWTfgXPmUe4UZ0TPWXa0KBkI k19fbhPpUVz8NPj3ig6Q3bQH5XfZbnIf6pL64L2c64HA3zqCwUNgE741S4XpFBq5Cg /SmNdQM2iYkZyVqFa7gRc96FBjT4zs2ZSds9cMSmAtCcsrHhyMecsqIHmSxlzmzmAw oriD6xITv5/zgGg4F7JP0NllNQt9XUfY7qWWrYBepJMo3GjmnAoKOL+DXQhI3QaY8R gGq8ZmUtR/jEJOKSEc/0HJFVM2kIlT5AkcCC6bm6ruGnXExeghDeb53oSJdJQpwvA+ faRvbpc2Lu0HC+CCR8PSvLqpZ0jWMIk1Zf2pTwMOmg9Jc3BGkl6Vh96cfGPL4qUCVm Cnl36cVAnK5R67SlIw+QRh23Hbd/Pt4GDD3nnLE7s1H6bnUPcZzWz8g9RZGlfJ7COU 6lcrAiwbqL4adLdVYZElKNTynazvC0aLKGc9lvxBhpGT3hFKxE7DBB5Jh1RkZxo8CF VBsphHJCfIYnvLAlELR3g6uAoCcaUvcIbgcZuUlKoPC3Jov4o8xSh+K+mVGNfuUUIL A4yoxiNdT0B4LxfLNZkL6NJY= Received: from zn.tnic (pd95304da.dip0.t-ipconnect.de [217.83.4.218]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature ECDSA (P-256) server-digest SHA256) (No client certificate requested) by mail.alien8.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id 3430040E0191; Tue, 7 Nov 2023 16:31:50 +0000 (UTC) Date: Tue, 7 Nov 2023 17:31:42 +0100 From: Borislav Petkov To: Michael Roth Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, jroedel@suse.de, thomas.lendacky@amd.com, hpa@zytor.com, ardb@kernel.org, pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, jmattson@google.com, luto@kernel.org, dave.hansen@linux.intel.com, slp@redhat.com, pgonda@google.com, peterz@infradead.org, srinivas.pandruvada@linux.intel.com, rientjes@google.com, dovmurik@linux.ibm.com, tobin@ibm.com, vbabka@suse.cz, kirill@shutemov.name, ak@linux.intel.com, tony.luck@intel.com, marcorr@google.com, sathyanarayanan.kuppuswamy@linux.intel.com, alpergun@google.com, jarkko@kernel.org, ashish.kalra@amd.com, nikunj.dadhania@amd.com, pankaj.gupta@amd.com, liam.merwick@oracle.com, zhi.a.wang@intel.com, Brijesh Singh Subject: Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support Message-ID: <20231107163142.GAZUpmbt/i3himIf+E@fat_crate.local> References: <20231016132819.1002933-1-michael.roth@amd.com> <20231016132819.1002933-7-michael.roth@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20231016132819.1002933-7-michael.roth@amd.com> X-Rspamd-Queue-Id: EDFEA8001C X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: drmp98w3r1sg68n7trqqdqqcixgijnhm X-HE-Tag: 1699374756-243826 X-HE-Meta: U2FsdGVkX1/iS4W0lLoN0EONBV8nF7hEKXYqyGtZbwEZACov4pDSbuDdneVmo8NGSkAxazh0RCbK7/YS2EnUbAvJWlmS39+ri3VjFe3U807CPcN0foxlNWdaHtbNNRdEt7bO//KPiojILpthsEo6vuIDeZ9cj9Y4lEk5tCJnlAu/ugCkVoX2u6JmBnZY2fsgxmlsKPaWL3vQI1WIhWz17AfVapxFhf4DhxG7/0G37RJvWB6p/jMKwzmIWG78vz7YLPVO0OYG9PvERp9loR0kn8CzFgpX2iSjuCjqHKKOut/jBZf2yHkwCO35/KmkCKQJsMZv25R+P62piRCSZW0Oy7HJVywmsRZU2gp7ZzNTWvFcHr/zd5WnSaAtvXrE22LXF7Tkj4fqyezc/yG8+CRYS+ipuBCnHMEQ85z7xBahNpKFp/hwl6BVV5t5wKTvEmhSX8LEbBwGM6HdHgcHwScvRLR8bxmPKpG97FT293LuXB3rbHmvIsx6O9Hcwpd9ADgFLFluE19K7cNRUzXtocLrxW9QbUaqEDltFZHzBBg42xkU/MMWbTsEw3gCRFgnCBYzkXGZFQHLX81kkqzLncdCSmEPJY6mOKm33b5oYIoTzw4fZRKmpAOOSGX5lAqWXwIblguWYrHlTyuks+B3c153Kep2h1wYtbaWBi7jkhNZj3PYNgQCwKn7xcksC9kuKspGitIuSFIJDU15zxUD0frShVwMOmwoQ0FAaBw/N+Aj+CAITzi58DMMEyfLLbX65Frn5h/KFVBw/gooBFvKHteV2ITShhrJRukBWDWNTO/RhqqpwWiL2ov4RlpAFnl+hAXOZC1XCxvmzdF8iQyJfkVkvUnTsv8lwRprrGrqDBg++q0i6rb8kCgRyoIMqDhTRRwSfsquZz27rEGVxY2xqaMcDmU0zO5sTZxDW/31g2chZv+NyRUF/TcvcAFaHrwe2tCrk/kObIpySEtsP8CVqnQ Jw+Wnk9w Re9htNNmpQAyzQ2baxfpTLh7Nl+5exrgIeTvGgjz3FqlYVLKY5vpTEi39xecI0qkqV/M+5L/RhLkaVdjLucdd/QU6f6TQoSo7ozZ1KigDzgZwwJ5n0rPq4Zs6ch2e8BVUq3UeLREASipKFGK2KedMwlbD12kMcoWxXkIIwIHyXjM0MI4eQQl6+XJdFQYA5qOz2RB1jitD++uTcZM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote: > +static bool early_rmptable_check(void) > +{ > + u64 rmp_base, rmp_size; > + > + /* > + * For early BSP initialization, max_pfn won't be set up yet, wait until > + * it is set before performing the RMP table calculations. > + */ > + if (!max_pfn) > + return true; This already says that this is called at the wrong point during init. Right now we have early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt which runs only on the BSP but then early_init_amd() is called in init_amd() too so that it takes care of the APs too. Which ends up doing a lot of unnecessary work on each AP in early_detect_mem_encrypt() like calculating the RMP size on each AP unnecessarily where this needs to happen exactly once. Is there any reason why this function cannot be moved to init_amd() where it'll do the normal, per-AP init? And the stuff that needs to happen once, needs to be called once too. > + > + return snp_get_rmptable_info(&rmp_base, &rmp_size); > +} > + > static void early_detect_mem_encrypt(struct cpuinfo_x86 *c) > { > u64 msr; > @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c) > if (!(msr & MSR_K7_HWCR_SMMLOCK)) > goto clear_sev; > > + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check()) > + goto clear_snp; > + > return; > > clear_all: > @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c) > clear_sev: > setup_clear_cpu_cap(X86_FEATURE_SEV); > setup_clear_cpu_cap(X86_FEATURE_SEV_ES); > +clear_snp: > setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); > } > } ... > +bool snp_get_rmptable_info(u64 *start, u64 *len) > +{ > + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end; > + > + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base); > + rdmsrl(MSR_AMD64_RMP_END, rmp_end); > + > + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) { > + pr_err("Memory for the RMP table has not been reserved by BIOS\n"); > + return false; > + } If you're masking off bits 0-12 above... > + > + if (rmp_base > rmp_end) { ... why aren't you using the masked out vars further on? I know, the hw will say, yeah, those bits are 0 but still. IOW, do: rmp_base &= RMP_ADDR_MASK; rmp_end &= RMP_ADDR_MASK; after reading them. > + pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end); > + return false; > + } > + > + rmp_sz = rmp_end - rmp_base + 1; > + > + /* > + * Calculate the amount the memory that must be reserved by the BIOS to > + * address the whole RAM, including the bookkeeping area. The RMP itself > + * must also be covered. > + */ > + max_rmp_pfn = max_pfn; > + if (PHYS_PFN(rmp_end) > max_pfn) > + max_rmp_pfn = PHYS_PFN(rmp_end); > + > + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ; > + > + if (calc_rmp_sz > rmp_sz) { > + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n", > + calc_rmp_sz, rmp_sz); > + return false; > + } > + > + *start = rmp_base; > + *len = rmp_sz; > + > + return true; > +} > + > +static __init int __snp_rmptable_init(void) > +{ > + u64 rmp_base, rmp_size; > + void *rmp_start; > + u64 val; > + > + if (!snp_get_rmptable_info(&rmp_base, &rmp_size)) > + return 1; > + > + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n", That's "RMP table physical range" > + rmp_base, rmp_base + rmp_size - 1); > + > + rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB); > + if (!rmp_start) { > + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size); No need to dump rmp_base and rmp_size again here - you're dumping them above. > + return 1; > + } > + > + /* > + * Check if SEV-SNP is already enabled, this can happen in case of > + * kexec boot. > + */ > + rdmsrl(MSR_AMD64_SYSCFG, val); > + if (val & MSR_AMD64_SYSCFG_SNP_EN) > + goto skip_enable; > + > + /* Initialize the RMP table to zero */ Again: useless comment. > + memset(rmp_start, 0, rmp_size); > + > + /* Flush the caches to ensure that data is written before SNP is enabled. */ > + wbinvd_on_all_cpus(); > + > + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */ First of all, use the APM bit name here pls: MtrrFixDramModEn. And then, for the life of me, I can't find any mention in the APM why this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in "15.34.3 Enabling SEV". Looking at the bit defintions of WrMem an RdMem - read and write requests get directed to system memory instead of MMIO so I guess you don't want to be able to write MMIO for certain physical ranges when SNP is enabled but it'll be good to have this properly explained instead of a "this must happen" information-less sentence. > + on_each_cpu(mfd_enable, NULL, 1); > + > + /* Enable SNP on all CPUs. */ Useless comment. > + on_each_cpu(snp_enable, NULL, 1); > + > +skip_enable: > + rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ; > + rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ; > + > + rmptable_start = (struct rmpentry *)rmp_start; > + rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1; > + > + return 0; > +} > + > +static int __init snp_rmptable_init(void) > +{ > + int family, model; > + > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) > + return 0; > + > + family = boot_cpu_data.x86; > + model = boot_cpu_data.x86_model; Looks useless - just use boot_cpu_data directly below. As mentioned here already https://lore.kernel.org/all/Y9ubi0i4Z750gdMm@zn.tnic/ And I already mentioned that for v9: https://lore.kernel.org/r/20230621094236.GZZJLGDAicp1guNPvD@fat_crate.local Next time I'm NAKing this patch until you incorporate all review comments or you give a technical reason why you disagree with them. > + /* > + * RMP table entry format is not architectural and it can vary by processor and > + * is defined by the per-processor PPR. Restrict SNP support on the known CPU > + * model and family for which the RMP table entry format is currently defined for. > + */ > + if (family != 0x19 || model > 0xaf) > + goto nosnp; > + > + if (amd_iommu_snp_enable()) > + goto nosnp; > + > + if (__snp_rmptable_init()) > + goto nosnp; > + > + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL); > + > + return 0; > + > +nosnp: > + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); > + return -ENOSYS; > +} > + > +/* > + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable() > + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be > + * called after subsys_initcall(). > + * > + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA > + * directly into guest private memory. In case of SNP, the IOMMU ensures that > + * the page(s) used for DMA are hypervisor owned. > + */ > +fs_initcall(snp_rmptable_init); This looks backwards. AFAICT, the IOMMU code should call arch code to enable SNP at the right time, not the other way around - arch code calling driver code. Especially if the SNP table enablement depends on some exact IOMMU init_state: if (init_state > IOMMU_ENABLED) { pr_err("SNP: Too late to enable SNP for IOMMU.\n"); -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette