Date: Wed, 28 Jun 2023 15:23:42 +0300
From: kirill.shutemov@linux.intel.com
To: Kai Huang
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	x86@kernel.org, dave.hansen@intel.com, tony.luck@intel.com,
	peterz@infradead.org, tglx@linutronix.de, bp@alien8.de,
	mingo@redhat.com, hpa@zytor.com, seanjc@google.com,
	pbonzini@redhat.com, david@redhat.com, dan.j.williams@intel.com,
	rafael.j.wysocki@intel.com, ashok.raj@intel.com,
	reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com,
	isaku.yamahata@intel.com, ying.huang@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, nik.borisov@suse.com,
	bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v12 18/22] x86/virt/tdx: Keep TDMRs when module initialization is successful
Message-ID: <20230628122342.zdnqsgnugalqj6ix@box.shutemov.name>
References: <7d06fe5fda0e330895c1c9043b881f3c2a2d4f3f.1687784645.git.kai.huang@intel.com>
In-Reply-To: <7d06fe5fda0e330895c1c9043b881f3c2a2d4f3f.1687784645.git.kai.huang@intel.com>

On Tue, Jun 27, 2023 at 02:12:48AM +1200, Kai Huang wrote:
> On the platforms with the "partial write machine check" erratum, the
> kexec() needs to convert all TDX private pages back to normal before
> booting to the new kernel. Otherwise, the new kernel may get unexpected
> machine check.
>
> There's no existing infrastructure to track TDX private pages. Change
> to keep TDMRs when module initialization is successful so that they can
> be used to find PAMTs.
>
> With this change, only put_online_mems() and freeing the buffer of the
> TDSYSINFO_STRUCT and CMR array still need to be done even when module
> initialization is successful. Adjust the error handling to explicitly
> do them when module initialization is successful and unconditionally
> clean up the rest when initialization fails.
>
> Signed-off-by: Kai Huang
> ---
>
> v11 -> v12 (new patch):
>   - Defer keeping TDMRs logic to this patch for better review
>   - Improved error handling logic (Nikolay/Kirill in patch 15)
>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 84 ++++++++++++++++++-------------------
>  1 file changed, 42 insertions(+), 42 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 52b7267ea226..85b24b2e9417 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -49,6 +49,8 @@ static DEFINE_MUTEX(tdx_module_lock);
>  /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
>  static LIST_HEAD(tdx_memlist);
>
> +static struct tdmr_info_list tdx_tdmr_list;
> +
>  /*
>   * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
>   * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> @@ -1047,7 +1049,6 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
>  static int init_tdx_module(void)
>  {
>  	struct tdsysinfo_struct *sysinfo;
> -	struct tdmr_info_list tdmr_list;
>  	struct cmr_info *cmr_array;
>  	int ret;
>
> @@ -1088,17 +1089,17 @@ static int init_tdx_module(void)
>  		goto out_put_tdxmem;
>
>  	/* Allocate enough space for constructing TDMRs */
> -	ret = alloc_tdmr_list(&tdmr_list, sysinfo);
> +	ret = alloc_tdmr_list(&tdx_tdmr_list, sysinfo);
>  	if (ret)
>  		goto out_free_tdxmem;
>
>  	/* Cover all TDX-usable memory regions in TDMRs */
> -	ret = construct_tdmrs(&tdx_memlist, &tdmr_list, sysinfo);
> +	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, sysinfo);
>  	if (ret)
>  		goto out_free_tdmrs;
>
>  	/* Pass the TDMRs and the global KeyID to the TDX module */
> -	ret = config_tdx_module(&tdmr_list, tdx_global_keyid);
> +	ret = config_tdx_module(&tdx_tdmr_list, tdx_global_keyid);
>  	if (ret)
>  		goto out_free_pamts;
>
> @@ -1118,51 +1119,50 @@ static int init_tdx_module(void)
>  		goto out_reset_pamts;
>
>  	/* Initialize TDMRs to complete the TDX module initialization */
> -	ret = init_tdmrs(&tdmr_list);
> +	ret = init_tdmrs(&tdx_tdmr_list);
> +	if (ret)
> +		goto out_reset_pamts;
> +
> +	pr_info("%lu KBs allocated for PAMT.\n",
> +			tdmrs_count_pamt_kb(&tdx_tdmr_list));
> +
> +	/*
> +	 * @tdx_memlist is written here and read at memory hotplug time.
> +	 * Lock out memory hotplug code while building it.
> +	 */
> +	put_online_mems();
> +	/*
> +	 * For now both @sysinfo and @cmr_array are only used during
> +	 * module initialization, so always free them.
> +	 */
> +	free_page((unsigned long)sysinfo);
> +
> +	return 0;
>  out_reset_pamts:
> -	if (ret) {
> -		/*
> -		 * Part of PAMTs may already have been initialized by the
> -		 * TDX module. Flush cache before returning PAMTs back
> -		 * to the kernel.
> -		 */
> -		wbinvd_on_all_cpus();
> -		/*
> -		 * According to the TDX hardware spec, if the platform
> -		 * doesn't have the "partial write machine check"
> -		 * erratum, any kernel read/write will never cause #MC
> -		 * in kernel space, thus it's OK to not convert PAMTs
> -		 * back to normal. But do the conversion anyway here
> -		 * as suggested by the TDX spec.
> -		 */
> -		tdmrs_reset_pamt_all(&tdmr_list);
> -	}
> +	/*
> +	 * Part of PAMTs may already have been initialized by the
> +	 * TDX module. Flush cache before returning PAMTs back
> +	 * to the kernel.
> +	 */
> +	wbinvd_on_all_cpus();
> +	/*
> +	 * According to the TDX hardware spec, if the platform
> +	 * doesn't have the "partial write machine check"
> +	 * erratum, any kernel read/write will never cause #MC
> +	 * in kernel space, thus it's OK to not convert PAMTs
> +	 * back to normal. But do the conversion anyway here
> +	 * as suggested by the TDX spec.
> +	 */
> +	tdmrs_reset_pamt_all(&tdx_tdmr_list);
>  out_free_pamts:
> -	if (ret)
> -		tdmrs_free_pamt_all(&tdmr_list);
> -	else
> -		pr_info("%lu KBs allocated for PAMT.\n",
> -				tdmrs_count_pamt_kb(&tdmr_list));
> +	tdmrs_free_pamt_all(&tdx_tdmr_list);
>  out_free_tdmrs:
> -	/*
> -	 * Always free the buffer of TDMRs as they are only used during
> -	 * module initialization.
> -	 */
> -	free_tdmr_list(&tdmr_list);
> +	free_tdmr_list(&tdx_tdmr_list);
>  out_free_tdxmem:
> -	if (ret)
> -		free_tdx_memlist(&tdx_memlist);
> +	free_tdx_memlist(&tdx_memlist);
>  out_put_tdxmem:
> -	/*
> -	 * @tdx_memlist is written here and read at memory hotplug time.
> -	 * Lock out memory hotplug code while building it.
> -	 */
>  	put_online_mems();
>  out:
> -	/*
> -	 * For now both @sysinfo and @cmr_array are only used during
> -	 * module initialization, so always free them.
> -	 */
>  	free_page((unsigned long)sysinfo);
>  	return ret;
>  }

This diff is extremely hard to follow, but I think the change to error
handling Nikolay proposed has to be applied to the function from the
beginning, not changed drastically in this patch.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
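
For readers following the thread: the error-handling shape under discussion
(an explicit success path that returns early, followed by unwind labels that
run unconditionally on failure) can be sketched in standalone C. This is an
illustration only, not code from tdx.c; struct ctx, step() and init_module()
are hypothetical names, and malloc()/free() merely stand in for the real
resources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state, loosely mirroring the roles in the patch. */
struct ctx {
	void *scratch;	/* temporary buffer: released on every path */
	void *table;	/* long-lived data: kept on success, freed on failure */
	int locked;	/* 1 while the (modelled) lock is held */
};

/* Hypothetical init step that fails on request, for demonstration. */
static int step(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* fail_at selects which step fails: 0 = none, 1 = first, 2 = second. */
static int init_module(struct ctx *c, int fail_at)
{
	int ret = -1;

	c->table = NULL;
	c->locked = 0;

	c->scratch = malloc(64);
	if (!c->scratch)
		goto out;

	c->locked = 1;

	c->table = malloc(128);
	if (!c->table)
		goto err_unlock;

	ret = step(fail_at == 1);
	if (ret)
		goto err_free_table;

	ret = step(fail_at == 2);
	if (ret)
		goto err_free_table;

	/* Success: do only what is still needed, keep the table, return. */
	c->locked = 0;
	free(c->scratch);
	c->scratch = NULL;
	return 0;

	/* Failure: labels unwind in reverse order, with no "if (ret)" tests. */
err_free_table:
	free(c->table);
	c->table = NULL;
err_unlock:
	c->locked = 0;
out:
	free(c->scratch);
	c->scratch = NULL;
	return ret;
}
```

With this layout the labels below the early return are reached only on
failure, so each one can clean up unconditionally. The cost is that the few
actions needed on both paths (here, dropping the lock and freeing the scratch
buffer) are written twice.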