From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <725de6e9-e468-48ef-3bae-1e8a1b7ef0f7@intel.com>
Date: Fri, 6 Jan 2023 11:36:31 -0800
MIME-Version: 1.0
Subject: Re: [PATCH v8 09/16] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
From: Dave Hansen
To: Kai Huang , linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, peterz@infradead.org, tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com, ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com, ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com
References: <6f9c0bc1074501fa2431bde73bdea57279bf0085.1670566861.git.kai.huang@intel.com>
In-Reply-To: <6f9c0bc1074501fa2431bde73bdea57279bf0085.1670566861.git.kai.huang@intel.com>
Content-Type: text/plain; charset=UTF-8
On 12/8/22 22:52, Kai Huang wrote:
> Start to transit out the "multi-steps" to construct a list of "TD Memory
> Regions" (TDMRs) to cover all TDX-usable memory regions.
>
> The kernel configures TDX-usable memory regions by passing a list of
> "TD Memory Regions" (TDMRs) to the TDX module.  Each TDMR contains
> the information of the base/size of a memory region, the base/size of
> the associated Physical Address Metadata Table (PAMT) and a list of
> reserved areas in the region.
>
> Do the first step to fill out a number of TDMRs to cover all TDX memory
> regions.  To keep it simple, always try to use one TDMR for each memory
> region.  As the first step, only set up the base/size for each TDMR.
>
> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions.
> If a memory region spans the 1GB boundary and the former part is
> already covered by the previous TDMR, just use a new TDMR for the
> remaining part.
>
> TDX only supports a limited number of TDMRs.  Disable TDX if all TDMRs
> are consumed but there are more memory regions to cover.

This could probably use some discussion of why it is not being
future-proofed.  Maybe:

	There are fancier things that could be done like trying to merge
	adjacent TDMRs.  This would allow more pathological memory
	layouts to be supported.  But, current systems are not even
	close to exhausting the existing TDMR resources in practice.
	For now, keep it simple.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index d36ac72ef299..5b1de0200c6b 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -407,6 +407,90 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
>  			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
>  }
>  
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> +				    int idx)
> +{
> +	return (struct tdmr_info *)((unsigned long)tdmr_list->first_tdmr +
> +			tdmr_list->tdmr_sz * idx);
> +}

I think that's more complicated and has more casting than necessary.
This looks nicer:

	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;

	return (void *)tdmr_list->first_tdmr + tdmr_info_offset;

Also, it might even be worth keeping ->first_tdmr as a void*.
It isn't a real C array and keeping it as void* would keep anyone from
doing:

	tdmr_foo = tdmr_list->first_tdmr[foo];

> +#define TDMR_ALIGNMENT		BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
> +#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
> +
> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> +	return tdmr->base + tdmr->size;
> +}
> +
> +/*
> + * Take the memory referenced in @tmb_list and populate the
> + * preallocated @tdmr_list, following all the special alignment
> + * and size rules for TDMR.
> + */
> +static int fill_out_tdmrs(struct list_head *tmb_list,
> +			  struct tdmr_info_list *tdmr_list)
> +{
> +	struct tdx_memblock *tmb;
> +	int tdmr_idx = 0;
> +
> +	/*
> +	 * Loop over TDX memory regions and fill out TDMRs to cover them.
> +	 * To keep it simple, always try to use one TDMR to cover one
> +	 * memory region.
> +	 *
> +	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> +	 * cover all memory regions in reality if the admin doesn't use
> +	 * 'memmap' to create a bunch of discrete memory regions.  When
> +	 * there's a real problem, enhancement can be done to merge TDMRs
> +	 * to reduce the final number of TDMRs.
> +	 */
> +	list_for_each_entry(tmb, tmb_list, list) {
> +		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		u64 start, end;
> +
> +		start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
> +		end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
> +
> +		/*
> +		 * A valid size indicates the current TDMR has already
> +		 * been filled out to cover the previous memory region(s).
> +		 */
> +		if (tdmr->size) {
> +			/*
> +			 * Loop to the next if the current memory region
> +			 * has already been fully covered.
> +			 */
> +			if (end <= tdmr_end(tdmr))
> +				continue;
> +
> +			/* Otherwise, skip the already covered part. */
> +			if (start < tdmr_end(tdmr))
> +				start = tdmr_end(tdmr);
> +
> +			/*
> +			 * Create a new TDMR to cover the current memory
> +			 * region, or the remaining part of it.
> +			 */
> +			tdmr_idx++;
> +			if (tdmr_idx >= tdmr_list->max_tdmrs)
> +				return -E2BIG;
> +
> +			tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		}
> +
> +		tdmr->base = start;
> +		tdmr->size = end - start;
> +	}
> +
> +	/* @tdmr_idx is always the index of last valid TDMR. */
> +	tdmr_list->nr_tdmrs = tdmr_idx + 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * Construct a list of TDMRs on the preallocated space in @tdmr_list
>   * to cover all TDX memory regions in @tmb_list based on the TDX module
> @@ -416,16 +500,23 @@ static int construct_tdmrs(struct list_head *tmb_list,
>  			   struct tdmr_info_list *tdmr_list,
>  			   struct tdsysinfo_struct *sysinfo)
>  {
> +	int ret;
> +
> +	ret = fill_out_tdmrs(tmb_list, tdmr_list);
> +	if (ret)
> +		goto err;
> +
>  	/*
>  	 * TODO:
>  	 *
> -	 *  - Fill out TDMRs to cover all TDX memory regions.
>  	 *  - Allocate and set up PAMTs for each TDMR.
>  	 *  - Designate reserved areas for each TDMR.
>  	 *
>  	 * Return -EINVAL until constructing TDMRs is done
>  	 */
> -	return -EINVAL;
> +	ret = -EINVAL;
> +err:
> +	return ret;
>  }
>  
>  static int init_tdx_module(void)

Otherwise this actually looks fine.