Date: Mon, 8 Jan 2024 10:37:28 -0800
Subject: Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
To: mhklinux@outlook.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, kirill.shutemov@linux.intel.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, luto@kernel.org, peterz@infradead.org, akpm@linux-foundation.org, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, thomas.lendacky@amd.com, ardb@kernel.org, jroedel@suse.de, seanjc@google.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, linux-hyperv@vger.kernel.org, linux-mm@kvack.org
References: <20240105183025.225972-1-mhklinux@outlook.com> <20240105183025.225972-4-mhklinux@outlook.com>
From: Kuppuswamy Sathyanarayanan
In-Reply-To: <20240105183025.225972-4-mhklinux@outlook.com>

On 1/5/2024 10:30 AM, mhkelley58@gmail.com wrote:
> From: Michael Kelley
>
> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> is responsible for ensuring the memory isn't in use and isn't referenced
> while the transition is in progress. The transition has multiple steps,
> and the memory is in an inconsistent state until all steps are complete.
> A reference while the state is inconsistent could result in an exception
> that can't be cleanly fixed up.
>
> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> reference that can't be prevented by the caller of set_memory_encrypted()
> or set_memory_decrypted(), so there's specific code to handle this case.
> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> with the #VC or #VE exception routed to the paravisor. There's no
> architectural way to forward the exceptions back to the guest kernel, and
> in such a case, the load_unaligned_zeropad() specific code doesn't work.
>
> To avoid this problem, mark pages as "not present" while a transition
> is in progress. If load_unaligned_zeropad() causes a stray reference, a
> normal page fault is generated instead of #VC or #VE, and the
> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> reference. When the encrypted/decrypted transition is complete, mark the
> pages as "present" again.

The change looks good to me. But I am wondering why you are adding it as part
of the prepare and finish callbacks instead of directly in the
set_memory_encrypted() function.
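For anyone following the thread, the ordering described in the commit
message can be sketched as a small user-space C model. This is only an
illustration under stated assumptions: every model_* name, the two-flag
PTE, and the model_fail_hypercall switch are hypothetical and are not
kernel APIs. The real patch uses set_memory_np(), set_memory_p() and the
Hyper-V host visibility hypercall instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 8

/* Hypothetical stand-in for a PTE: only the two bits discussed here. */
struct model_pte {
	bool present;      /* models the PTE PRESENT bit */
	bool host_visible; /* models "decrypted" (shared with the host) */
};

static struct model_pte model_ptes[MODEL_NPAGES];

/* Assumption for illustration: set to true to make the visibility change fail. */
static bool model_fail_hypercall;

/* Put every page back into the "present, encrypted" starting state. */
static void model_reset(void)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		model_ptes[i].present = true;
		model_ptes[i].host_visible = false;
	}
	model_fail_hypercall = false;
}

/* Prepare step: clear "present" so a stray access takes a plain page fault. */
static bool model_prepare(size_t first, size_t count)
{
	assert(first + count <= MODEL_NPAGES);
	for (size_t i = first; i < first + count; i++)
		model_ptes[i].present = false;
	return true;
}

/* Finish step: change host visibility, then always restore "present". */
static bool model_finish(size_t first, size_t count, bool enc)
{
	bool result = true;

	assert(first + count <= MODEL_NPAGES);
	if (model_fail_hypercall) {
		result = false;
	} else {
		for (size_t i = first; i < first + count; i++)
			model_ptes[i].host_visible = !enc;
	}

	/* Undo model_prepare() even on failure, like err_set_memory_p. */
	for (size_t i = first; i < first + count; i++)
		model_ptes[i].present = true;

	return result;
}

/* A stray load faults only while the page is marked "not present". */
static bool model_access_faults(size_t idx)
{
	assert(idx < MODEL_NPAGES);
	return !model_ptes[idx].present;
}
```

In this model, a caller would run model_prepare(), then model_finish(),
and a stray access in between is reported by model_access_faults(). The
point of restoring "present" inside the finish step is that a failed
visibility change still leaves the range mapped, which is what the
err_set_memory_p label in the patch is for.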
Reviewed-by: Kuppuswamy Sathyanarayanan

>
> Signed-off-by: Michael Kelley
> ---
>  arch/x86/hyperv/ivm.c | 49 ++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 8ba18635e338..5ad39256a5d2 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -15,6 +15,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  	return -EFAULT;
>  }
>
> +/*
> + * When transitioning memory between encrypted and decrypted, the caller
> + * of set_memory_encrypted() or set_memory_decrypted() is responsible for
> + * ensuring that the memory isn't in use and isn't referenced while the
> + * transition is in progress. The transition has multiple steps, and the
> + * memory is in an inconsistent state until all steps are complete. A
> + * reference while the state is inconsistent could result in an exception
> + * that can't be cleanly fixed up.
> + *
> + * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
> + * stray reference that can't be prevented by the caller, so Linux has
> + * specific code to handle this case. But when the #VC and #VE exceptions
> + * are routed to a paravisor, the specific code doesn't work. To avoid this
> + * problem, mark the pages as "not present" while the transition is in
> + * progress. If load_unaligned_zeropad() causes a stray reference, a normal
> + * page fault is generated instead of #VC or #VE, and the page-fault-based
> + * handlers for load_unaligned_zeropad() resolve the reference. When the
> + * transition is complete, hv_vtom_set_host_visibility() marks the pages
> + * as "present" again.
> + */
> +static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
> +{
> +	return !set_memory_np(kbuffer, pagecount);
> +}
> +
>  /*
>   * hv_vtom_set_host_visibility - Set specified memory visible to host.
>   *
> @@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>
>  	pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
>  	if (!pfn_array)
> -		return false;
> +		goto err_set_memory_p;
>
>  	for (i = 0, pfn = 0; i < pagecount; i++) {
>  		/*
> @@ -545,14 +571,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>  		}
>  	}
>
> - err_free_pfn_array:
> +err_free_pfn_array:
>  	kfree(pfn_array);
> +
> +err_set_memory_p:
> +	/*
> +	 * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
> +	 * did. Do this even if there is an error earlier in this function in
> +	 * order to avoid leaving the memory range in a "broken" state. Setting
> +	 * the PRESENT bits shouldn't fail, but return an error if it does.
> +	 */
> +	if (set_memory_p(kbuffer, pagecount))
> +		result = false;
> +
>  	return result;
>  }
>
>  static bool hv_vtom_tlb_flush_required(bool private)
>  {
> -	return true;
> +	/*
> +	 * Since hv_vtom_clear_present() marks the PTEs as "not present"
> +	 * and flushes the TLB, they can't be in the TLB. That makes the
> +	 * flush controlled by this function redundant, so return "false".
> +	 */
> +	return false;
>  }
>
>  static bool hv_vtom_cache_flush_required(void)
> @@ -615,6 +657,7 @@ void __init hv_vtom_init(void)
>  	x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
>  	x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
>  	x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
> +	x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
>  	x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
>
>  	/* Set WB as the default cache mode. */

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer