Date: Fri, 9 Jun 2023 13:14:35 +0300
From: kirill.shutemov@linux.intel.com
To: Kai Huang
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	dave.hansen@intel.com, tony.luck@intel.com, peterz@infradead.org,
	tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
	david@redhat.com, dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
	ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com,
	ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
	sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v11 17/20] x86/kexec: Flush cache of TDX private memory
Message-ID: <20230609101435.xmz3kgydseddrty7@box.shutemov.name>
References: <17bcbe3e154415ee7a4c77489809a3db0c5ddf3f.1685887183.git.kai.huang@intel.com>
In-Reply-To: <17bcbe3e154415ee7a4c77489809a3db0c5ddf3f.1685887183.git.kai.huang@intel.com>

On Mon, Jun 05, 2023 at 02:27:30AM +1200, Kai Huang wrote:
> There are two problems in terms of using kexec() to boot to a new kernel
> when the old kernel has enabled TDX: 1) Part of the memory pages are
> still TDX private pages; 2) There might be dirty cachelines associated
> with TDX private pages.
>
> The first problem doesn't matter on the platforms w/o the "partial write
> machine check" erratum.  KeyID 0 doesn't have integrity check.  If the
> new kernel wants to use any non-zero KeyID, it needs to convert the
> memory to that KeyID and such conversion would work from any KeyID.
>
> However the old kernel needs to guarantee there's no dirty cacheline
> left behind before booting to the new kernel to avoid silent corruption
> from later cacheline writeback (Intel hardware doesn't guarantee cache
> coherency across different KeyIDs).
>
> There are two things that the old kernel needs to do to achieve that:
>
> 1) Stop accessing TDX private memory mappings:
>    a. Stop making TDX module SEAMCALLs (TDX global KeyID);
>    b. Stop TDX guests from running (per-guest TDX KeyID).
> 2) Flush any cachelines from previous TDX private KeyID writes.
>
> For 2), use wbinvd() to flush cache in stop_this_cpu(), following SME
> support.  And in this way 1) happens for free as there's no TDX activity
> between wbinvd() and the native_halt().
>
> Flushing cache in stop_this_cpu() only flushes cache on remote cpus.  On
> the cpu which does kexec(), unlike SME which does the cache flush in
> relocate_kernel(), do the cache flush right after stopping remote cpus
> in machine_shutdown().  This is because on the platforms with above
> erratum, the kernel needs to convert all TDX private pages back to
> normal before a fast warm reset reboot or booting to the new kernel in
> kexec().  Flushing cache in relocate_kernel() only covers the kexec()
> but not the fast warm reset reboot.
>
> Theoretically, cache flush is only needed when the TDX module has been
> initialized.  However initializing the TDX module is done on demand at
> runtime, and it takes a mutex to read the module status.  Just check
> whether TDX is enabled by the BIOS instead to flush cache.
>
> Signed-off-by: Kai Huang
> Reviewed-by: Isaku Yamahata

Reviewed-by: Kirill A. Shutemov

-- 
  Kiryl Shutsemau / Kirill A. Shutemov