From: Dave Hansen
Date: Tue, 20 Jun 2023 09:21:38 -0700
Subject: Re: [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum
To: David Hildenbrand, Kai Huang, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org
Cc: linux-mm@kvack.org, kirill.shutemov@linux.intel.com,
 tony.luck@intel.com, peterz@infradead.org, tglx@linutronix.de,
 seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com,
 rafael.j.wysocki@intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com,
 isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
 sagis@google.com, imammedo@redhat.com, "Raj, Ashok"
References: <86f2a8814240f4bbe850f6a09fc9d0b934979d1b.1685887183.git.kai.huang@intel.com>
 <723dd9da-ebd5-edb0-e9e5-2d8c14aaffe2@redhat.com>
 <216753fd-c659-711e-12d0-d12e34110efc@redhat.com>
In-Reply-To: <216753fd-c659-711e-12d0-d12e34110efc@redhat.com>

On 6/20/23 09:03, David Hildenbrand wrote:
> On 20.06.23 17:39, Dave Hansen wrote:
>> On 6/19/23 05:21, David Hildenbrand wrote:
>>> So, ordinary writes to TD private memory are not a problem? I thought
>>> one motivation for the unmapped-guest-memory discussion was to prevent
>>> host (userspace) writes to such memory because it would trigger a MC and
>>> eventually crash the host.
>>
>> Those are two different problems.
>>
>> Problem #1 (this patch): The host encounters poison when going about its
>> normal business accessing normal memory.  This happens when something in
>> the host accidentally clobbers some TDX memory and *then* reads it.
>> Only occurs with partial writes.
>>
>> Problem #2 (addressed with unmapping): Host *userspace* intentionally
>> and maliciously clobbers some TDX memory and then the TDX module or a
>> TDX guest can't run because the memory integrity checks (checksum or TD
>> bit) fail.  This can also take the system down because #MC's are nasty.
>>
>> Host userspace unmapping doesn't prevent problem #1 because it's the
>> kernel who screwed up with the _kernel_ mapping.
>
> Ahh, thanks for verifying. I was hoping that problem #2 would get fixed
> in HW as well (and treated like a BUG).

No, it's really working as designed. #1 _can_ be fixed because the
hardware can just choose to let the host run merrily along, corrupting
TDX data and blissfully unaware of the carnage, until TDX stumbles on
the mess. Blissful ignorance really is a useful feature here. It means,
for instance, that if the kernel screws up, it can still blissfully
kexec(), reboot, boot a new kernel, or dump to the console without fear
of #MC.

#2 is much harder because the TDX data is destroyed and yet the TDX side
still wants to run. The SEV folks chose page faults on write to stop
SEV from running and the TDX folks chose #MC on reads as the mechanism.

All of the nastiness on the TDX side is (IMNHO) really a consequence of
that decision to use machine checks.

(Aside: I'm not specifically crapping on the TDX CPU designers here. I
don't particularly like the SEV approach either. But this mess is a
result of the TDX design choices. There are other messes in other patch
series from SEV.)

> Because problem #2 also sounds like something that directly violates the
> first paragraph of this patch description "violations of
> this integrity protection are supposed to only affect TDX operations and
> are never supposed to affect the host kernel itself."
>
> So I would expect the TDX guest to fail hard, but not other TDX guests
> (or the host kernel).

This is more fallout from the #MC design choice.
Let's use page faults as an example since our SEV friends are using
them. *ANY* instruction that reads memory can page fault, have the
kernel fix up the fault, and continue merrily along its way.

#MC is fundamentally different. The exceptions can be declared to be
unrecoverable. The CPU says, "whoopsie, I managed to deliver this #MC,
but it would be too hard for me so I can't continue." These "too hard"
scenarios are shrinking over time, but they do exist. They're fatal.
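
To make the "too hard" case a little more concrete, here is a rough sketch of how the CPU reports it. This is not code from the patch series. The two bit positions are the architectural ones from the Intel SDM machine-check chapter (RIPV in IA32_MCG_STATUS, PCC in IA32_MCi_STATUS); the helper name mc_can_resume() is made up for illustration, and the real kernel severity grading looks at much more than these two bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit 0: the saved instruction pointer can be restarted. */
#define MCG_STATUS_RIPV (1ULL << 0)
/* IA32_MCi_STATUS bit 57: processor context is corrupt. */
#define MCI_STATUS_PCC  (1ULL << 57)

/*
 * Hypothetical helper, for illustration only: returns true when the
 * interrupted context could in principle be resumed after the #MC.
 * A page fault has no equivalent check; it is always restartable.
 */
static bool mc_can_resume(uint64_t mcg_status, uint64_t mci_status)
{
	/* No valid restart IP: the CPU cannot continue where it left off. */
	if (!(mcg_status & MCG_STATUS_RIPV))
		return false;

	/* Corrupt processor context: state cannot be trusted. */
	if (mci_status & MCI_STATUS_PCC)
		return false;

	return true;
}
```

With these definitions, mc_can_resume(MCG_STATUS_RIPV, 0) is true, while a cleared RIPV or a set PCC makes it false. That second group is the "unrecoverable" case described above.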