Date: Mon, 12 Jun 2023 10:59:10 +0300
From: "kirill.shutemov@linux.intel.com"
To: "Huang, Kai"
Cc: "kvm@vger.kernel.org", "Hansen, Dave", "david@redhat.com", "bagasdotme@gmail.com", "ak@linux.intel.com", "Wysocki, Rafael J", "linux-kernel@vger.kernel.org", "Chatre, Reinette", "Christopherson,, Sean", "pbonzini@redhat.com", "tglx@linutronix.de", "linux-mm@kvack.org", "Yamahata, Isaku", "Luck, Tony", "peterz@infradead.org", "Shahar, Sagi", "imammedo@redhat.com", "Gao, Chao", "Brown, Len", "sathyanarayanan.kuppuswamy@linux.intel.com", "Huang, Ying", "Williams, Dan J"
Subject: Re: [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum
Message-ID: <20230612075910.jqkiofjm6mkdl7cy@box.shutemov.name>
References: <116cafb15625ac0bcda7b47143921d0c42061b69.1685887183.git.kai.huang@intel.com> <20230609131754.dhii5ctfwtzx667o@box.shutemov.name> <90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com>
In-Reply-To: <90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com>

On Mon, Jun 12, 2023 at 03:08:40AM +0000, Huang, Kai wrote:
> On Fri, 2023-06-09 at 16:17 +0300, kirill.shutemov@linux.intel.com wrote:
> > On Mon, Jun 05, 2023 at 02:27:32AM +1200, Kai Huang wrote:
> > > The first few generations of TDX hardware have an erratum.  Triggering
> > > it in Linux requires some kind of kernel bug involving relatively exotic
> > > memory writes to TDX private memory and will manifest via
> > > spurious-looking machine checks when reading the affected memory.
> > >
> > > == Background ==
> > >
> > > Virtually all kernel memory accesses happen in full
> > > cachelines.
> > > In practice, writing a "byte" of memory usually reads a 64
> > > byte cacheline of memory, modifies it, then writes the whole line back.
> > > Those operations do not trigger this problem.
> > >
> > > This problem is triggered by "partial" writes where a write transaction
> > > of less than a cacheline lands at the memory controller.  The CPU does
> > > these via non-temporal write instructions (like MOVNTI), or through
> > > UC/WC memory mappings.  The issue can also be triggered away from the
> > > CPU by devices doing partial writes via DMA.
> > >
> > > == Problem ==
> > >
> > > A partial write to a TDX private memory cacheline will silently "poison"
> > > the line.  Subsequent reads will consume the poison and generate a
> > > machine check.  According to the TDX hardware spec, neither of these
> > > things should have happened.
> > >
> > > To add insult to injury, the Linux machine check code will present these
> > > as a literal "Hardware error" when they were, in fact, a
> > > software-triggered issue.
> > >
> > > == Solution ==
> > >
> > > In the end, this issue is hard to trigger.  Rather than do something
> > > rash (and incomplete) like unmap TDX private memory from the direct map,
> > > improve the machine check handler.
> > >
> > > Currently, the #MC handler doesn't distinguish whether the memory is
> > > TDX private memory or not and just dumps, for instance, the message below:
> > >
> > >  [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
> > >  [...] mce: [Hardware Error]: RIP 10: {__tlb_remove_page_size+0x10/0xa0}
> > >  ...
> > >  [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > >  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > >  [...] Kernel panic - not syncing: Fatal local machine check
> > >
> > > Which says "Hardware Error" and "Data load in unrecoverable area of
> > > kernel".
> > >
> > > Ideally, it's better for the log to say "software bug around TDX private
> > > memory" instead of "Hardware Error".  But in reality a real hardware
> > > memory error can happen, and sadly such a software-triggered #MC cannot
> > > be distinguished from a real hardware error.  Also, the error message is
> > > parsed by the userspace tool 'mcelog', so changing the output may
> > > break userspace.
> > >
> > > So keep the "Hardware Error".  The "Data load in unrecoverable area of
> > > kernel" is also helpful, so keep it too.
> > >
> > > Instead of modifying the above error log, improve it by printing an
> > > additional TDX-related message to make the log look like:
> > >
> > >  ...
> > >  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > >  [...] mce: [Hardware Error]: Machine Check: Memory error from TDX private memory. May be result of CPU erratum.
> >
> > The message mentions one part of the issue -- the CPU erratum -- but misses
> > the other required part -- the kernel bug that makes the kernel access
> > memory it is not supposed to.
>
> How about below?
>
> "Memory error from TDX private memory. May be result of CPU erratum caused by
> kernel bug."

Fine, I guess.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
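
For illustration only, the behaviour agreed on above could be modelled roughly
as below.  This is a userspace sketch, not the patch itself: the struct and
both function names are hypothetical, and how the kernel actually decides
whether a physical address is TDX private memory is not shown in this thread.
The existing "Hardware Error" lines stay untouched; the function only returns
the extra line to print, if any.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one TDX private memory region (illustration only). */
struct tdx_private_range {
	uint64_t start;	/* inclusive physical address */
	uint64_t end;	/* exclusive physical address */
};

/* Hypothetical helper: true if paddr falls inside any listed private range. */
static bool paddr_is_tdx_private(const struct tdx_private_range *ranges,
				 size_t nr, uint64_t paddr)
{
	for (size_t i = 0; i < nr; i++) {
		if (paddr >= ranges[i].start && paddr < ranges[i].end)
			return true;
	}
	return false;
}

/*
 * Return the additional message to print after the unchanged
 * "[Hardware Error]" lines, or NULL when the faulting address is not
 * TDX private memory and nothing extra should be printed.
 */
static const char *mce_tdx_extra_message(const struct tdx_private_range *ranges,
					 size_t nr, uint64_t paddr)
{
	if (paddr_is_tdx_private(ranges, nr, paddr))
		return "Memory error from TDX private memory. "
		       "May be result of CPU erratum caused by kernel bug.";
	return NULL;
}
```

The point of returning NULL in the non-TDX case is that the output 'mcelog'
parses is left exactly as it is today.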