From: "kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>
To: "Huang, Kai" <kai.huang@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Hansen, Dave" <dave.hansen@intel.com>,
"david@redhat.com" <david@redhat.com>,
"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
"ak@linux.intel.com" <ak@linux.intel.com>,
"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Chatre, Reinette" <reinette.chatre@intel.com>, "Christopherson,,
Sean" <seanjc@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"Shahar, Sagi" <sagis@google.com>,
"imammedo@redhat.com" <imammedo@redhat.com>,
"Gao, Chao" <chao.gao@intel.com>,
"Brown, Len" <len.brown@intel.com>,
"sathyanarayanan.kuppuswamy@linux.intel.com"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum
Date: Mon, 12 Jun 2023 10:59:10 +0300 [thread overview]
Message-ID: <20230612075910.jqkiofjm6mkdl7cy@box.shutemov.name> (raw)
In-Reply-To: <90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com>
On Mon, Jun 12, 2023 at 03:08:40AM +0000, Huang, Kai wrote:
> On Fri, 2023-06-09 at 16:17 +0300, kirill.shutemov@linux.intel.com wrote:
> > On Mon, Jun 05, 2023 at 02:27:32AM +1200, Kai Huang wrote:
> > > The first few generations of TDX hardware have an erratum. Triggering
> > > it in Linux requires some kind of kernel bug involving relatively exotic
> > > memory writes to TDX private memory and will manifest via
> > > spurious-looking machine checks when reading the affected memory.
> > >
> > > == Background ==
> > >
> > > Virtually all kernel memory accesses operations happen in full
> > > cachelines. In practice, writing a "byte" of memory usually reads a 64
> > > byte cacheline of memory, modifies it, then writes the whole line back.
> > > Those operations do not trigger this problem.
> > >
> > > This problem is triggered by "partial" writes where a write transaction
> > > of less than cacheline lands at the memory controller. The CPU does
> > > these via non-temporal write instructions (like MOVNTI), or through
> > > UC/WC memory mappings. The issue can also be triggered away from the
> > > CPU by devices doing partial writes via DMA.
> > >
> > > == Problem ==
> > >
> > > A partial write to a TDX private memory cacheline will silently "poison"
> > > the line. Subsequent reads will consume the poison and generate a
> > > machine check. According to the TDX hardware spec, neither of these
> > > things should have happened.
> > >
> > > To add insult to injury, the Linux machine code will present these as a
> > > literal "Hardware error" when they were, in fact, a software-triggered
> > > issue.
> > >
> > > == Solution ==
> > >
> > > In the end, this issue is hard to trigger. Rather than do something
> > > rash (and incomplete) like unmap TDX private memory from the direct map,
> > > improve the machine check handler.
> > >
> > > Currently, the #MC handler doesn't distinguish whether the memory is
> > > TDX private memory or not but just dump, for instance, below message:
> > >
> > > [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
> > > [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
> > > ...
> > > [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > > [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > > [...] Kernel panic - not syncing: Fatal local machine check
> > >
> > > Which says "Hardware Error" and "Data load in unrecoverable area of
> > > kernel".
> > >
> > > Ideally, it's better for the log to say "software bug around TDX private
> > > memory" instead of "Hardware Error". But in reality the real hardware
> > > memory error can happen, and sadly such software-triggered #MC cannot be
> > > distinguished from the real hardware error. Also, the error message is
> > > used by userspace tool 'mcelog' to parse, so changing the output may
> > > break userspace.
> > >
> > > So keep the "Hardware Error". The "Data load in unrecoverable area of
> > > kernel" is also helpful, so keep it too.
> > >
> > > Instead of modifying above error log, improve the error log by printing
> > > additional TDX related message to make the log like:
> > >
> > > ...
> > > [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > > [...] mce: [Hardware Error]: Machine Check: Memory error from TDX private memory. May be result of CPU erratum.
> >
> > The message mentions one part of issue -- CPU erratum -- but misses the
> > other required part -- kernel bug that makes kernel access the memory it
> > not suppose to.
> >
>
> How about below?
>
> "Memory error from TDX private memory. May be result of CPU erratum caused by
> kernel bug."
Fine, I guess.
--
Kiryl Shutsemau / Kirill A. Shutemov
next prev parent reply other threads:[~2023-06-12 7:59 UTC|newest]
Thread overview: 144+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1685887183.git.kai.huang@intel.com>
[not found] ` <af4e428ab1245e9441031438e606c14472daf927.1685887183.git.kai.huang@intel.com>
[not found] ` <a2da8af2-41a9-a0cf-dbe9-7f0a14bf05fe@linux.intel.com>
2023-06-06 22:58 ` [PATCH v11 02/20] x86/virt/tdx: Detect TDX during kernel boot Huang, Kai
2023-06-06 23:44 ` Isaku Yamahata
2023-06-19 12:12 ` David Hildenbrand
2023-06-19 23:58 ` Huang, Kai
[not found] ` <ec640452a4385d61bec97f8b761ed1ff38898504.1685887183.git.kai.huang@intel.com>
2023-06-06 23:55 ` [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure Isaku Yamahata
2023-06-07 14:24 ` Dave Hansen
2023-06-07 18:53 ` Isaku Yamahata
2023-06-07 19:27 ` Dave Hansen
2023-06-07 19:47 ` Isaku Yamahata
2023-06-07 20:08 ` Sean Christopherson
2023-06-07 20:22 ` Dave Hansen
2023-06-08 0:51 ` Huang, Kai
2023-06-08 13:50 ` Dave Hansen
2023-06-07 22:56 ` Huang, Kai
2023-06-08 14:05 ` Dave Hansen
2023-06-19 12:52 ` David Hildenbrand
2023-06-20 10:37 ` Huang, Kai
2023-06-20 12:20 ` kirill.shutemov
2023-06-20 12:39 ` David Hildenbrand
2023-06-20 15:15 ` Dave Hansen
[not found] ` <86f2a8814240f4bbe850f6a09fc9d0b934979d1b.1685887183.git.kai.huang@intel.com>
[not found] ` <20230606123821.exit7gyxs42dxotz@box.shutemov.name>
2023-06-06 22:58 ` [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum Huang, Kai
2023-06-07 15:06 ` kirill.shutemov
2023-06-07 14:15 ` Dave Hansen
2023-06-07 22:43 ` Huang, Kai
2023-06-19 11:37 ` Huang, Kai
2023-06-20 15:44 ` Dave Hansen
2023-06-20 23:11 ` Huang, Kai
2023-06-19 12:21 ` David Hildenbrand
2023-06-20 10:31 ` Huang, Kai
2023-06-20 15:39 ` Dave Hansen
2023-06-20 16:03 ` David Hildenbrand
2023-06-20 16:21 ` Dave Hansen
[not found] ` <21b3a45cb73b4e1917c1eba75b7769781a15aa14.1685887183.git.kai.huang@intel.com>
2023-06-07 15:22 ` [PATCH v11 07/20] x86/virt/tdx: Add skeleton to enable TDX on demand Dave Hansen
2023-06-08 2:10 ` Huang, Kai
2023-06-08 13:43 ` Dave Hansen
2023-06-12 11:21 ` Huang, Kai
2023-06-19 13:16 ` David Hildenbrand
2023-06-19 23:28 ` Huang, Kai
[not found] ` <468533166590ff5ed11730350c4af8cdb0b99165.1685887183.git.kai.huang@intel.com>
2023-06-07 15:48 ` [PATCH v11 09/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Dave Hansen
2023-06-07 23:22 ` Huang, Kai
2023-06-08 22:40 ` kirill.shutemov
[not found] ` <f9148e67e968d7aed4707b67ea9b1aa761401255.1685887183.git.kai.huang@intel.com>
2023-06-07 15:54 ` [PATCH v11 10/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Dave Hansen
2023-06-07 15:57 ` Dave Hansen
2023-06-08 10:18 ` Huang, Kai
2023-06-08 22:52 ` kirill.shutemov
2023-06-12 2:21 ` Huang, Kai
2023-06-12 3:01 ` Dave Hansen
[not found] ` <cee2f2664aac3c5314896c6d14cba50f2617c0e5.1685887183.git.kai.huang@intel.com>
2023-06-08 0:08 ` [PATCH v11 03/20] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC kirill.shutemov
[not found] ` <9b3582c9f3a81ae68b32d9997fcd20baecb63b9b.1685887183.git.kai.huang@intel.com>
2023-06-07 8:19 ` [PATCH v11 06/20] x86/virt/tdx: Handle SEAMCALL running out of entropy error Isaku Yamahata
2023-06-07 15:08 ` Dave Hansen
2023-06-07 23:36 ` Huang, Kai
2023-06-08 0:29 ` Dave Hansen
2023-06-08 0:08 ` kirill.shutemov
2023-06-09 14:42 ` Nikolay Borisov
2023-06-12 11:04 ` Huang, Kai
2023-06-19 13:00 ` David Hildenbrand
2023-06-20 10:39 ` Huang, Kai
2023-06-20 11:14 ` David Hildenbrand
[not found] ` <50386eddbb8046b0b222d385e56e8115ed566526.1685887183.git.kai.huang@intel.com>
2023-06-07 15:25 ` [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory Dave Hansen
2023-06-08 0:27 ` kirill.shutemov
2023-06-08 2:40 ` Huang, Kai
2023-06-08 11:41 ` kirill.shutemov
2023-06-08 13:13 ` Dave Hansen
2023-06-12 2:00 ` Huang, Kai
2023-06-08 23:29 ` Isaku Yamahata
2023-06-08 23:54 ` kirill.shutemov
2023-06-09 1:33 ` Isaku Yamahata
2023-06-09 10:02 ` kirill.shutemov
2023-06-12 2:00 ` Huang, Kai
2023-06-19 13:29 ` David Hildenbrand
2023-06-19 23:51 ` Huang, Kai
2023-06-08 21:03 ` [PATCH v11 00/20] TDX host kernel support Dan Williams
2023-06-12 10:56 ` Huang, Kai
[not found] ` <4e108968c3294189ad150f62df1f146168036342.1685887183.git.kai.huang@intel.com>
2023-06-08 23:24 ` [PATCH v11 12/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs kirill.shutemov
2023-06-08 23:43 ` Dave Hansen
2023-06-12 2:52 ` Huang, Kai
2023-06-25 15:38 ` Huang, Kai
2023-06-15 7:48 ` Nikolay Borisov
[not found] ` <409448809f7c78191aa27d6d2970ba1384c2d464.1685887183.git.kai.huang@intel.com>
2023-06-08 23:53 ` [PATCH v11 13/20] x86/virt/tdx: Designate reserved areas for all TDMRs kirill.shutemov
[not found] ` <4e6cd933edd2501147366df7a17e1087560a4320.1685887183.git.kai.huang@intel.com>
2023-06-08 23:53 ` [PATCH v11 14/20] x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID kirill.shutemov
[not found] ` <34853e0f8f38ec2fda66b0ba480d4df63b8aab43.1685887183.git.kai.huang@intel.com>
2023-06-08 23:56 ` [PATCH v11 20/20] Documentation/x86: Add documentation for TDX host support Dave Hansen
2023-06-12 3:41 ` Huang, Kai
2023-06-16 9:02 ` Nikolay Borisov
2023-06-16 16:26 ` Dave Hansen
[not found] ` <7bd7d0c6196deb58b54d6e629603775844b1307d.1685887183.git.kai.huang@intel.com>
2023-06-09 10:03 ` [PATCH v11 16/20] x86/virt/tdx: Initialize all TDMRs kirill.shutemov
[not found] ` <17bcbe3e154415ee7a4c77489809a3db0c5ddf3f.1685887183.git.kai.huang@intel.com>
2023-06-09 10:14 ` [PATCH v11 17/20] x86/kexec: Flush cache of TDX private memory kirill.shutemov
[not found] ` <116cafb15625ac0bcda7b47143921d0c42061b69.1685887183.git.kai.huang@intel.com>
2023-06-09 13:17 ` [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum kirill.shutemov
2023-06-12 3:08 ` Huang, Kai
2023-06-12 7:59 ` kirill.shutemov [this message]
2023-06-12 13:51 ` Dave Hansen
2023-06-12 23:31 ` Huang, Kai
[not found] ` <5aa7506d4fedbf625e3fe8ceeb88af3be1ce97ea.1685887183.git.kai.huang@intel.com>
2023-06-09 13:23 ` [PATCH v11 18/20] x86: Handle TDX erratum to reset TDX private memory during kexec() and reboot kirill.shutemov
2023-06-12 3:06 ` Huang, Kai
2023-06-12 7:58 ` kirill.shutemov
2023-06-12 10:27 ` Huang, Kai
2023-06-12 11:48 ` kirill.shutemov
2023-06-12 13:18 ` David Laight
2023-06-12 13:47 ` Dave Hansen
2023-06-13 0:51 ` Huang, Kai
2023-06-13 11:05 ` kirill.shutemov
2023-06-14 0:15 ` Huang, Kai
2023-06-13 14:25 ` Dave Hansen
2023-06-13 23:18 ` Huang, Kai
2023-06-14 0:24 ` Dave Hansen
2023-06-14 0:38 ` Huang, Kai
2023-06-14 0:42 ` Huang, Kai
2023-06-19 11:43 ` Huang, Kai
2023-06-19 14:31 ` Dave Hansen
2023-06-19 14:46 ` kirill.shutemov
2023-06-19 23:35 ` Huang, Kai
2023-06-19 23:41 ` Dave Hansen
2023-06-20 0:56 ` Huang, Kai
2023-06-20 1:06 ` Dave Hansen
2023-06-20 7:58 ` Peter Zijlstra
2023-06-25 15:30 ` Huang, Kai
2023-06-25 23:26 ` Huang, Kai
2023-06-20 7:48 ` Peter Zijlstra
2023-06-20 8:11 ` Peter Zijlstra
2023-06-20 10:42 ` Huang, Kai
2023-06-20 10:56 ` Peter Zijlstra
2023-06-14 9:33 ` Huang, Kai
2023-06-14 10:02 ` kirill.shutemov
2023-06-14 10:58 ` Huang, Kai
2023-06-14 11:08 ` kirill.shutemov
2023-06-14 11:17 ` Huang, Kai
[not found] ` <927ec9871721d2a50f1aba7d1cf7c3be50e4f49b.1685887183.git.kai.huang@intel.com>
2023-06-07 16:05 ` [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions Dave Hansen
2023-06-08 10:48 ` Huang, Kai
2023-06-08 13:11 ` Dave Hansen
2023-06-12 2:33 ` Huang, Kai
2023-06-12 14:33 ` kirill.shutemov
2023-06-12 22:10 ` Huang, Kai
2023-06-13 10:18 ` kirill.shutemov
2023-06-13 23:19 ` Huang, Kai
2023-06-08 23:02 ` kirill.shutemov
2023-06-12 2:25 ` Huang, Kai
2023-06-09 4:01 ` Sathyanarayanan Kuppuswamy
2023-06-12 2:28 ` Huang, Kai
2023-06-14 12:31 ` Nikolay Borisov
2023-06-14 22:45 ` Huang, Kai
[not found] ` <30358db4eff961c69783bbd4d9f3e50932a9a759.1685887183.git.kai.huang@intel.com>
2023-06-08 23:53 ` [PATCH v11 15/20] x86/virt/tdx: Configure global KeyID on all packages kirill.shutemov
2023-06-15 8:12 ` Nikolay Borisov
2023-06-15 22:24 ` Huang, Kai
2023-06-19 14:56 ` kirill.shutemov
2023-06-19 23:38 ` Huang, Kai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230612075910.jqkiofjm6mkdl7cy@box.shutemov.name \
--to=kirill.shutemov@linux.intel.com \
--cc=ak@linux.intel.com \
--cc=bagasdotme@gmail.com \
--cc=chao.gao@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=imammedo@redhat.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=reinette.chatre@intel.com \
--cc=sagis@google.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox