From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-io0-f197.google.com (mail-io0-f197.google.com [209.85.223.197]) by kanga.kvack.org (Postfix) with ESMTP id AFCE26B02F3 for ; Thu, 17 Aug 2017 19:32:19 -0400 (EDT)
Received: by mail-io0-f197.google.com with SMTP id 41so72749096iop.2 for ; Thu, 17 Aug 2017 16:32:19 -0700 (PDT)
Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTPS id w70si2587416pgw.626.2017.08.17.16.32.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Aug 2017 16:32:18 -0700 (PDT)
From: "Luck, Tony"
Subject: RE: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
Date: Thu, 17 Aug 2017 23:32:16 +0000
Message-ID: <3908561D78D1C84285E8C5FCA982C28F61342363@ORSMSX114.amr.corp.intel.com>
References: <20170816171803.28342-1-tony.luck@intel.com> <20170817150942.017f87537b6cbb48e9cfc082@linux-foundation.org>
In-Reply-To: <20170817150942.017f87537b6cbb48e9cfc082@linux-foundation.org>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Borislav Petkov, "Hansen, Dave", Naoya Horiguchi, "Elliott, Robert (Persistent Memory)", "x86@kernel.org", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"

> It's unclear (to lil ole me) what the end-user-visible effects of this
> are.
>
> Could we please have a description of that?  So a) people can
> understand your decision to cc:stable and b) people whose kernels are
> misbehaving can use your description to decide whether your patch might
> fix the issue their users are reporting.

Ingo already applied this to the tip tree, so too late to fix the commit
message :-(

A very, very, unlucky end user with a system that supports machine check
recovery (Xeon E7, or Xeon-SP-platinum) that has recovered from one or
more uncorrected memory errors (lucky so far) might find a subsequent
uncorrected memory error flagged as fatal because the machine check bank
that should log the error is already occupied by a log caused by a
speculative access to one of the earlier uncorrected errors (the unlucky
part).

We haven't seen this happen at the Linux OS level, but it is a
theoretical possibility.

[Some BIOS that map physical memory 1:1 have seen this when doing eMCA
processing for the first error ... as soon as they load the address of
the error from the MCi_ADDR register they are vulnerable to some
speculative access dereferencing the register with the address and
setting the overflow bit in the machine check bank that still holds the
original log].

-Tony
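
To make the failure mode above concrete, here is a toy model of a single
machine check bank. It is only a sketch under stated assumptions, not
kernel code: the VAL/OVER/UC bit positions follow the architectural
MCi_STATUS layout, but the Bank class, speculative_read(), is_fatal()
and run_scenario() are hypothetical names, the "first log is kept and
OVER is set" rule is a simplification, and treating UC plus OVER as
fatal is an assumption standing in for the real severity grading.

```python
# Architectural bit positions in MCi_STATUS; everything else is a toy model.
MCI_STATUS_VAL = 1 << 63   # bank holds a valid log
MCI_STATUS_OVER = 1 << 62  # another error arrived while the bank was occupied
MCI_STATUS_UC = 1 << 61    # logged error was uncorrected


class Bank:
    """Hypothetical model of one machine check bank (status + address)."""

    def __init__(self):
        self.status = 0
        self.addr = 0

    def log_uncorrected(self, addr):
        if self.status & MCI_STATUS_VAL:
            # Simplification: keep the existing log, only flag the overflow.
            self.status |= MCI_STATUS_OVER
        else:
            self.status = MCI_STATUS_VAL | MCI_STATUS_UC
            self.addr = addr


def speculative_read(bank, addr, poisoned, present):
    """A speculative load reaches memory only if the 1:1 mapping is present."""
    if addr in poisoned and addr in present:
        bank.log_uncorrected(addr)


def is_fatal(bank):
    """Assumption: an uncorrected log with the overflow bit set is fatal."""
    return bool(bank.status & MCI_STATUS_UC and bank.status & MCI_STATUS_OVER)


def run_scenario(present):
    """Earlier error at 0x1000 was already recovered and the bank cleared."""
    poisoned = {0x1000}
    bank = Bank()
    # Stray speculative access to the old poison page.
    speculative_read(bank, 0x1000, poisoned, present)
    # A new, unrelated uncorrected error lands in the same bank.
    bank.log_uncorrected(0x2000)
    return is_fatal(bank)


# Poison page still mapped: the speculative log occupies the bank first.
unpatched_is_fatal = run_scenario(present={0x1000, 0x2000})
# PRESENT bit cleared for the poison page: the bank is free for the new error.
patched_is_fatal = run_scenario(present={0x2000})
```

In this model the only difference between the two runs is whether the
poison page is still in the set of present mappings, which is the
property the patch changes for the kernel 1:1 map.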