[RFD PATCH] x86/mce: Make sure to send SIGBUS even after losing the race to poison a page

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Tony Luck <tony.luck@intel.com>
To: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tony Luck <tony.luck@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	Youquan Song <youquan.song@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFD PATCH] x86/mce: Make sure to send SIGBUS even after losing the race to poison a page
Date: Thu, 27 Aug 2020 09:32:05 -0700	[thread overview]
Message-ID: <20200827163205.23096-1-tony.luck@intel.com> (raw)

For discussion ... I'm 100% sure the patch below is the wrong way to
fix this ... for one thing it doesn't provide the virtual address of
the error to the user signal handler. For another it just looks like
a hack.  I'm just not sure whether to delve deep into the
memory_failure() path to make sure the signal is sent to the current
process in the SRAR case. Or just to do a better job at an error return
value and make the X86 specific code send the signal with the address.

I've also got a feeling that this issue has been discussed before, but
can't remember how that turned out.

First few paragraphs describe the problem (and I think are OK). Things
go off the rails with the fix.

-Tony

X86 hardware may provide two indications that a page has poison. First
the memory controller that detects the failed ECC check may log a UCNA
(uncorrected no action) signature in one machine check bank and signal
the OS with a CMCI (corrected machine check interrupt ... historical name
did not get updated for this case). Next the processor core may log a
SRAR (software recoverable action required) signature in another bank
and signal with #MC.

The #MC used to win that race and the page was taken offline and SIGBUS
sent to the task.

Changes to how Linux processes machine checks now mean that:
a) Linux will offline the page based on the UCNA siganture in the
   CMCI handler.
b) The machine check handler defers processing using task_work_add() which
   can happen after the CMCI is processed.

memory_failure() avoids races with multiple callers reporting the same
page with an atomic test and set operation to mark the page as poisoned.

The net result of all of the above is that when a task consumes poison
the page is taken offline by the UCNA/CMCI path, and the SRAR/#MC path
takes an early return without sending a SIGBUS.

Fix by changing memory_failure() to return -EEXIST in the case where
the page is already poisoned and make the machine check code path check
for this error and force a SIGBUS.

Note that -EBUSY might have been a more logical error code, but that is
already used for many other error cases from memory_failure().

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 7 +++++--
 mm/memory-failure.c            | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index fb6b5f64f7e6..8515809e0472 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1182,18 +1182,21 @@ static void kill_me_maybe(struct callback_head *cb)
 {
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
+	int ret;

 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);

 	if (!p->mce_ripv)
 		flags |= MF_MUST_KILL;

-	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags)) {
+	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
+	if (!ret) {
 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
 		return;
 	}

-	pr_err("Memory error not recovered");
+	if (ret != -EEXIST)
+		pr_err("Memory error not recovered");
 	kill_me_now(cb);
 }

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f1aa6433f404..e0486c4e0130 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1298,7 +1298,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		return -EEXIST;
 	}

 	orig_head = hpage = compound_head(p);
-- 
2.21.1

next             reply	other threads:[~2020-08-27 16:32 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-27 16:32 Tony Luck [this message]
2020-08-28  8:11 ` HORIGUCHI NAOYA(堀口　直也)
2020-09-03 16:55 ` Borislav Petkov
2020-09-03 17:09   ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200827163205.23096-1-tony.luck@intel.com \
    --to=tony.luck@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=youquan.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox