From: Andrew Zaborowski <andrew.zaborowski@intel.com>
To: Borislav Petkov <bp@alien8.de>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Eric Biederman <ebiederm@xmission.com>,
"x86@kernel.org" <x86@kernel.org>, Tony <tony.luck@intel.com>,
Andrew Zaborowski <balrogg@gmail.com>
Subject: Re: [RESEND][PATCH 1/3] x86: Add task_struct flag to force SIGBUS on MCE
Date: Sat, 10 Aug 2024 03:20:10 +0200
Message-ID: <CAOq732KnHFo3VaRH9V-x0k5m=h1jyNrdtKj4quG8Yaq7wPQjKg@mail.gmail.com>
In-Reply-To: <SA1PR11MB69927AE28B46583DCB5C97DEE7BA2@SA1PR11MB6992.namprd11.prod.outlook.com>
Borislav Petkov <bp@alien8.de> wrote:
> So instead of the process getting killed, you want to return SIGBUS
> because, "hey caller, your process encountered an MCE while being
> attempted to be executed"?
The tests could be changed to expect the SIGSEGV but in this case it
seemed that the test was good and the kernel was misbehaving. One of
the authors of the MCE handling code confirmed that.
>
> > Qemu relies on the SIGBUS logic but the execve and rseq
> > cases cannot be recovered from, the main benefit of sending the
> > correct signal is perhaps information to the user.
>
> You will have that info in the logs - we're usually very loud when we
> get an MCE...
True, though that's hard to link to a specific process crash. It's
also hard to extract the page address in the process's address space
from that, although I don't think there's a current use case.
>
> > If this cannot be fixed then optimally it should be documented.
>
> I'm not convinced at all that jumping through hoops you're doing, is
> worth the effort.
That could be; again, this could be fixed in the documentation instead.
>
> > As for "all that code", the memory failure handling code is of certain
> > size and this is a comparatively tiny fix for a tiny issue.
>
> No, I didn't say anything about the memory failure code - it is about
I was replying to your comment about the size of the change.
> supporting that obscure use case and the additional logic you're adding
> to the #MC handler which looks like a real mess already and us having to
> support that use case indefinitely.
Supporting something generally includes supporting both the common and
the obscure cases. From the user's point of view the kernel has
committed to supporting these scenarios indefinitely, or until the
SIGBUS-on-memory-error logic is deprecated, and simply has a bug.
>
> So why does it matter if a process which is being executed and gets an
> MCE beyond the point of no return absolutely needs to return SIGBUS vs
> it getting killed and you still get an MCE logged on the machine, in
> either case?
A SIGSEGV strongly implies a problem with the program being run, not
with a specific instance of it. A SIGBUS may not be the program's
fault, as in this case.
In these tests the workload was simply relaunched on a SIGBUS, which
sounded fair to me. A qemu VM could similarly be restarted on an
unrecoverable MCE in a page that doesn't belong to the VM but to qemu
itself.
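To illustrate the relaunch policy described above, here is a minimal
sketch of a supervisor that restarts a workload only when it was killed
by SIGBUS. This is not the actual test harness: the function name
run_with_relaunch and the max_restarts limit are hypothetical, and the
sketch does not try to tell a memory-failure SIGBUS apart from other
SIGBUS causes (for example, access past the end of a truncated mmap'd
file).

```python
import signal
import subprocess


def run_with_relaunch(argv, max_restarts=3):
    """Run argv, relaunching only if the child was killed by SIGBUS.

    Hypothetical helper for illustration. Returns (returncode, launches).
    """
    launches = 0
    while True:
        launches += 1
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        rc = subprocess.run(argv).returncode
        if rc == -signal.SIGBUS and launches <= max_restarts:
            # Possibly a memory error rather than a program bug: retry.
            continue
        # Normal exit, SIGSEGV, or any other outcome is passed through.
        return rc, launches
```

A SIGSEGV is deliberately not retried here, since it points at the
program itself rather than at one unlucky instance of it.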
Best regards