From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>,
James Houghton <jthoughton@google.com>,
David Hildenbrand <david@redhat.com>,
Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: BUG selftests/mm]
Date: Tue, 12 Mar 2024 09:47:31 -0700
Message-ID: <CAJHvVcizoDwYVqnkroBz4nrXJRCkynC19=RBgWhT6xPdrvJ0sA@mail.gmail.com>
In-Reply-To: <ZfB28NIbflrnsqiX@x1n>
On Tue, Mar 12, 2024 at 8:38 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> > On Mon, Mar 11, 2024 at 2:27 PM James Houghton <jthoughton@google.com> wrote:
> > >
> > > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <peterx@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > > UFFDIO_POISON, because those control access to lots more things
> > > > > besides, which we don't necessarily want the process using UFFD to be
> > > > > able to do. :/
> > >
> > > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
> >
> > +1.
> >
> >
> > >
> > > > >
> > > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > > dropping some addresses though.
> > > >
> > > > Do you know how much could an admin rely on such addresses? How frequent
> > > > would MCE generate normally in a sane system?
> > >
> > > I'm not sure about how much admins rely on the address themselves. +cc
> > > Jiaqi Yan
> >
> > I think admins mostly care about MCEs from **real** hardware. For
> > example they may choose to perform some maintenance if the number of
> > hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> > think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> > better tools than dmesg. In the case all memory errors are emulated by
> > hypervisor after a live migration, these dmesgs may confuse admins to
> > think there is dimm error on host but actually it is not the case. In
> > this sense, silencing these emulated by UFFDIO_POISON makes sense (if
> > not too complicated to do).
>
> Now we have three types of such error: (1) PFN poisoned, (2) swapin error,
> (3) emulated. Both 1+2 should deserve a global message dump, while (3)
> should be process-internal, and nobody else should need to care except the
> process itself (via the signal + meta info).
>
> If we want to differentiate (2) v.s. (3), we may need 1 more pte marker bit
> to show whether such poison is "global" or "local" (while as of now 2+3
> shares the usage of the same PTE_MARKER_POISONED bit); a swapin error can
> still be seen as a "global" error (instead of a mem error, it can be a disk
> error, and the err msg still applies to it describing a VA corrupt).
> Another VM_FAULT_* flag is also needed to reflect that locality, then
> ignore a global broadcast for "local" poison faults.
It's easy to implement, as long as folks aren't too offended by taking
one more bit. :) I can send a patch for this on Monday if there are no
objections.
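
To make the idea concrete, here is a rough userspace toy of the control flow described above. It is only a sketch: every name prefixed with SKETCH_/sketch_ is hypothetical and chosen for illustration, and none of it is the kernel's actual marker or VM_FAULT_* definitions. The assumption is that a "local" bit rides alongside the existing poisoned marker, and the fault path forwards it so the global message can be skipped.

```c
#include <stdbool.h>

/* Hypothetical marker bits; not the kernel's real names or values. */
#define SKETCH_MARKER_POISONED       (1u << 0) /* page is poisoned */
#define SKETCH_MARKER_POISONED_LOCAL (1u << 1) /* poison is emulated */

/* Hypothetical fault result flags, standing in for VM_FAULT_* bits. */
#define SKETCH_FAULT_HWPOISON        (1u << 0)
#define SKETCH_FAULT_HWPOISON_LOCAL  (1u << 1)

/* Translate a marker found in a PTE into fault result flags. */
static unsigned int sketch_handle_marker(unsigned int marker)
{
	unsigned int ret = 0;

	if (marker & SKETCH_MARKER_POISONED) {
		ret |= SKETCH_FAULT_HWPOISON;
		/* Carry the locality through to the arch fault handler. */
		if (marker & SKETCH_MARKER_POISONED_LOCAL)
			ret |= SKETCH_FAULT_HWPOISON_LOCAL;
	}
	return ret;
}

/*
 * The signal would be sent for any poison fault; only "global"
 * poison (real PFN poison, swapin error) gets the log message.
 */
static bool sketch_should_log(unsigned int fault)
{
	return (fault & SKETCH_FAULT_HWPOISON) &&
	       !(fault & SKETCH_FAULT_HWPOISON_LOCAL);
}
```

In this toy, a marker with only the poisoned bit set still logs, while one that also carries the local bit does not. SIGBUS delivery is deliberately left out, since it would be the same in both cases.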
>
> >
> > SIGBUS (and logged "MCE: Killing %s:%d due to hardware memory
> > corruption fault at %lx\n") emitted by the fault handler due to UFFDIO_POISON
> > are less useful to admins AFAIK. They are for sure crucial to
> > userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> > poisoned address (in si_addr from force_sig_mceerr).
> >
> > >
> > > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > > userspace to trigger lots of these pr_errs. Consider the case where a
> > > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > > SIGBUSes.
> > >
> > > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > > re-accessing *real* poison, but we will still get the pr_err, and we
> > > still keep injecting MCEs into the guest. We have observed scenarios
> > > like this before.
> > >
> > > >
> > > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > > interval/burst configuration?
> > > >
> > > > Any details?
> > > >
> > > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > > similar. Not sure if that's considered valid or not. :)
> > > >
> > > > This, OTOH, sounds like an overkill..
> > > >
> > > > I just checked the details of the ratelimit code again; by default
> > > > it has:
> > > >
> > > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > > #define DEFAULT_RATELIMIT_BURST 10
> > > >
> > > > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > > > there're continuous 10 MCEs it won't get suppressed, until the 11th came, in
> > > > 5 seconds interval. I think it means it's possibly even less of a concern
> > > > to directly use pr_err_ratelimited().
> > >
> > > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > > did not come from real hardware MCE events) sounds like the most
> > > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > > require CAP_SYS_ADMIN. :)
> > >
> > > Thanks.
> >
>
> --
> Peter Xu
>
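
For reference, the default burst/interval behaviour quoted above (10 messages per 5 seconds) can be modelled with a small userspace toy. This is a simplified sketch, not the kernel's ratelimit code: the struct and function names are hypothetical, time is passed in as plain seconds instead of jiffies, and there is no locking or "callbacks suppressed" reporting.

```c
#include <stdbool.h>

#define SKETCH_INTERVAL_SEC 5  /* mirrors DEFAULT_RATELIMIT_INTERVAL (5 * HZ) */
#define SKETCH_BURST        10 /* mirrors DEFAULT_RATELIMIT_BURST */

/* Hypothetical, simplified stand-in for a ratelimit state. */
struct sketch_ratelimit {
	long begin;   /* start of the current window, in seconds */
	bool started; /* whether a window has been opened yet */
	int printed;  /* messages allowed in the current window */
	int missed;   /* messages suppressed so far */
};

/* Return true if a message at time 'now' (seconds) may be printed. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}

	/* Once the interval has elapsed, open a fresh window. */
	if (now - rs->begin > SKETCH_INTERVAL_SEC) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < SKETCH_BURST) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

With a zero-initialized state, eleven calls at the same timestamp let the first ten through and suppress only the eleventh, which matches the reading above that a burst of 10 MCEs would still be logged in full.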