From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Aili Yao <yaoaili@kingsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Date: Tue, 19 Jan 2021 05:25:38 +0000
Message-ID: <20210119052537.GA1642@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210118170900.6fe9595a.yaoaili@kingsoft.com>
On Mon, Jan 18, 2021 at 05:09:00PM +0800, Aili Yao wrote:
> On Mon, 18 Jan 2021 08:57:47 +0000
> HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
>
> > > >
> > > > For action optional cases, one error event kills *only one* process. If an
> > > > error page is shared by multiple processes, these processes will be killed
> > > > by separate error events, each of which is triggered when each process tries
> > > > to access the error memory. So these processes would be killed immediately
> > > > when accessing the error, but you don't have to kill all at the same time
> > > > (or actually you might not even have to kill it at all if the process exits
> > > > finally without accessing the error later).
> > > >
> > > > Maybe the function variable "force_early" is named confusingly (it sounds
> > > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect).
> > > > I'll submit a fix later. (I'll add your "Reported-by" because you made me
> > > > find it, thank you.)
> > > >
> > > I think we should do more for the non-current process error case: we should mark it AO for processes to be signaled,
> > > or we may take the wrong action.
> >
> > I'm not sure what you mean by "non current process error case" and "we
> > should mark it AO", so could you explain more specifically about your error
> > scenario?
> I will share my test code and I will submit another patch for this scenario.
> Please give me some time, thanks!
> And I think you are right, AR applies only to the current process.
>
> > Especially I'd like to know about who triggers hard offline on
> > what hardware events and what "wrong action" could happen. Maybe just
> > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> > it's not enough for us to see that your scenario is possible. Current
> > implementation implicitly assumes some hardware behavior, and does not work
> > for the case which never happens under the assumption.
> >
> This action is from the mcelog daemon. Normally soft page offline is the default, but we can configure
> hard page offline for CE storms, to get the related processes signaled.
Thanks, so which interface did you use for error injection? I guess you first
used /sys/devices/system/memory/hard_offline_page, but if that's true,
then the error event should be action optional (no MF_ACTION_REQUIRED set).
So now I'm wondering why you are observing action required events?
My other guess is that you might have used the mce-inject tool. If so,
please use hard_offline_page instead; the current kernel code should then
properly send SIGBUS to the dedicated process.
- Naoya
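
A minimal sketch of the hard_offline_page injection path suggested above,
for readers who want to reproduce the action optional case. It assumes a
kernel built with CONFIG_MEMORY_FAILURE, root privileges, and that the
physical address of the target page is already known (for example from
/proc/<pid>/pagemap). The helper name request_hard_offline() is
hypothetical and not part of any kernel or tool API; only the sysfs path
comes from this thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sysfs file named in this thread. Writing a physical address to it asks
 * the kernel to hard-offline the page that contains the address. */
#define HARD_OFFLINE_PATH "/sys/devices/system/memory/hard_offline_page"

/*
 * Hypothetical helper: write phys_addr as a hex string to `path`.
 * The path is a parameter so the helper can be exercised against an
 * ordinary file without touching real memory.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int request_hard_offline(const char *path, uint64_t phys_addr)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	/* Hex with a 0x prefix, followed by a newline. */
	ok = fprintf(f, "0x%" PRIx64 "\n", phys_addr) > 0;

	/* A rejected sysfs write may only be reported when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling request_hard_offline(HARD_OFFLINE_PATH, paddr) as root requests the
hard offline of the page containing paddr. Per the discussion above, this
path is expected to raise an action optional event (no MF_ACTION_REQUIRED),
which is why it is a better fit for this test than mce-inject.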