linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: Aili Yao <yaoaili@kingsoft.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Date: Mon, 18 Jan 2021 10:24:23 +0100	[thread overview]
Message-ID: <20210118092419.GA4234@linux> (raw)
In-Reply-To: <20210118085747.GA904@hori.linux.bs1.fc.nec.co.jp>

On Mon, Jan 18, 2021 at 08:57:47AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> I'm not sure what you mean by "non current process error case" and "we
> should mark it AO", so could you explain more specifically about your error
> scenario?  Especially I'd like to know about who triggers hard offline on
> what hardware events and what "wrong action" could happen.  Maybe just
> "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> it's not enough for us to see that your scenario is possible. Current
> implementation implicitly assumes some hardware behavior, and does not work
> for the case which never happens under the assumption.

So, the scenario case is a multithread application with the same page mapped.
And PF_MCE_KILL_EARLY flag was set.

IIUIC, Aili Yao concern is that when the MCE machinery calls memory_failure
which MF_ACTION_REQUIRED, only the process that triggered the MCE exception
will receive a SIGBUG, and not the other threads that might have PF_MCE_EARLY.
Aili Yao would like memory_failure() to also signal those threads who might
have the flag set, in case they want to do something with that information.

But reading the code, I do not think that is what the code expects.
Looking at the comment above find_early_kill_thread:

"/*
 * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
 * on behalf of the thread group. Return task_struct of the (first found)
 * dedicated thread if found, and return NULL otherwise.
 *
 * We already hold read_lock(&tasklist_lock) in the caller, so we don't
 * have to call rcu_read_lock/unlock() in this function.
 */"

What I understand from that is:

"
 If memory_failure() was not triggered by any concrete process (aka: no one was
 trying to manipulate the corrupted area), we need to find the main thread who
 might have set the MCE policy by pcrtl and see if they want to be signaled
 __before__ they access the corrupted area.
 
"

Note that if the PF_MCE policy was not set, we check the global knob
sysctm_memory_early_kill.
And if that is not set either, we defer the signaling till later when a process
actually tries to operate the corrupted area.

Does that makes sense?

Actually, unless I am mistaken, if a multithread process receives a signal,
all threads belonging to the process will receive the signal as well:

"The signal disposition is a per-process attribute: in a
multithreaded application, the disposition of a particular signal
is the same for all threads."

-- 
Oscar Salvador
SUSE L3


  parent reply	other threads:[~2021-01-18  9:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15  7:55 Aili Yao
2021-01-15  8:49 ` Oscar Salvador
2021-01-15  9:26   ` Aili Yao
2021-01-15  9:31     ` Aili Yao
2021-01-15  9:40       ` Oscar Salvador
2021-01-15  9:53         ` Aili Yao
2021-01-15 10:31     ` Oscar Salvador
2021-01-18  5:15     ` HORIGUCHI NAOYA(堀口 直也)
2021-01-18  5:57       ` Aili Yao
2021-01-18  6:50         ` HORIGUCHI NAOYA(堀口 直也)
2021-01-18  7:16           ` Aili Yao
2021-01-18  8:15           ` Aili Yao
2021-01-18  8:57             ` HORIGUCHI NAOYA(堀口 直也)
2021-01-18  9:09               ` Aili Yao
2021-01-19  5:25                 ` HORIGUCHI NAOYA(堀口 直也)
2021-01-19  6:04                   ` Aili Yao
2021-01-19  7:33                     ` HORIGUCHI NAOYA(堀口 直也)
2021-01-18  9:24               ` Oscar Salvador [this message]
2021-01-18  9:38                 ` Aili Yao
2021-01-18 10:09                   ` Oscar Salvador
2021-01-19  4:21               ` Aili Yao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210118092419.GA4234@linux \
    --to=osalvador@suse.de \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=yangfeng1@kingsoft.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox