From: Tony Luck <tony.luck@gmail.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "iskra@mcs.anl.gov" <iskra@mcs.anl.gov>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Andi Kleen <andi@firstfloor.org>, Borislav Petkov <bp@suse.de>,
"gong.chen@linux.jf.intel.com" <gong.chen@linux.jf.intel.com>
Subject: Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
Date: Tue, 27 May 2014 22:09:54 -0700
Message-ID: <FDBACF11-D9F6-4DE5-A0D4-800903A243B7@gmail.com>
In-Reply-To: <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>

I'm exploring options to see what writers of threaded applications might want or need. I very much doubt that they would really want "broadcast to all threads". What if there are hundreds or thousands of threads?

We send the signals from the context of the thread that hit the error, and sending that many might take a while. Meanwhile, any of those threads that were already scheduled on other CPUs are back running again. So there are big races even if we broadcast.

> On May 27, 2014, at 17:15, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
>
> On Tue, May 27, 2014 at 03:53:55PM -0700, Tony Luck wrote:
>>> - make sure that every thread in a recovery aware application should have
>>> a SIGBUS handler, inside which
>>> * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
>>> * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
>>
>> But how does the kernel know which is the special thread that
>> should see the "AO" signal? Broadcasting the signal to all
>> threads seems to be just as likely to cause problems to
>> an application as the h/w broadcasting MCE to all processors.
>
> I thought that the kernel doesn't have to know which thread is the
> special one if the AO signal is broadcast to all threads, because
> in that case the special thread always gets the AO signal.
>
> The reported problem happens only when the application sets the PF_MCE_EARLY
> flag, and such an application is surely recovery aware, so we can assume that
> the coders must implement a SIGBUS handler for all threads. Then all threads
> other than the special one can intentionally ignore the AO signal. This is to
> avoid the default behavior for SIGBUS ("kill all threads" as Kamil said in the
> previous email.)
>
> And I hope that the downside of signal broadcasting is smaller than MCE
> broadcasting because the range of broadcasting is limited to a process group,
> not to the whole system.
>
> # I don't intend to rule out other possibilities like adding another prctl
> # flag, so if you have a patch, that would be great.
>
> Thanks,
> Naoya Horiguchi
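
A rough userspace sketch of the handler scheme quoted above, in C, in case it helps make the discussion concrete: act on BUS_MCEERR_AR in whichever thread receives it, act on BUS_MCEERR_AO only in one dedicated thread, and ignore AO everywhere else. It assumes the application opts in to early kill with prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0), which sets PF_MCE_EARLY on the calling thread. The names classify_sigbus(), sigbus_handler(), install_mce_handling(), is_ao_thread and enum mce_action are made up for illustration, and the _exit(1) is only a placeholder for real recovery. Reading a thread-local flag from a signal handler is also an assumption: it is not formally async-signal-safe.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older headers; values match the Linux UAPI definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical result type for this sketch. */
enum mce_action {
    MCE_IGNORE,        /* AO seen by a thread that is not the dedicated one */
    MCE_RECOVER_NOW,   /* AR: this thread touched the poisoned page */
    MCE_RECOVER_LATER, /* AO seen by the dedicated thread */
    MCE_NOT_MCE        /* some other SIGBUS (alignment, mmap past EOF, ...) */
};

/* Hypothetical per-thread flag: set to 1 only in the dedicated AO thread. */
static _Thread_local volatile sig_atomic_t is_ao_thread;

/* Pure decision function, so the policy can be checked without real signals. */
enum mce_action classify_sigbus(int si_code, int dedicated)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return MCE_RECOVER_NOW;
    case BUS_MCEERR_AO:
        return dedicated ? MCE_RECOVER_LATER : MCE_IGNORE;
    default:
        return MCE_NOT_MCE;
    }
}

void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;

    switch (classify_sigbus(info->si_code, is_ao_thread)) {
    case MCE_IGNORE:
        return; /* not our job; the dedicated thread deals with AO */
    case MCE_RECOVER_LATER:
        /* info->si_addr is the poisoned address; application-specific
         * cleanup (drop a cache entry, remap the page, ...) goes here. */
        return;
    case MCE_RECOVER_NOW:
    case MCE_NOT_MCE:
    default:
        /* Placeholder: returning without repairing the mapping would
         * just re-fault, so a real application recovers or bails out. */
        _exit(1);
    }
}

/* Install the process-wide handler and opt the calling thread in to early
 * kill. Returns 0 on success, -1 on failure (errno is set). */
int install_mce_handling(void)
{
    struct sigaction sa = {0};

    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

In this sketch the dedicated thread would set is_ao_thread = 1 before doing anything else; every other thread leaves it at 0 and so returns immediately from an AO signal. Note that this only covers the application side. It does nothing about the question of which thread the kernel delivers the AO signal to, or about the races when there are many threads.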