From: Seiji Aguchi <seiji.aguchi@hds.com>
To: "rdunlap@xenotime.net" <rdunlap@xenotime.net>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"x86@kernel.org" <x86@kernel.org>,
"ebiederm@xmission.com" <ebiederm@xmission.com>,
"andi@firstfloor.org" <andi@firstfloor.org>,
"akpm@linuxfoundation.org" <akpm@linuxfoundation.org>,
"eugeneteo@kernel.org" <eugeneteo@kernel.org>,
"kees.cook@canonical.com" <kees.cook@canonical.com>,
"drosenberg@vsecurity.com" <drosenberg@vsecurity.com>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"len.brown@intel.com" <len.brown@intel.com>,
"seto.hidetoshi@jp.fujitsu.com" <seto.hidetoshi@jp.fujitsu.com>,
"paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>,
"gregkh@suse.de" <gregkh@suse.de>,
"davem@davemloft.net" <davem@davemloft.net>,
"hadi@cyberus.ca" <hadi@cyberus.ca>,
"hawk@comx.dk" <hawk@comx.dk>,
"opurdila@ixiacom.com" <opurdila@ixiacom.com>,
"hidave.darkstar@gmail.com" <hidave.darkstar@gmail.com>,
"dzickus@redhat.com" <dzickus@redhat.com>,
"eric.dumazet@gmail.com" <eric.dumazet@gmail.com>,
"ext-andriy.shevchenko@nokia.com"
<ext-andriy.shevchenko@nokia.com>,
"tj@kernel.org" <tj@kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"dle-develop@lists.sourceforge.net"
<dle-develop@lists.sourceforge.net>
Cc: Satoru Moriya <satoru.moriya@hds.com>,
Seiji Aguchi <seiji.aguchi@hds.com>
Subject: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
Date: Wed, 22 Dec 2010 18:35:40 -0500
Message-ID: <5C4C569E8A4B9B42A84A977CF070A35B2C132F68FC@USINDEVS01.corp.hds.com>
Hi,
[Purpose]
If kexec runs after an MCE has occurred, it may trigger additional hardware
errors and make the damage worse, because kexec performs hardware-related
operations such as the following.
- Sending NMIs to CPUs.
- Initializing hardware during the boot process of the second kernel.
- Accessing memory and dumping it to disk.
So, I propose adding a new option that controls kexec behaviour after an
MCE has occurred.
This patch prevents unnecessary hardware errors and avoids spreading
the damage.
[Patch Description]
I added a sysctl option, kernel.kexec_on_mce, that controls kexec behaviour
after an MCE has occurred.
- Permission
  - 0644
- Value (default is "1")
  - non-zero: Kexec is enabled regardless of MCE.
  - 0: Kexec is disabled once an MCE has occurred.
Matrix of kernel.kexec_on_mce value, MCE and kexec behaviour
---------------------------------------------------------
kernel.kexec_on_mce | MCE          | kexec behaviour
---------------------------------------------------------
non-zero            | occurred     | enabled
                    |------------------------------------
                    | not occurred | enabled
---------------------------------------------------------
0                   | occurred     | disabled
                    |------------------------------------
                    | not occurred | enabled
---------------------------------------------------------
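For illustration only (this is not part of the patch), the matrix above can
be modelled in a few lines of userspace C. The names kexec_policy,
kexec_policy_default(), record_mce() and crash_kexec_allowed() are invented
for this sketch; only kexec_on_mce and mce_flag correspond to variables in
the patch, and the check mirrors the condition added to crash_kexec().

```c
#include <stdbool.h>

/* Hypothetical userspace model of the patch's state; not kernel code. */
struct kexec_policy {
	int kexec_on_mce;	/* sysctl value; the patch defaults it to 1 */
	int mce_flag;		/* set when an MCE is seen; never cleared */
};

/* Return the state as it would be right after boot with no MCE seen. */
static struct kexec_policy kexec_policy_default(void)
{
	struct kexec_policy p = { .kexec_on_mce = 1, .mce_flag = 0 };
	return p;
}

/* Models the "mce_flag = 1" assignments the patch adds. */
static void record_mce(struct kexec_policy *p)
{
	p->mce_flag = 1;
}

/*
 * Models the early-return test added to crash_kexec():
 * kexec is refused only when the sysctl is 0 AND an MCE has occurred.
 */
static bool crash_kexec_allowed(const struct kexec_policy *p)
{
	return !(!p->kexec_on_mce && p->mce_flag);
}
```

As in the patch, mce_flag is never cleared once it is set, so with
kexec_on_mce set to 0 a single MCE (or a call to __memory_failure())
keeps crash kexec disabled until the sysctl is set back to non-zero
or the system is rebooted.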
Any comments and suggestions are welcome.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
---
Documentation/sysctl/kernel.txt | 12 ++++++++++++
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mcheck/mce.c | 4 ++++
include/linux/sysctl.h | 1 +
kernel/kexec.c | 7 +++++++
kernel/sysctl.c | 12 ++++++++++++
kernel/sysctl_binary.c | 1 +
mm/memory-failure.c | 9 +++++++++
8 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..ce3240e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
- hotplug
- java-appletviewer [ binfmt_java, obsolete ]
- java-interpreter [ binfmt_java, obsolete ]
+- kexec_on_mce [ X86 only ]
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
@@ -261,6 +262,17 @@ This flag controls the L2 cache of G3 processor boards. If
==============================================================
+kexec_on_mce: (X86 only)
+
+Controls kexec behaviour after a machine check exception (MCE) has occurred.
+Default value is 1.
+
+0: Kexec is disabled once an MCE has occurred.
+non-zero: Kexec is enabled regardless of MCE.
+
+
+==============================================================
+
kstack_depth_to_print: (X86 only)
Controls the number of words to print when dumping the raw
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c62c13c..062dabd 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -123,6 +123,8 @@ extern struct atomic_notifier_head x86_mce_decoder_chain;
extern int mce_disabled;
extern int mce_p5_enabled;
+extern int kexec_on_mce;
+extern int mce_flag;
#ifdef CONFIG_X86_MCE
int mcheck_init(void);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 7a35b72..edbaf77 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -85,6 +85,8 @@ static int mce_dont_log_ce __read_mostly;
int mce_cmci_disabled __read_mostly;
int mce_ignore_ce __read_mostly;
int mce_ser __read_mostly;
+int kexec_on_mce = 1;
+int mce_flag;
struct mce_bank *mce_banks __read_mostly;
@@ -944,6 +946,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
percpu_inc(mce_exception_count);
+ mce_flag = 1;
+
if (notify_die(DIE_NMI, "machine check", regs, error_code,
18, SIGKILL) == NOTIFY_STOP)
goto out;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 7bb5cb6..0ebe708 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -153,6 +153,7 @@ enum
KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+ KERN_KEXEC_ON_MCE=77, /* int: whether we will dump memory on mce */
};
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b55045b..3e5c41a 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -39,6 +39,7 @@
#include <asm/io.h>
#include <asm/system.h>
#include <asm/sections.h>
+#include <asm/mce.h>
/* Per cpu memory for storing cpu states in case of system crash. */
note_buf_t __percpu *crash_notes;
@@ -1074,6 +1075,12 @@ void crash_kexec(struct pt_regs *regs)
* of memory the xchg(&kexec_crash_image) would be
* sufficient. But since I reuse the memory...
*/
+#ifdef CONFIG_X86_MCE
+ if (!kexec_on_mce && mce_flag) {
+ printk(KERN_WARNING "Kexec is disabled because MCE occurred\n");
+ return;
+ }
+#endif
if (mutex_trylock(&kexec_mutex)) {
if (kexec_crash_image) {
struct pt_regs fixed_regs;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..3a64cd6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -81,6 +81,9 @@
#include <linux/nmi.h>
#endif
+#ifdef CONFIG_X86_MCE
+#include <asm/mce.h>
+#endif
#if defined(CONFIG_SYSCTL)
@@ -963,6 +966,15 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec,
},
#endif
+#if defined(CONFIG_X86_MCE)
+ {
+ .procname = "kexec_on_mce",
+ .data = &kexec_on_mce,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 1357c57..a25f971 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -138,6 +138,7 @@ static const struct bin_table bin_kern_table[] = {
{ CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
{ CTL_INT, KERN_NMI_WATCHDOG, "nmi_watchdog" },
{ CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
+ { CTL_INT, KERN_KEXEC_ON_MCE, "kexec_on_mce" },
{}
};
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 46ab2c0..3ec075a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -52,6 +52,11 @@
#include <linux/swapops.h>
#include <linux/hugetlb.h>
#include <linux/memory_hotplug.h>
+
+#ifdef CONFIG_X86_MCE
+#include <asm/mce.h>
+#endif
+
#include "internal.h"
int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -949,6 +954,10 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
int res;
unsigned int nr_pages;
+#ifdef CONFIG_X86_MCE
+ mce_flag = 1;
+#endif
+
if (!sysctl_memory_failure_recovery)
panic("Memory failure from trap %d on page %lx", trapno, pfn);
--
1.7.1