linux-mm.kvack.org archive mirror
* [PATCH] x86, acpi memory hotplug, add parameter to disable memory hotplug
@ 2014-01-13 21:56 Prarit Bhargava
From: Prarit Bhargava @ 2014-01-13 21:56 UTC
  To: linux-kernel
  Cc: Prarit Bhargava, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Len Brown, Rafael J. Wysocki, Linn Crosetto, Pekka Enberg,
	Yinghai Lu, Andrew Morton, Toshi Kani, Tang Chen, Wen Congyang,
	Vivek Goyal, kosaki.motohiro, dyoung, linux-acpi, linux-mm

When booting a kexec/kdump kernel on a system that has specific memory hotplug
regions the boot will fail with warnings like:

[    2.939467] swapper/0: page allocation failure: order:9, mode:0x84d0
[    2.946564] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
[    2.954532] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
[    2.964926]  0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
[    2.973224]  ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
[    2.981523]  ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
[    2.989821] Call Trace:
[    2.992560]  [<ffffffff815bcc67>] dump_stack+0x19/0x1b
[    2.998300]  [<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
[    3.004817]  [<ffffffff815b87ee>] ? __alloc_pages_direct_compact+0xac/0x196
[    3.012594]  [<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
[    3.019692]  [<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
[    3.026303]  [<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
[    3.033302]  [<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
[    3.039718]  [<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
[    3.046717]  [<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
[    3.053720]  [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[    3.059656]  [<ffffffff81047359>] arch_add_memory+0x59/0xd0
[    3.065877]  [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[    3.071713]  [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[    3.078813]  [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[    3.085719]  [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[    3.092716]  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[    3.100004]  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[    3.107293]  [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[    3.113904]  [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[    3.119933]  [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[    3.126153]  [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
[    3.131987]  [<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
[    3.138889]  [<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[    3.145210]  [<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
[    3.152111]  [<ffffffff819e18d0>] ? do_early_param+0x88/0x88
[    3.158430]  [<ffffffff8159fea0>] ? rest_init+0x80/0x80
[    3.164264]  [<ffffffff8159feae>] kernel_init+0xe/0x180
[    3.170097]  [<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
[    3.176123]  [<ffffffff8159fea0>] ? rest_init+0x80/0x80
[    3.181956] Mem-Info:
[    3.184490] Node 0 DMA per-cpu:
[    3.188007] CPU    0: hi:    0, btch:   1 usd:   0
[    3.193353] Node 0 DMA32 per-cpu:
[    3.197060] CPU    0: hi:   42, btch:   7 usd:   0
[    3.202410] active_anon:0 inactive_anon:0 isolated_anon:0
[    3.202410]  active_file:0 inactive_file:0 isolated_file:0
[    3.202410]  unevictable:0 dirty:0 writeback:0 unstable:0
[    3.202410]  free:872 slab_reclaimable:13 slab_unreclaimable:1880
[    3.202410]  mapped:0 shmem:0 pagetables:0 bounce:0
[    3.202410]  free_cma:0

because the system has run out of memory at boot time.  This occurs
because of the following sequence during boot:

The main kernel boots and sets up the E820 map.  The kdump service
generates a memory map for the second (kexec/kdump) kernel and passes it
via memmap=exactmap and memmap= entries, which are added to the kernel
parameters of the kexec/kdump kernel.  The kexec/kdump kernel is given
limited memory resources so as not to severely impact the main kernel.
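
For illustration only, here is a minimal userspace sketch of how such a
parameter string could be assembled.  It assumes the documented
memmap=nn[KMG]@ss[KMG] syntax; struct mem_range and
build_exactmap_cmdline() are hypothetical names made up for this example
and are not taken from the kdump service or kexec-tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one RAM range handed to the capture kernel. */
struct mem_range {
	unsigned long long start;	/* physical start address in bytes */
	unsigned long long size;	/* length in bytes */
};

/*
 * Write "memmap=exactmap memmap=<size>K@<start>K ..." into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int build_exactmap_cmdline(char *buf, size_t len,
				  const struct mem_range *r, size_t n)
{
	size_t i;
	int off, ret;

	assert(buf && len > 0);

	off = snprintf(buf, len, "memmap=exactmap");
	if (off < 0 || (size_t)off >= len)
		return -1;

	for (i = 0; i < n; i++) {
		/* Sizes and addresses are expressed in KiB, hence the shift. */
		ret = snprintf(buf + off, len - off, " memmap=%lluK@%lluK",
			       r[i].size >> 10, r[i].start >> 10);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += ret;
	}
	return off;
}
```

With two made-up ranges, 636K at 4K and 128M at 688M, this would produce
"memmap=exactmap memmap=636K@4K memmap=131072K@704512K".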

The system then panics and the kexec/kdump kernel boots (which is a
completely new kernel boot).  During this boot ACPI is initialized and the
kernel (as can be seen above) traverses the ACPI namespace and finds an
entry for a memory device to be hot-added, i.e.:

[    3.053720]  [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[    3.059656]  [<ffffffff81047359>] arch_add_memory+0x59/0xd0
[    3.065877]  [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[    3.071713]  [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[    3.078813]  [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[    3.085719]  [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[    3.092716]  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[    3.100004]  [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[    3.107293]  [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[    3.113904]  [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[    3.119933]  [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[    3.126153]  [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6

At this point the kernel tries to allocate the memory map (vmemmap) and
page table information for the hot-added memory, and the kexec/kdump
kernel runs out of memory.
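
A rough back-of-the-envelope check of why this is expensive, written as a
small userspace C sketch.  It assumes 4 KiB pages, 128 MiB sparsemem
sections (as on x86_64) and a 64-byte struct page; the last value is an
assumption and depends on the kernel configuration.  Under those
assumptions each hot-added section needs one 2 MiB block of metadata,
which matches the order:9 allocation failure in the log above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT			12	/* 4 KiB pages */
#define SECTION_SIZE_BITS		27	/* 128 MiB sections on x86_64 */
#define ASSUMED_STRUCT_PAGE_SIZE	64	/* assumption: typical sizeof(struct page) */

/* Bytes of struct page metadata needed to describe one memory section. */
static uint64_t vmemmap_bytes_per_section(void)
{
	uint64_t pages_per_section = 1ULL << (SECTION_SIZE_BITS - PAGE_SHIFT);

	return pages_per_section * ASSUMED_STRUCT_PAGE_SIZE;
}

/* Smallest page-allocator order whose block covers 'bytes'. */
static unsigned int order_for_bytes(uint64_t bytes)
{
	unsigned int order = 0;

	assert(bytes > 0);
	while ((1ULL << (order + PAGE_SHIFT)) < bytes)
		order++;
	return order;
}

/* Total metadata for hot-adding 'mem_bytes', rounded up to whole sections. */
static uint64_t vmemmap_bytes_for(uint64_t mem_bytes)
{
	uint64_t section = 1ULL << SECTION_SIZE_BITS;
	uint64_t sections = (mem_bytes + section - 1) / section;

	return sections * vmemmap_bytes_per_section();
}
```

For example, vmemmap_bytes_for(64ULL << 30) comes to 1 GiB of metadata for
64 GiB of hot-added memory, which is far more than a kexec/kdump kernel
is normally given.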

This can also be reproduced with a "regular" kernel by using the
memmap=exactmap and mem=X parameters on the main kernel and booting.

This patch resolves the problem by adding a kernel parameter,
acpi_no_memhotplug, to disable ACPI memory hotplug.  ACPI memory hotplug
is also disabled by default when a user specifies a memory mapping with
"memmap=exactmap" or "mem=X".

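The intended behaviour can be modelled in userspace as follows.  This is
only a sketch of the decision logic, not the kernel code in the diff
below: model_parse_param() and model_boot() are hypothetical helpers, and
the model ignores the error paths in parse_memopt().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the flag the patch adds; not kernel code. */
static bool acpi_no_memhotplug;

static void set_acpi_no_memhotplug(void)
{
	acpi_no_memhotplug = true;
}

/* Hypothetical helper: apply one command-line token like the patch's hooks. */
static void model_parse_param(const char *tok)
{
	assert(tok);

	if (strcmp(tok, "acpi_no_memhotplug") == 0)
		set_acpi_no_memhotplug();	/* explicit opt-out */
	else if (strncmp(tok, "mem=", 4) == 0)
		set_acpi_no_memhotplug();	/* user limited memory with mem=X */
	else if (strcmp(tok, "memmap=exactmap") == 0)
		set_acpi_no_memhotplug();	/* user supplied an exact map */
}

/* Returns true if the ACPI memory hotplug handler would be registered. */
static bool model_boot(const char *const *argv, int argc)
{
	int i;

	acpi_no_memhotplug = false;
	for (i = 0; i < argc; i++)
		model_parse_param(argv[i]);

	return !acpi_no_memhotplug;
}
```

In this model a plain memmap=nn@ss entry without memmap=exactmap leaves
memory hotplug enabled, which mirrors where the patch places its
set_acpi_no_memhotplug() calls.
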
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Linn Crosetto <linn@hp.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kosaki.motohiro@gmail.com
Cc: dyoung@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 Documentation/kernel-parameters.txt |    3 +++
 arch/x86/kernel/e820.c              |    4 ++++
 drivers/acpi/acpi_memhotplug.c      |   18 ++++++++++++++++++
 include/linux/memory_hotplug.h      |    9 +++++++++
 4 files changed, 34 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b9e9bd8..ea93f75 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -343,6 +343,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			no: ACPI OperationRegions are not marked as reserved,
 			no further checks are performed.
 
+	acpi_no_memhotplug [ACPI] Disable memory hotplug.  Useful for kexec
+			   and kdump kernels.
+
 	add_efi_memmap	[EFI; X86] Include EFI memory map in
 			kernel's map of available physical RAM.
 
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 174da5f..3c431fe 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -20,6 +20,7 @@
 #include <linux/firmware-map.h>
 #include <linux/memblock.h>
 #include <linux/sort.h>
+#include <linux/memory_hotplug.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -834,6 +835,8 @@ static int __init parse_memopt(char *p)
 		return -EINVAL;
 	e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
 
+	set_acpi_no_memhotplug();
+
 	return 0;
 }
 early_param("mem", parse_memopt);
@@ -857,6 +860,7 @@ static int __init parse_memmap_one(char *p)
 #endif
 		e820.nr_map = 0;
 		userdef = 1;
+		set_acpi_no_memhotplug();
 		return 0;
 	}
 
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 551dad7..d104a7d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -361,7 +361,25 @@ static void acpi_memory_device_remove(struct acpi_device *device)
 	acpi_memory_device_free(mem_device);
 }
 
+static bool acpi_no_memhotplug;
+
+void set_acpi_no_memhotplug(void)
+{
+	acpi_no_memhotplug = true;
+	pr_info_once("ACPI: Memory Hotplug Disabled\n");
+}
+
 void __init acpi_memory_hotplug_init(void)
 {
+	if (acpi_no_memhotplug)
+		return;
+
 	acpi_scan_add_handler_with_hotplug(&memory_device_handler, "memory");
 }
+
+static int __init disable_acpi_memory_hotplug(char *str)
+{
+	set_acpi_no_memhotplug();
+	return 1;
+}
+__setup("acpi_no_memhotplug", disable_acpi_memory_hotplug);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4ca3d95..3cdb6e0 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -12,6 +12,15 @@ struct pglist_data;
 struct mem_section;
 struct memory_block;
 
+#ifdef CONFIG_ACPI_HOTPLUG_MEMORY
+/* set flag to disable ACPI memory hotplug */
+extern void set_acpi_no_memhotplug(void);
+#else
+static inline void set_acpi_no_memhotplug(void)
+{
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 /*
-- 
1.7.9.3
