From: Tang Chen <tangchen@cn.fujitsu.com>
To: tglx@linutronix.de, mingo@elte.hu, hpa@zytor.com,
akpm@linux-foundation.org, tj@kernel.org, trenn@suse.de,
yinghai@kernel.org, jiang.liu@huawei.com, wency@cn.fujitsu.com,
laijs@cn.fujitsu.com, isimatu.yasuaki@jp.fujitsu.com,
izumi.taku@jp.fujitsu.com, mgorman@suse.de, minchan@kernel.org,
mina86@mina86.com, gong.chen@linux.intel.com,
vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
riel@redhat.com, jweiner@redhat.com, prarit@redhat.com,
zhangyanfei@cn.fujitsu.com, yanghy@cn.fujitsu.com
Cc: x86@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-acpi@vger.kernel.org
Subject: Re: [PATCH 00/21] Arrange hotpluggable memory as ZONE_MOVABLE.
Date: Mon, 22 Jul 2013 10:48:15 +0800 [thread overview]
Message-ID: <51EC9D6F.9010807@cn.fujitsu.com> (raw)
In-Reply-To: <1374220774-29974-1-git-send-email-tangchen@cn.fujitsu.com>
Hi,
Forgot to mention, this patch-set is based on linux-3.10 release.
Thanks. :)
On 07/19/2013 03:59 PM, Tang Chen wrote:
> This patch-set aims to solve some problems at system boot time
> to enhance memory hotplug functionality.
>
> [Background]
>
> The Linux kernel cannot migrate pages used by the kernel because
> of the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> physical address is changed, we cannot simply update the kernel
> pagetable. On the contrary, we have to update all the pointers
> pointing to the virtual address, which is very difficult to do.
>
> In order to do memory hotplug, we should prevent the kernel to use
> hotpluggable memory.
>
> In ACPI, there is a table named SRAT(System Resource Affinity Table).
> It contains system NUMA info (CPUs, memory ranges, PXM), and also a
> flag field indicating which memory ranges are hotpluggable.
>
>
> [Problem to be solved]
>
> At the very early time when the system is booting, we use a bootmem
> allocator, named memblock, to allocate memory for the kernel.
> memblock will start to work before the kernel parse SRAT, which
> means memblock won't know which memory is hotpluggable before SRAT
> is parsed.
>
> So at this time, memblock could allocate hotpluggable memory for
> the kernel to use permanently. For example, the kernel may allocate
> pagetables in hotpluggable memory, which cannot be freed when the
> system is up.
>
> So we have to prevent memblock allocating hotpluggable memory for
> the kernel at the early boot time.
>
>
> [Earlier solutions]>
> We have tried to parse SRAT earlier, before memblock is ready. To
> do this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
> Otherwise the override tables won't be able to effect.
>
> This is not that easy to do because memblock is ready before direct
> mapping is setup. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
> procedure into two steps: find and copy. Please refer to the
> following patch-set:
> https://lkml.org/lkml/2013/6/13/587
>
> To this solution, tj gave a lot of comments and the following
> suggestions.
>
>
> [Suggestion from tj]
>
> tj mainly gave the following suggestions:
>
> 1. Necessary reordering is OK, but we should not rely on
> reordering to achieve the goal because it makes the kernel
> too fragile.
>
> 2. Memory allocated to kernel for temporary usage is OK because
> it will be freed when the system is up. Doing relocation
> for permanent allocated hotpluggable memory will make the
> the kernel more robust.
>
> 3. Need to enhance memblock to discover and complain if any
> hotpluggable memory is allocated to kernel.
>
> After a long thinking, we choose not to do the relocation for
> the following reasons:
>
> 1. It's easy to find out the allocated hotpluggable memory. But
> memblock will merge the adjoined ranges owned by different users
> and used for different purposes. It's hard to find the owners.
>
> 2. Different memory has different way to be relocated. I think one
> function for each kind of memory will make the code too messy.
>
> 3. Pagetable could be in hotpluggable memory. Relocating pagetable
> is too difficult and risky. We have to update all PUD, PMD pages.
> And also, ACPI_INITRD_TABLE_OVERRIDE and parsing SRAT procedures
> are not long after pagetable is initialized. If we relocate the
> pagetable not long after it was initialized, the code will be
> very ugly.
>
>
> [Solution in this patch-set]
>
> In this patch-set, we still do the reordering, but in a new way.
>
> 1. Improve memblock with flags, so that it is able to differentiate
> memory regions for different usage. And also a MEMBLOCK_HOTPLUGGABLE
> flag to mark hotpluggable memory.
> (patch 1 ~ 3)
>
> 2. When memblock is ready (memblock_x86_fill() is called), initialize
> acpi_gbl_root_table_list, fulfill all the ACPI tables' phys addrs.
> Now, we have all the ACPI tables' phys addrs provided by firmware.
> (patch 4 ~ 8)
>
> 3. Check if there is a SRAT in initrd file used to override the one
> provided by firmware. If so, get its phys addr.
> (patch 12)
>
> 4. If no override SRAT in initrd, get the phys addr of the SRAT
> provided by firmware.
> (patch 13)
>
> Now, we have the phys addr of the to be used SRAT, the one in
> initrd or the one in firmware.
>
> 5. Parse only the memory affinities in SRAT, find out all the
> hotpluggable memory regions and reserve them in memblock with
> MEMBLK_HOTPLUGGABLE flag.
> (patch 14 ~ 15)
>
> 6. The kernel goes through the current path. Any other related parts,
> such as ACPI_INITRD_TABLE_OVERRIDE path, the current parsing ACPI
> tables pathes, global variable numa_meminfo, and so on, are not
> modified. They work as before.
>
> 7. Free memory with MEMBLK_HOTPLUGGABLE flag when memory initialization
> is finished.
> (patch 16)
>
> 8. Introduce movablecore=acpi boot option to allow users to enable
> and disable this functionality.
> (patch 17 ~ 21)
>
> And patch 9 ~ 11 fix some small problems.
>
>
> In summary, in order to get hotpluggable memory info as early as possible,
> this patch-set only parse memory affinities in SRAT one more time right
> after memblock is ready, and leave all the other pathes untouched. With
> the hotpluggable memory info, we can arrange hotpluggable memory in
> ZONE_MOVABLE to prevent the kernel to use it.
>
> Thanks. :)
>
>
> Tang Chen (20):
> acpi: Print Hot-Pluggable Field in SRAT.
> memblock, numa: Introduce flag into memblock.
> x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to
> reserve hotpluggable memory.
> acpi: Remove "continue" in macro INVALID_TABLE().
> acpi: Introduce acpi_invalid_table() to check if a table is invalid.
> x86, acpi: Split acpi_boot_table_init() into two parts.
> x86, acpi: Initialize ACPI root table list earlier.
> x86, acpi: Also initialize signature and length when parsing root
> table.
> x86: Make get_ramdisk_{image|size}() global.
> earlycpio.c: Fix the confusing comment of find_cpio_data().
> x86, acpi: Try to find if SRAT is overrided earlier.
> x86, acpi: Try to find SRAT in firmware earlier.
> x86, acpi, numa: Reserve hotpluggable memory at early time.
> x86, acpi, numa: Don't reserve memory on nodes the kernel resides in.
> x86, memblock, mem-hotplug: Free hotpluggable memory reserved by
> memblock.
> page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using
> SRAT.
> x86, numa: Synchronize nid info in memblock.reserve with
> numa_meminfo.
> x86, numa: Save nid when reserve memory into memblock.reserved[].
> x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher
> priority.
> doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot
> option.
>
> Yasuaki Ishimatsu (1):
> x86: get pg_data_t's memory from other node
>
> Documentation/kernel-parameters.txt | 10 ++
> arch/x86/include/asm/setup.h | 3 +
> arch/x86/kernel/acpi/boot.c | 38 +++--
> arch/x86/kernel/setup.c | 22 +++-
> arch/x86/mm/numa.c | 55 +++++++-
> arch/x86/mm/srat.c | 11 +-
> drivers/acpi/acpica/tbutils.c | 48 ++++++-
> drivers/acpi/acpica/tbxface.c | 34 +++++
> drivers/acpi/osl.c | 262 ++++++++++++++++++++++++++++++++---
> drivers/acpi/tables.c | 7 +-
> include/acpi/acpixf.h | 6 +
> include/linux/acpi.h | 21 +++-
> include/linux/memblock.h | 9 ++
> include/linux/memory_hotplug.h | 5 +
> include/linux/mm.h | 9 ++
> lib/earlycpio.c | 7 +-
> mm/memblock.c | 90 ++++++++++--
> mm/memory_hotplug.c | 42 ++++++-
> mm/nobootmem.c | 3 +
> mm/page_alloc.c | 44 ++++++-
> 20 files changed, 656 insertions(+), 70 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2013-07-22 2:45 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-19 7:59 Tang Chen
2013-07-19 7:59 ` [PATCH 01/21] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-07-23 18:48 ` Tejun Heo
2013-07-23 19:15 ` Joe Perches
2013-07-23 19:20 ` Tejun Heo
2013-07-23 19:26 ` Joe Perches
2013-07-24 1:46 ` Tang Chen
2013-07-19 7:59 ` [PATCH 02/21] memblock, numa: Introduce flag into memblock Tang Chen
2013-07-23 19:09 ` Tejun Heo
2013-07-24 2:53 ` Tang Chen
2013-07-24 15:54 ` Tejun Heo
2013-07-25 6:42 ` Tang Chen
2013-07-19 7:59 ` [PATCH 03/21] x86, acpi, numa, mem-hotplug: Introduce MEMBLK_HOTPLUGGABLE to reserve hotpluggable memory Tang Chen
2013-07-23 19:19 ` Tejun Heo
2013-07-24 2:55 ` Tang Chen
2013-07-19 7:59 ` [PATCH 04/21] acpi: Remove "continue" in macro INVALID_TABLE() Tang Chen
2013-07-23 19:15 ` Tejun Heo
2013-07-19 7:59 ` [PATCH 05/21] acpi: Introduce acpi_invalid_table() to check if a table is invalid Tang Chen
2013-07-19 7:59 ` [PATCH 06/21] x86, acpi: Split acpi_boot_table_init() into two parts Tang Chen
2013-07-19 7:59 ` [PATCH 07/21] x86, acpi: Initialize ACPI root table list earlier Tang Chen
2013-07-19 7:59 ` [PATCH 08/21] x86, acpi: Also initialize signature and length when parsing root table Tang Chen
2013-07-23 19:45 ` Tejun Heo
2013-07-25 6:50 ` Tang Chen
2013-07-19 7:59 ` [PATCH 09/21] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-07-23 19:56 ` Tejun Heo
2013-07-24 3:12 ` Tang Chen
2013-07-19 7:59 ` [PATCH 10/21] earlycpio.c: Fix the confusing comment of find_cpio_data() Tang Chen
2013-07-23 20:02 ` Tejun Heo
2013-07-24 3:20 ` Tang Chen
2013-07-19 7:59 ` [PATCH 11/21] x86: get pg_data_t's memory from other node Tang Chen
2013-07-23 20:09 ` Tejun Heo
2013-07-24 3:52 ` Tang Chen
2013-07-24 16:03 ` Tejun Heo
2013-07-19 7:59 ` [PATCH 12/21] x86, acpi: Try to find if SRAT is overrided earlier Tang Chen
2013-07-23 20:27 ` Tejun Heo
2013-07-24 6:57 ` Tang Chen
2013-07-19 7:59 ` [PATCH 13/21] x86, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-07-23 20:49 ` Tejun Heo
2013-07-24 10:12 ` Tang Chen
2013-07-24 15:55 ` Tejun Heo
2013-07-23 23:26 ` Cody P Schafer
2013-07-24 10:16 ` Tang Chen
2013-07-19 7:59 ` [PATCH 14/21] x86, acpi, numa: Reserve hotpluggable memory at early time Tang Chen
2013-07-23 20:55 ` Tejun Heo
2013-07-23 21:32 ` Tejun Heo
2013-07-25 2:13 ` Tang Chen
2013-07-25 15:17 ` Tejun Heo
2013-07-26 3:45 ` Tang Chen
2013-07-26 10:26 ` Tejun Heo
2013-07-26 10:27 ` Tejun Heo
2013-07-29 2:12 ` Tang Chen
2013-07-29 17:10 ` Tejun Heo
2013-07-19 7:59 ` [PATCH 15/21] x86, acpi, numa: Don't reserve memory on nodes the kernel resides in Tang Chen
2013-07-23 20:59 ` Tejun Heo
2013-07-25 2:34 ` Tang Chen
2013-07-19 7:59 ` [PATCH 16/21] x86, memblock, mem-hotplug: Free hotpluggable memory reserved by memblock Tang Chen
2013-07-23 21:00 ` Tejun Heo
2013-07-25 2:35 ` Tang Chen
2013-07-19 7:59 ` [PATCH 17/21] page_alloc, mem-hotplug: Improve movablecore to {en|dis}able using SRAT Tang Chen
2013-07-23 21:04 ` Tejun Heo
2013-07-23 21:11 ` Tejun Heo
2013-07-25 3:50 ` Tang Chen
2013-07-25 15:09 ` Tejun Heo
2013-07-26 3:58 ` Tang Chen
2013-07-19 7:59 ` [PATCH 18/21] x86, numa: Synchronize nid info in memblock.reserve with numa_meminfo Tang Chen
2013-07-23 21:25 ` Tejun Heo
2013-07-25 4:09 ` Tang Chen
2013-07-25 15:05 ` Tejun Heo
2013-07-26 4:00 ` Tang Chen
2013-07-19 7:59 ` [PATCH 19/21] x86, numa: Save nid when reserve memory into memblock.reserved[] Tang Chen
2013-07-19 7:59 ` [PATCH 20/21] x86, numa, acpi, memory-hotplug: Make movablecore=acpi have higher priority Tang Chen
2013-07-23 21:21 ` Tejun Heo
2013-07-19 7:59 ` [PATCH 21/21] doc, page_alloc, acpi, mem-hotplug: Add doc for movablecore=acpi boot option Tang Chen
2013-07-23 21:21 ` Tejun Heo
2013-07-25 3:53 ` Tang Chen
2013-07-22 2:48 ` Tang Chen [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51EC9D6F.9010807@cn.fujitsu.com \
--to=tangchen@cn.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=gong.chen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=jiang.liu@huawei.com \
--cc=jweiner@redhat.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lwoodman@redhat.com \
--cc=mgorman@suse.de \
--cc=mina86@mina86.com \
--cc=minchan@kernel.org \
--cc=mingo@elte.hu \
--cc=prarit@redhat.com \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=trenn@suse.de \
--cc=vasilis.liaskovitis@profitbricks.com \
--cc=wency@cn.fujitsu.com \
--cc=x86@kernel.org \
--cc=yanghy@cn.fujitsu.com \
--cc=yinghai@kernel.org \
--cc=zhangyanfei@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox