From: "Moore, Robert" <robert.moore@intel.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>, Tang Chen <tangchen@cn.fujitsu.com>
Cc: "Zheng, Lv" <lv.zheng@intel.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@elte.hu" <mingo@elte.hu>, "hpa@zytor.com" <hpa@zytor.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"tj@kernel.org" <tj@kernel.org>, "trenn@suse.de" <trenn@suse.de>,
"yinghai@kernel.org" <yinghai@kernel.org>,
"jiang.liu@huawei.com" <jiang.liu@huawei.com>,
"wency@cn.fujitsu.com" <wency@cn.fujitsu.com>,
"laijs@cn.fujitsu.com" <laijs@cn.fujitsu.com>,
"isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>,
"izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>,
"mgorman@suse.de" <mgorman@suse.de>,
"minchan@kernel.org" <minchan@kernel.org>,
"mina86@mina86.com" <mina86@mina86.com>,
"gong.chen@linux.intel.com" <gong.chen@linux.intel.com>,
"vasilis.liaskovitis@profitbricks.com"
<vasilis.liaskovitis@profitbricks.com>,
"lwoodman@redhat.com" <lwoodman@redhat.com>,
"riel@redhat.com" <riel@redhat.com>,
"jweiner@redhat.com" <jweiner@redhat.com>,
"prarit@redhat.com" <prarit@redhat.com>,
"Box, David E" <david.e.box@intel.com>,
"zhangyanfei@cn.fujitsu.com" <zhangyanfei@cn.fujitsu.com>,
"yanghy@cn.fujitsu.com" <yanghy@cn.fujitsu.com>,
"x86@kernel.org" <x86@kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Subject: RE: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.
Date: Thu, 8 Aug 2013 03:01:20 +0000
Message-ID: <94F2FBAB4432B54E8AACC7DFDE6C92E36FEAC85B@ORSMSX103.amr.corp.intel.com>
In-Reply-To: <1786839.lAdBpJ22ie@vostro.rjw.lan>
> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Wednesday, August 07, 2013 4:49 PM
> To: Tang Chen
> Cc: Moore, Robert; Zheng, Lv; lenb@kernel.org; tglx@linutronix.de;
> mingo@elte.hu; hpa@zytor.com; akpm@linux-foundation.org; tj@kernel.org;
> trenn@suse.de; yinghai@kernel.org; jiang.liu@huawei.com;
> wency@cn.fujitsu.com; laijs@cn.fujitsu.com;
> isimatu.yasuaki@jp.fujitsu.com; izumi.taku@jp.fujitsu.com;
> mgorman@suse.de; minchan@kernel.org; mina86@mina86.com;
> gong.chen@linux.intel.com; vasilis.liaskovitis@profitbricks.com;
> lwoodman@redhat.com; riel@redhat.com; jweiner@redhat.com;
> prarit@redhat.com; zhangyanfei@cn.fujitsu.com; yanghy@cn.fujitsu.com;
> x86@kernel.org; linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.
>
> On Wednesday, August 07, 2013 06:51:51 PM Tang Chen wrote:
> > This patch-set aims to solve some problems at system boot time to
> > enhance memory hotplug functionality.
> >
> > [Background]
> >
> > The Linux kernel cannot migrate pages used by the kernel because of
> > the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> > physical address is changed, we cannot simply update the kernel
> > pagetable. On the contrary, we have to update all the pointers
> > pointing to the virtual address, which is very difficult to do.
> >
> > In order to support memory hotplug, we should prevent the kernel from
> > using hotpluggable memory.
> >
> > In ACPI, there is a table named SRAT (System Resource Affinity Table).
> > It contains system NUMA info (CPUs, memory ranges, PXM), and also a
> > flag field indicating which memory ranges are hotpluggable.
> >
> >
> > [Problem to be solved]
> >
> > Very early during boot, we use a boot-time allocator, named memblock,
> > to allocate memory for the kernel. memblock starts working before the
> > kernel parses SRAT, which means memblock does not know which memory is
> > hotpluggable until SRAT is parsed.
> >
> > So at this time, memblock could allocate hotpluggable memory for the
> > kernel to use permanently. For example, the kernel may allocate
> > pagetables in hotpluggable memory, which cannot be freed when the
> > system is up.
> >
> > So we have to prevent memblock from allocating hotpluggable memory for
> > the kernel during early boot.
> >
> >
> > [Earlier solutions]
> >
> > We have tried to parse SRAT earlier, before memblock is ready. To do
> > this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
> > Otherwise the override tables will not take effect.
> >
> > This is not that easy to do because memblock is ready before direct
> > mapping is set up. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
> > procedure into two steps: find and copy. Please refer to the following
> > patch-set:
> > https://lkml.org/lkml/2013/6/13/587
> >
> > tj gave a lot of comments on this solution, along with the following
> > suggestions.
> >
> >
> > [Suggestion from tj]
> >
> > tj mainly gave the following suggestions:
> >
> > 1. Necessary reordering is OK, but we should not rely on
> > reordering to achieve the goal because it makes the kernel
> > too fragile.
> >
> > 2. Memory allocated to the kernel for temporary usage is OK because
> > it will be freed when the system is up. Doing relocation
> > for permanently allocated hotpluggable memory will make the
> > kernel more robust.
> >
> > 3. Need to enhance memblock to discover and complain if any
> > hotpluggable memory is allocated to kernel.
> >
> > After much thought, we chose not to do the relocation for the
> > following reasons:
> >
> > 1. It's easy to find out the allocated hotpluggable memory. But
> > memblock will merge the adjoined ranges owned by different users
> > and used for different purposes. It's hard to find the owners.
> >
> > 2. Different kinds of memory have to be relocated in different ways.
> > I think one function for each kind of memory will make the code too messy.
> >
> > 3. Pagetable could be in hotpluggable memory. Relocating pagetable
> > is too difficult and risky. We have to update all PUD, PMD pages.
> > And also, ACPI_INITRD_TABLE_OVERRIDE and parsing SRAT procedures
> > are not long after pagetable is initialized. If we relocate the
> > pagetable not long after it was initialized, the code will be
> > very ugly.
> >
> >
> > [Solution in this patch-set]
> >
> > In this patch-set, we still do the reordering, but in a new way.
> >
> > 1. Improve memblock with flags, so that it is able to differentiate
> > memory regions for different usage. Also add a MEMBLOCK_HOTPLUG
> > flag to mark hotpluggable memory.
> >
> > 2. When memblock is ready (memblock_x86_fill() is called), initialize
> > acpi_gbl_root_table_list and fill in all the ACPI tables' phys addrs.
> > Now, we have all the ACPI tables' phys addrs provided by firmware.
> >
> > 3. Check if there is a SRAT in initrd file used to override the one
> > provided by firmware. If so, get its phys addr.
> >
> > 4. If no override SRAT in initrd, get the phys addr of the SRAT
> > provided by firmware.
> >
> > Now, we have the phys addr of the SRAT to be used, either the one in
> > initrd or the one in firmware.
> >
> > 5. Parse only the memory affinities in SRAT, find out all the
> > hotpluggable memory regions and mark them in memblock.memory with
> > MEMBLOCK_HOTPLUG flag.
> >
> > 6. The kernel goes through the current path. Any other related parts,
> > such as the ACPI_INITRD_TABLE_OVERRIDE path, the current ACPI table
> > parsing paths, the global variable numa_meminfo, and so on, are not
> > modified. They work as before.
> >
> > 7. Make memblock default allocator skip hotpluggable memory.
> >
> > 8. Introduce movablenode boot option to allow users to enable
> > and disable this functionality.
> >
> >
> > In summary, in order to get hotpluggable memory info as early as
> > possible, this patch-set only parses the memory affinities in SRAT one
> > more time right after memblock is ready, and leaves all the other paths
> > untouched. With the hotpluggable memory info, we can arrange
> > hotpluggable memory in ZONE_MOVABLE to prevent the kernel from using it.
> >
> > change log v2 RESEND -> v3:
> > 1. As Rafael and Lv Zheng suggested, split acpi global root table list
> > initialization procedure into two steps: install and override. And
> > do the "install" step earlier.
>
> This looks a bit more manageable than before, but please do one more
> thing:
> Please split all of the ACPICA changes out into separate patches and put
> those patches in front of everything else.
>
> The reason is we may need to merge them through upstream ACPICA as the
> first step (if they are accepted by the ACPICA maintainers).
>
Yes, we (ACPICA) would like to see them all together in one place so that we can review.
Thanks,
Bob
> Thanks,
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
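
As a rough illustration of steps 1, 5 and 7 in the quoted cover letter (flagging regions in memblock and having the default allocator skip them), here is a minimal userspace sketch in C. It is not code from the patch-set: only the MEMBLOCK_HOTPLUG name comes from the cover letter, while struct toy_region, toy_mark_hotplug(), toy_alloc() and the flag value are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name taken from the cover letter; the value is an assumption. */
#define MEMBLOCK_HOTPLUG 0x1u

/* Hypothetical, simplified region descriptor (not the kernel's struct). */
struct toy_region {
	uint64_t base;      /* physical start address */
	uint64_t size;      /* length in bytes */
	uint64_t used;      /* bytes already handed out from this region */
	unsigned int flags; /* e.g. MEMBLOCK_HOTPLUG */
};

/*
 * Step 5 (sketch): mark every region that lies fully inside
 * [base, base + size) as hotpluggable, as reported by an SRAT
 * memory affinity entry.
 */
static void toy_mark_hotplug(struct toy_region *r, size_t n,
			     uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].base >= base &&
		    r[i].base + r[i].size <= base + size)
			r[i].flags |= MEMBLOCK_HOTPLUG;
	}
}

/*
 * Step 7 (sketch): first-fit allocation that skips hotpluggable
 * regions. Returns 0 and stores the address in *out on success,
 * or -1 if no non-hotpluggable region has enough room.
 */
static int toy_alloc(struct toy_region *r, size_t n,
		     uint64_t size, uint64_t *out)
{
	assert(size > 0 && out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & MEMBLOCK_HOTPLUG)
			continue; /* never place kernel data here */
		if (r[i].size - r[i].used >= size) {
			*out = r[i].base + r[i].used;
			r[i].used += size;
			return 0;
		}
	}
	return -1;
}
```

In this toy model an allocation fails rather than falling back to a hotpluggable region. Whether the real allocator should fall back is a policy question that the sketch does not try to answer.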
Thread overview: 29+ messages
2013-08-07 10:51 Tang Chen
2013-08-07 10:51 ` [PATCH v3 01/25] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-08-07 10:51 ` [PATCH v3 02/25] earlycpio.c: Fix the confusing comment of find_cpio_data() Tang Chen
2013-08-07 10:51 ` [PATCH v3 03/25] acpi: Remove "continue" in macro INVALID_TABLE() Tang Chen
2013-08-07 10:51 ` [PATCH v3 04/25] acpi: Introduce acpi_verify_initrd() to check if a table is invalid Tang Chen
2013-08-07 10:51 ` [PATCH v3 05/25] acpi, acpica: Split acpi_tb_install_table() into two parts Tang Chen
2013-08-07 10:51 ` [PATCH v3 06/25] acpi, acpica: Call two new functions instead of acpi_tb_install_table() in acpi_tb_parse_root_table() Tang Chen
2013-08-07 10:51 ` [PATCH v3 07/25] acpi, acpica: Split acpi_tb_parse_root_table() into two parts Tang Chen
2013-08-07 10:51 ` [PATCH v3 08/25] acpi, acpica: Call two new functions instead of acpi_tb_parse_root_table() in acpi_initialize_tables() Tang Chen
2013-08-07 10:52 ` [PATCH v3 09/25] acpi, acpica: Split acpi_initialize_tables() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 10/25] x86, acpi: Call two new functions instead of acpi_initialize_tables() in acpi_table_init() Tang Chen
2013-08-07 10:52 ` [PATCH v3 11/25] x86, acpi: Split acpi_table_init() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 12/25] x86, acpi: Rename check_multiple_madt() and make it global Tang Chen
2013-08-07 10:52 ` [PATCH v3 13/25] x86, acpi: Split acpi_boot_table_init() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 14/25] x86, acpi: Initialize acpi golbal root table list earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 15/25] x86: get pg_data_t's memory from other node Tang Chen
2013-08-07 10:52 ` [PATCH v3 16/25] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-07 10:52 ` [PATCH v3 17/25] x86, acpica, acpi: Try to find if SRAT is overrided earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 18/25] x86, acpica, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 19/25] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities Tang Chen
2013-08-07 10:52 ` [PATCH v3 20/25] x86, numa, mem_hotplug: Skip all the regions the kernel resides in Tang Chen
2013-08-07 10:52 ` [PATCH v3 21/25] memblock, numa: Introduce flag into memblock Tang Chen
2013-08-07 10:52 ` [PATCH v3 22/25] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions Tang Chen
2013-08-07 10:52 ` [PATCH v3 23/25] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default Tang Chen
2013-08-07 10:52 ` [PATCH v3 24/25] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT Tang Chen
2013-08-07 10:52 ` [PATCH v3 25/25] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority Tang Chen
2013-08-07 23:48 ` [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE Rafael J. Wysocki
2013-08-08 3:01 ` Moore, Robert [this message]
2013-08-08 3:41 ` Tang Chen