linux-mm.kvack.org archive mirror
From: "Moore, Robert" <robert.moore@intel.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>, Tang Chen <tangchen@cn.fujitsu.com>
Cc: "Zheng, Lv" <lv.zheng@intel.com>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@elte.hu" <mingo@elte.hu>, "hpa@zytor.com" <hpa@zytor.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"tj@kernel.org" <tj@kernel.org>, "trenn@suse.de" <trenn@suse.de>,
	"yinghai@kernel.org" <yinghai@kernel.org>,
	"jiang.liu@huawei.com" <jiang.liu@huawei.com>,
	"wency@cn.fujitsu.com" <wency@cn.fujitsu.com>,
	"laijs@cn.fujitsu.com" <laijs@cn.fujitsu.com>,
	"isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>,
	"izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"minchan@kernel.org" <minchan@kernel.org>,
	"mina86@mina86.com" <mina86@mina86.com>,
	"gong.chen@linux.intel.com" <gong.chen@linux.intel.com>,
	"vasilis.liaskovitis@profitbricks.com"
	<vasilis.liaskovitis@profitbricks.com>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	"riel@redhat.com" <riel@redhat.com>,
	"jweiner@redhat.com" <jweiner@redhat.com>,
	"prarit@redhat.com" <prarit@redhat.com>,
	"Box, David E" <david.e.box@intel.com>,
	"zhangyanfei@cn.fujitsu.com" <zhangyanfei@cn.fujitsu.com>,
	"yanghy@cn.fujitsu.com" <yanghy@cn.fujitsu.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Subject: RE: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.
Date: Thu, 8 Aug 2013 03:01:20 +0000
Message-ID: <94F2FBAB4432B54E8AACC7DFDE6C92E36FEAC85B@ORSMSX103.amr.corp.intel.com>
In-Reply-To: <1786839.lAdBpJ22ie@vostro.rjw.lan>



> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Wednesday, August 07, 2013 4:49 PM
> To: Tang Chen
> Cc: Moore, Robert; Zheng, Lv; lenb@kernel.org; tglx@linutronix.de;
> mingo@elte.hu; hpa@zytor.com; akpm@linux-foundation.org; tj@kernel.org;
> trenn@suse.de; yinghai@kernel.org; jiang.liu@huawei.com;
> wency@cn.fujitsu.com; laijs@cn.fujitsu.com;
> isimatu.yasuaki@jp.fujitsu.com; izumi.taku@jp.fujitsu.com;
> mgorman@suse.de; minchan@kernel.org; mina86@mina86.com;
> gong.chen@linux.intel.com; vasilis.liaskovitis@profitbricks.com;
> lwoodman@redhat.com; riel@redhat.com; jweiner@redhat.com;
> prarit@redhat.com; zhangyanfei@cn.fujitsu.com; yanghy@cn.fujitsu.com;
> x86@kernel.org; linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; linux-acpi@vger.kernel.org
> Subject: Re: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.
> 
> On Wednesday, August 07, 2013 06:51:51 PM Tang Chen wrote:
> > This patch-set aims to solve some problems at system boot time to
> > enhance memory hotplug functionality.
> >
> > [Background]
> >
> > The Linux kernel cannot migrate pages used by the kernel because of
> > the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> > physical address changes, we cannot simply update the kernel
> > pagetable. Instead, we have to update all the pointers pointing to
> > the virtual address, which is very difficult to do.
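To make the direct-mapping argument concrete, here is a minimal C sketch. It assumes the x86_64 direct-map base 0xffff880000000000 used by kernels of that era (before KASLR randomized it), and all sketch_* names are hypothetical. Because each kernel virtual address is a fixed function of the physical address, a pointer saved before a page moves cannot be made to follow the page by editing page tables alone.

```c
#include <stdint.h>

/* Assumption: x86_64 direct-map base used by kernels of that era,
 * before KASLR began randomizing it. Illustrative value only. */
#define SKETCH_PAGE_OFFSET 0xffff880000000000ULL

/* Direct mapping: each physical address has exactly one kernel virtual
 * address, obtained by adding a constant offset. */
static inline uint64_t sketch_phys_to_virt(uint64_t pa)
{
    return pa + SKETCH_PAGE_OFFSET;
}

static inline uint64_t sketch_virt_to_phys(uint64_t va)
{
    return va - SKETCH_PAGE_OFFSET;
}

/* Returns nonzero if a pointer saved before moving a page from old_pa to
 * new_pa still points at the data. Since va is a fixed function of pa,
 * this is false whenever the physical address changes: every saved
 * pointer would have to be rewritten. */
static inline int sketch_saved_ptr_survives_migration(uint64_t old_pa,
                                                      uint64_t new_pa)
{
    uint64_t saved_ptr = sketch_phys_to_virt(old_pa);

    return saved_ptr == sketch_phys_to_virt(new_pa);
}
```

For example, sketch_phys_to_virt(0x1000) yields 0xffff880000001000, and sketch_saved_ptr_survives_migration() returns 0 for any move to a different physical address.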
> >
> > In order to do memory hotplug, we should prevent the kernel from
> > using hotpluggable memory.
> >
> > In ACPI, there is a table named SRAT (System Resource Affinity Table).
> > It contains system NUMA info (CPUs, memory ranges, PXM), and also a
> > flag field indicating which memory ranges are hotpluggable.
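The hotpluggable information lives in the Flags field of each SRAT Memory Affinity structure. A simplified C sketch of the check follows. The bit positions (bit 0 Enabled, bit 1 Hot Pluggable) come from the ACPI specification; the struct and the sketch_* names are hypothetical simplifications, not the packed table layout or Linux's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of the SRAT Memory Affinity structure per the ACPI
 * specification. The macro names are local to this sketch. */
#define SKETCH_SRAT_MEM_ENABLED        (1u << 0)
#define SKETCH_SRAT_MEM_HOT_PLUGGABLE  (1u << 1)

/* Simplified view of one memory affinity entry; the real entry is a
 * packed 40-byte structure that also contains reserved fields. */
struct sketch_mem_affinity {
    uint32_t proximity_domain;   /* PXM, mapped to a NUMA node later */
    uint64_t base_address;
    uint64_t length;
    uint32_t flags;
};

/* An entry describes hotpluggable memory only if it is enabled and
 * carries the Hot Pluggable bit; disabled entries are ignored. */
static bool sketch_is_hotpluggable(const struct sketch_mem_affinity *ma)
{
    return (ma->flags & SKETCH_SRAT_MEM_ENABLED) &&
           (ma->flags & SKETCH_SRAT_MEM_HOT_PLUGGABLE);
}
```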
> >
> >
> > [Problem to be solved]
> >
> > Very early in boot, we use a boot memory allocator, named memblock,
> > to allocate memory for the kernel. memblock starts to work before
> > the kernel parses SRAT, which means memblock doesn't know which
> > memory is hotpluggable until SRAT is parsed.
> >
> > So at this time, memblock could allocate hotpluggable memory for the
> > kernel to use permanently. For example, the kernel may allocate
> > pagetables in hotpluggable memory, which cannot be freed when the
> > system is up.
> >
> > So we have to prevent memblock from allocating hotpluggable memory
> > for the kernel at early boot time.
> >
> >
> > [Earlier solutions]
> >
> > We tried to parse SRAT earlier, before memblock is ready. To do
> > this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
> > Otherwise the override tables won't take effect.
> >
> > This is not that easy to do because memblock is ready before direct
> > mapping is set up. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
> > procedure into two steps: find and copy. Please refer to the following
> > patch-set:
> >         https://lkml.org/lkml/2013/6/13/587
> >
> > On this solution, tj gave a lot of comments and the following
> > suggestions.
> >
> >
> > [Suggestion from tj]
> >
> > tj mainly gave the following suggestions:
> >
> > 1. Necessary reordering is OK, but we should not rely on
> >    reordering to achieve the goal because it makes the kernel
> >    too fragile.
> >
> > 2. Memory allocated to the kernel for temporary use is OK because
> >    it will be freed once the system is up. Relocating permanently
> >    allocated hotpluggable memory will make the kernel more robust.
> >
> > 3. memblock needs to be enhanced to detect and complain if any
> >    hotpluggable memory is allocated to the kernel.
> >
> > After long consideration, we chose not to do the relocation, for the
> > following reasons:
> >
> > 1. It's easy to find the allocated hotpluggable memory, but
> >    memblock merges adjacent ranges owned by different users and
> >    used for different purposes, so it's hard to find the owners.
> >
> > 2. Each kind of memory has to be relocated in its own way. I think
> >    one function for each kind of memory would make the code too messy.
> >
> > 3. Pagetables could be in hotpluggable memory. Relocating pagetables
> >    is too difficult and risky: we would have to update all the PUD
> >    and PMD pages. Also, ACPI_INITRD_TABLE_OVERRIDE and SRAT parsing
> >    run shortly after the pagetables are initialized. Relocating the
> >    pagetables that soon after initializing them would make the code
> >    very ugly.
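Reason 1 can be illustrated with a toy model of a memblock-style reserved list. This is a hypothetical sketch, not memblock's code: it only assumes, as described above, that adjacent reservations are merged and that no owner is recorded, so after two callers reserve neighboring ranges a single anonymous region remains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified reserved list: regions are kept sorted,
 * adjacent ones are merged, and nothing records who reserved them. */
#define SKETCH_MAX_REGIONS 16

struct sketch_region {
    uint64_t base;
    uint64_t size;
};

struct sketch_type {
    struct sketch_region regions[SKETCH_MAX_REGIONS];
    size_t cnt;
};

/* Reserve [base, base + size). Simplification: the new range is assumed
 * not to overlap an existing one. Returns 0 on success, -1 if full. */
static int sketch_reserve(struct sketch_type *t, uint64_t base, uint64_t size)
{
    size_t i = 0, j;

    /* Find the first region that starts at or after the new range. */
    while (i < t->cnt && t->regions[i].base < base)
        i++;

    /* Case 1: the new range directly follows the previous region. */
    if (i > 0 && t->regions[i - 1].base + t->regions[i - 1].size == base) {
        t->regions[i - 1].size += size;
        /* It may now also touch the next region: fold that one in too. */
        if (i < t->cnt &&
            t->regions[i - 1].base + t->regions[i - 1].size == t->regions[i].base) {
            t->regions[i - 1].size += t->regions[i].size;
            for (j = i; j + 1 < t->cnt; j++)
                t->regions[j] = t->regions[j + 1];
            t->cnt--;
        }
        return 0;
    }

    /* Case 2: the new range directly precedes the next region. */
    if (i < t->cnt && base + size == t->regions[i].base) {
        t->regions[i].base = base;
        t->regions[i].size += size;
        return 0;
    }

    /* Case 3: isolated range; insert it, keeping the array sorted. */
    if (t->cnt == SKETCH_MAX_REGIONS)
        return -1;
    for (j = t->cnt; j > i; j--)
        t->regions[j] = t->regions[j - 1];
    t->regions[i].base = base;
    t->regions[i].size = size;
    t->cnt++;
    return 0;
}
```

Reserving 0x1000-0x2000 (say, a pagetable page) and then 0x2000-0x3000 (say, a copied ACPI table) leaves one region of size 0x2000 with nothing to tell the two users apart, which is why per-owner relocation is hard.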
> >
> >
> > [Solution in this patch-set]
> >
> > In this patch-set, we still do the reordering, but in a new way.
> >
> > 1. Add flags to memblock so that it can differentiate memory
> >    regions used for different purposes, including a MEMBLOCK_HOTPLUG
> >    flag to mark hotpluggable memory.
> >
> > 2. When memblock is ready (memblock_x86_fill() is called), initialize
> >    acpi_gbl_root_table_list and fill in all the ACPI tables' phys
> >    addrs. Now we have the phys addrs of all the ACPI tables provided
> >    by firmware.
> >
> > 3. Check whether there is an SRAT in the initrd that overrides the
> >    one provided by firmware. If so, get its phys addr.
> >
> > 4. If there is no override SRAT in the initrd, get the phys addr of
> >    the SRAT provided by firmware.
> >
> >    Now we have the phys addr of the SRAT to be used: either the one
> >    in the initrd or the one in firmware.
> >
> > 5. Parse only the memory affinities in SRAT, find out all the
> >    hotpluggable memory regions and mark them in memblock.memory with
> >    MEMBLOCK_HOTPLUG flag.
> >
> > 6. The kernel then follows the existing path. All other related
> >    parts, such as the ACPI_INITRD_TABLE_OVERRIDE path, the existing
> >    ACPI table parsing paths, the global variable numa_meminfo, and so
> >    on, are not modified. They work as before.
> >
> > 7. Make memblock's default allocator skip hotpluggable memory.
> >
> > 8. Introduce a movablenode boot option to allow users to enable
> >    and disable this functionality.
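Steps 1, 5, 7 and 8 can be pictured with the following C sketch. It is illustrative only: the region layout, the SKETCH_MEMBLOCK_HOTPLUG value, the movablenode variable and the allocator are hypothetical stand-ins for the series' real memblock changes. Reserved regions are ignored, partial overlaps are not split, regions must be sorted by base, and align must be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; the series adds a MEMBLOCK_HOTPLUG flag, but
 * this is not its definition. */
#define SKETCH_MEMBLOCK_HOTPLUG 0x1u

struct sketch_mem_region {
    uint64_t base;
    uint64_t size;
    unsigned int flags;   /* step 1: per-region flags */
};

/* Stand-in for the movablenode boot option (step 8). */
static bool sketch_movablenode_enabled = true;

/* Step 5: flag every region lying fully inside a hotpluggable SRAT range. */
static void sketch_mark_hotplug(struct sketch_mem_region *r, size_t n,
                                uint64_t base, uint64_t size)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].base >= base && r[i].base + r[i].size <= base + size)
            r[i].flags |= SKETCH_MEMBLOCK_HOTPLUG;
}

/* Step 7: top-down allocation that skips hotpluggable regions while the
 * option is enabled. Returns the physical address, or 0 on failure. */
static uint64_t sketch_alloc(const struct sketch_mem_region *r, size_t n,
                             uint64_t size, uint64_t align)
{
    for (size_t i = n; i-- > 0; ) {
        if (sketch_movablenode_enabled && (r[i].flags & SKETCH_MEMBLOCK_HOTPLUG))
            continue;
        if (r[i].size < size)
            continue;
        /* Highest aligned address that still fits in this region. */
        uint64_t addr = (r[i].base + r[i].size - size) & ~(align - 1);
        if (addr >= r[i].base)
            return addr;
    }
    return 0;
}
```

With RAM at 0x100000-0x40000000 and 0x100000000-0x140000000, marking the second range hotpluggable makes a top-down 4 KiB allocation land at 0x3ffff000 instead of 0x13ffff000; clearing sketch_movablenode_enabled restores the old placement.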
> >
> >
> > In summary, in order to get hotpluggable memory info as early as
> > possible, this patch-set parses only the memory affinities in SRAT
> > one more time, right after memblock is ready, and leaves all the
> > other paths untouched. With the hotpluggable memory info, we can
> > arrange hotpluggable memory in ZONE_MOVABLE to prevent the kernel
> > from using it.
> >
> > change log v2 RESEND -> v3:
> > 1. As Rafael and Lv Zheng suggested, split the ACPI global root table
> >    list initialization procedure into two steps: install and override.
> >    And do the "install" step earlier.
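The install/override split in the change log, together with steps 2-4 above, can be sketched as follows. This is a hypothetical model, not ACPICA's acpi_gbl_root_table_list code: "install" only records firmware table addresses, so it can run early; an SRAT found in the initrd is preferred before the override step has run; "override" later replaces the firmware entry that has the same signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified root table list: one entry per ACPI table,
 * identified by its 4-character signature and physical address. */
#define SKETCH_MAX_TABLES 32

struct sketch_table_desc {
    char signature[5];   /* e.g. "SRAT", NUL-terminated */
    uint64_t phys_addr;
};

struct sketch_table_list {
    struct sketch_table_desc tables[SKETCH_MAX_TABLES];
    size_t count;
};

/* "Install": record a firmware-provided table. Only addresses are stored,
 * so this can run as soon as memblock is ready. */
static int sketch_install(struct sketch_table_list *l, const char *sig,
                          uint64_t pa)
{
    if (l->count == SKETCH_MAX_TABLES)
        return -1;
    memcpy(l->tables[l->count].signature, sig, 4);
    l->tables[l->count].signature[4] = '\0';
    l->tables[l->count].phys_addr = pa;
    l->count++;
    return 0;
}

/* "Override": an initrd table replaces the firmware table that has the
 * same signature. */
static void sketch_override(struct sketch_table_list *l, const char *sig,
                            uint64_t pa)
{
    for (size_t i = 0; i < l->count; i++)
        if (memcmp(l->tables[i].signature, sig, 4) == 0)
            l->tables[i].phys_addr = pa;
}

/* Physical address of the installed table with this signature, or 0. */
static uint64_t sketch_find(const struct sketch_table_list *l, const char *sig)
{
    for (size_t i = 0; i < l->count; i++)
        if (memcmp(l->tables[i].signature, sig, 4) == 0)
            return l->tables[i].phys_addr;
    return 0;
}

/* Steps 3-4: before "override" has run, prefer an SRAT found in the
 * initrd (initrd_srat_pa, 0 if none), else use the firmware SRAT. */
static uint64_t sketch_early_srat(const struct sketch_table_list *l,
                                  uint64_t initrd_srat_pa)
{
    return initrd_srat_pa ? initrd_srat_pa : sketch_find(l, "SRAT");
}
```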
> 
> This looks a bit more manageable than before, but please do one more
> thing:
> Please split all of the ACPICA changes out into separate patches and put
> those patches in front of everything else.
> 
> The reason is we may need to merge them through upstream ACPICA as the
> first step (if they are accepted by the ACPICA maintainers).
> 


Yes, we (ACPICA) would like to see them all together in one place so that we can review them.
Thanks,
Bob




> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

Thread overview: 29+ messages
2013-08-07 10:51 Tang Chen
2013-08-07 10:51 ` [PATCH v3 01/25] acpi: Print Hot-Pluggable Field in SRAT Tang Chen
2013-08-07 10:51 ` [PATCH v3 02/25] earlycpio.c: Fix the confusing comment of find_cpio_data() Tang Chen
2013-08-07 10:51 ` [PATCH v3 03/25] acpi: Remove "continue" in macro INVALID_TABLE() Tang Chen
2013-08-07 10:51 ` [PATCH v3 04/25] acpi: Introduce acpi_verify_initrd() to check if a table is invalid Tang Chen
2013-08-07 10:51 ` [PATCH v3 05/25] acpi, acpica: Split acpi_tb_install_table() into two parts Tang Chen
2013-08-07 10:51 ` [PATCH v3 06/25] acpi, acpica: Call two new functions instead of acpi_tb_install_table() in acpi_tb_parse_root_table() Tang Chen
2013-08-07 10:51 ` [PATCH v3 07/25] acpi, acpica: Split acpi_tb_parse_root_table() into two parts Tang Chen
2013-08-07 10:51 ` [PATCH v3 08/25] acpi, acpica: Call two new functions instead of acpi_tb_parse_root_table() in acpi_initialize_tables() Tang Chen
2013-08-07 10:52 ` [PATCH v3 09/25] acpi, acpica: Split acpi_initialize_tables() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 10/25] x86, acpi: Call two new functions instead of acpi_initialize_tables() in acpi_table_init() Tang Chen
2013-08-07 10:52 ` [PATCH v3 11/25] x86, acpi: Split acpi_table_init() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 12/25] x86, acpi: Rename check_multiple_madt() and make it global Tang Chen
2013-08-07 10:52 ` [PATCH v3 13/25] x86, acpi: Split acpi_boot_table_init() into two parts Tang Chen
2013-08-07 10:52 ` [PATCH v3 14/25] x86, acpi: Initialize acpi global root table list earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 15/25] x86: get pg_data_t's memory from other node Tang Chen
2013-08-07 10:52 ` [PATCH v3 16/25] x86: Make get_ramdisk_{image|size}() global Tang Chen
2013-08-07 10:52 ` [PATCH v3 17/25] x86, acpica, acpi: Try to find if SRAT is overridden earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 18/25] x86, acpica, acpi: Try to find SRAT in firmware earlier Tang Chen
2013-08-07 10:52 ` [PATCH v3 19/25] x86, acpi, numa, mem_hotplug: Find hotpluggable memory in SRAT memory affinities Tang Chen
2013-08-07 10:52 ` [PATCH v3 20/25] x86, numa, mem_hotplug: Skip all the regions the kernel resides in Tang Chen
2013-08-07 10:52 ` [PATCH v3 21/25] memblock, numa: Introduce flag into memblock Tang Chen
2013-08-07 10:52 ` [PATCH v3 22/25] memblock, mem_hotplug: Introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions Tang Chen
2013-08-07 10:52 ` [PATCH v3 23/25] memblock, mem_hotplug: Make memblock skip hotpluggable regions by default Tang Chen
2013-08-07 10:52 ` [PATCH v3 24/25] mem-hotplug: Introduce movablenode boot option to {en|dis}able using SRAT Tang Chen
2013-08-07 10:52 ` [PATCH v3 25/25] x86, numa, acpi, memory-hotplug: Make movablenode have higher priority Tang Chen
2013-08-07 23:48 ` [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE Rafael J. Wysocki
2013-08-08  3:01   ` Moore, Robert [this message]
2013-08-08  3:41     ` Tang Chen