Re: [Part1 PATCH v5 00/22] x86, ACPI, numa: Parse numa info earlier


From: Tang Chen <tangchen@cn.fujitsu.com>
To: Tejun Heo <tj@kernel.org>
Cc: yinghai@kernel.org, tglx@linutronix.de, mingo@elte.hu,
	hpa@zytor.com, akpm@linux-foundation.org, trenn@suse.de,
	jiang.liu@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	isimatu.yasuaki@jp.fujitsu.com, mgorman@suse.de,
	minchan@kernel.org, mina86@mina86.com, gong.chen@linux.intel.com,
	vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
	riel@redhat.com, jweiner@redhat.com, prarit@redhat.com,
	x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [Part1 PATCH v5 00/22] x86, ACPI, numa: Parse numa info earlier
Date: Mon, 24 Jun 2013 15:26:27 +0800
Message-ID: <51C7F4A3.6060307@cn.fujitsu.com>
In-Reply-To: <51C7C258.8070906@cn.fujitsu.com>

On 06/24/2013 11:51 AM, Tang Chen wrote:
> On 06/22/2013 02:25 AM, Tejun Heo wrote:
>> Hey,
>>
>> On Fri, Jun 21, 2013 at 05:19:48PM +0800, Tang Chen wrote:
>>>> * As memblock allocator can relocate itself. There's no point in
>>>> avoiding setting NUMA node while parsing and registering NUMA
>>>> topology. Just parse and register NUMA info and later tell it to
>>>> relocate itself out of hot-pluggable node. A number of patches in
>>>> the series is doing this dancing - carefully reordering NUMA
>>>> probing. No need to do that. It's really fragile thing to do.
>>>>
>>>> * Once you get the above out of the way, I don't think there are a lot
>>>> of permanent allocations in the way before NUMA is initialized.
>>>> Re-order the remaining ones if that's cleaner to do. If that gets
>>>> overly messy / fragile, copying them around or freeing and reloading
>>>> afterwards could be an option too.
>>>
>>> memblock allocator can relocate itself, but it cannot relocate the
>>> memory
>>
>> Hmmm... maybe I wasn't clear but that's the first bullet point above.
>>
>>> it allocated for users. There could be some pointers pointing to these
>>> memory ranges. If we do the relocation, how to update these pointers ?
>>
>> And the second. Can you please list what persistent areas are
>> allocated before numa info is configured into memblock? There
>
> Hi tj,
>
> My box is x86_64, and the memory layout is:
> [ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
> [ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x307ffffff]
> [ 0.000000] SRAT: Node 1 PXM 2 [mem 0x308000000-0x587ffffff] Hot Pluggable
> [ 0.000000] SRAT: Node 2 PXM 3 [mem 0x588000000-0x7ffffffff] Hot Pluggable
>
>
> I marked ranges reserved by memblock before we parse SRAT with flag 0x4.
> There are about 14 ranges which is persistent after boot.
>
> [ 0.000000] reserved[0x0] [0x00000000000000-0x0000000000ffff], 0x10000 bytes flags: 0x4
> [ 0.000000] reserved[0x1] [0x00000000093000-0x000000000fffff], 0x6d000 bytes flags: 0x4
> [ 0.000000] reserved[0x2] [0x00000001000000-0x00000002a9afff], 0x1a9b000 bytes flags: 0x4
> [ 0.000000] reserved[0x3] [0x00000030000000-0x00000037ffffff], 0x8000000 bytes flags: 0x4
> ...
> [ 0.000000] reserved[0x5] [0x0000006da81000-0x0000006e46afff], 0x9ea000 bytes flags: 0x4
> [ 0.000000] reserved[0x6] [0x0000006ed6a000-0x0000006f246fff], 0x4dd000 bytes flags: 0x4
> [ 0.000000] reserved[0x7] [0x0000006f28a000-0x0000006f299fff], 0x10000 bytes flags: 0x4
> [ 0.000000] reserved[0x8] [0x0000006f29c000-0x0000006fe91fff], 0xbf6000 bytes flags: 0x4
> [ 0.000000] reserved[0x9] [0x00000070e92000-0x00000071d54fff], 0xec3000 bytes flags: 0x4
> [ 0.000000] reserved[0xa] [0x00000071d5e000-0x00000072204fff], 0x4a7000 bytes flags: 0x4
> [ 0.000000] reserved[0xb] [0x00000072220000-0x0000007222074f], 0x750 bytes flags: 0x4
> ...
> [ 0.000000] reserved[0xd] [0x000000722bc000-0x000000722bc1cf], 0x1d0 bytes flags: 0x4
> [ 0.000000] reserved[0xe] [0x00000072bd3000-0x00000076c8ffff], 0x40bd000 bytes flags: 0x4
> ......
> [ 0.000000] reserved[0x134] [0x000007fffdf000-0x000007ffffffff], 0x21000 bytes flags: 0x4

This last range (reserved[0x134]) is allocated by init_mem_mapping() in
setup_arch(), which calls alloc_low_pages() to allocate pagetable pages.
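
As a cross-check of the dmesg output above, the following Python sketch tests
each reserved range against the two hot-pluggable SRAT ranges. It is only an
illustration: the helper names are made up here, the ranges are copied by hand
from the log, and the entries elided with "..." are left out. Under those
assumptions, only reserved[0x134] lands in a hot-pluggable range.

```python
# Hot-pluggable ranges copied from the SRAT lines in the dmesg above.
# Both ends are inclusive, as printed by the kernel.
HOTPLUGGABLE = [
    (0x308000000, 0x587ffffff),  # Node 1 PXM 2
    (0x588000000, 0x7ffffffff),  # Node 2 PXM 3
]

# A hand-copied subset of the reserved ranges, keyed by their index in the log.
# Entries hidden behind "..." in the log are not guessed at.
RESERVED = {
    0x0: (0x0, 0xffff),
    0x1: (0x93000, 0xfffff),
    0x2: (0x1000000, 0x2a9afff),
    0x3: (0x30000000, 0x37ffffff),
    0xe: (0x72bd3000, 0x76c8ffff),
    0x134: (0x7fffdf000, 0x7ffffffff),
}


def overlaps(a, b):
    """Return True if two inclusive (start, end) ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def reserved_in_hotpluggable(reserved, hotpluggable):
    """Return the sorted indexes of reserved ranges touching a hot-pluggable range."""
    return sorted(
        idx
        for idx, rng in reserved.items()
        if any(overlaps(rng, hp) for hp in hotpluggable)
    )


if __name__ == "__main__":
    for idx in reserved_in_hotpluggable(RESERVED, HOTPLUGGABLE):
        start, end = RESERVED[idx]
        print(f"reserved[{idx:#x}] [{start:#x}-{end:#x}] is in a hot-pluggable range")
```

A box with a different SRAT would need its own HOTPLUGGABLE and RESERVED
tables, so the result may differ there.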

I think if we do the local device pagetable, we can solve this problem
without any relocation.

I will make a patch to try this. But I'm not sure whether other
architectures have relocation problems of their own.

But even if not, I still think this could be dangerous if, in the future,
someone modifies the boot path and allocates some persistent memory before
SRAT is parsed. They would have to be aware of memory hotplug and do the
necessary relocation themselves.

I'll try to make the patch achieve this, with comments as complete as possible.

Thanks. :)

>
>
> Just for the readability:
> [0x00000308000000-0x00000587ffffff] Hot Pluggable
> [0x00000588000000-0x000007ffffffff] Hot Pluggable
>
> Seeing from the dmesg, only the last one is in hotpluggable area. I need
> to go
> through the code to find out what it is, and find a way to relocate it.
>
> But I'm not sure if a box with a different SRAT will have different result.
>
> I will send more info later.
>
> Thanks. :)
>
>
>> shouldn't be whole lot. And, again, this type of information should
>> have been available in the head message so that high-level discussion
>> could take place right away.
>>
>> Thanks.
>>
>


