From: Tang Chen <tangchen@cn.fujitsu.com>
To: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Cc: mingo@redhat.com, hpa@zytor.com, akpm@linux-foundation.org,
yinghai@kernel.org, jiang.liu@huawei.com, wency@cn.fujitsu.com,
laijs@cn.fujitsu.com, isimatu.yasuaki@jp.fujitsu.com,
tj@kernel.org, mgorman@suse.de, minchan@kernel.org,
mina86@mina86.com, gong.chen@linux.intel.com,
lwoodman@redhat.com, riel@redhat.com, jweiner@redhat.com,
prarit@redhat.com, x86@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 07/13] x86, numa, mem-hotplug: Mark nodes which the kernel resides in.
Date: Mon, 03 Jun 2013 15:35:53 +0800
Message-ID: <51AC4759.6090101@cn.fujitsu.com>
In-Reply-To: <20130531162401.GA31139@dhcp-192-168-178-175.profitbricks.localdomain>
Hi Vasilis,
On 06/01/2013 12:24 AM, Vasilis Liaskovitis wrote:
......
>> +void __init_memblock memblock_mark_kernel_nodes()
>> +{
>> +	int i, nid;
>> +	struct memblock_type *reserved = &memblock.reserved;
>> +
>> +	for (i = 0; i < reserved->cnt; i++)
>> +		if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
>> +			nid = memblock_get_region_node(&reserved->regions[i]);
>> +			node_set(nid, memblock_kernel_nodemask);
>> +		}
>> +}
>
> I think there is a problem here because memblock_set_region_node is sometimes
> called with nid == MAX_NUMNODES. This means the correct node is not properly
> masked in the memblock_kernel_nodemask bitmap.
> E.g. in a VM test, memblock_mark_kernel_nodes with extra pr_warn calls iterates
> over the following memblocks (ranges below are memblks base-(base+size)):
>
> [ 0.000000] memblock_mark_kernel_nodes nid=64 0x00000000000000-0x00000000010000
> [ 0.000000] memblock_mark_kernel_nodes nid=64 0x00000000098000-0x00000000100000
> [ 0.000000] memblock_mark_kernel_nodes nid=64 0x00000001000000-0x00000001a5a000
> [ 0.000000] memblock_mark_kernel_nodes nid=64 0x00000037000000-0x000000377f8000
>
> where MAX_NUMNODES is 64 because CONFIG_NODES_SHIFT=6.
> The ranges above belong to node 0, but the node's bit is never marked.
>
> With a buggy bios that marks all memory as hotpluggable, this results in a
> panic, because both checks against hotpluggable bit and memblock_kernel_bitmask
> (in early_mem_hotplug_init) fail, the numa regions have all been merged together
> and memblock_reserve_hotpluggable is called for all memory.
>
> With a correct bios (some part of initial memory is not hotplug-able) the kernel
> can boot since the hotpluggable bit check works ok, but extra dimms on node 0
> will still be allowed to be in MOVABLE_ZONE.
>
OK, I see the problem. But could you please give me a call trace that
shows how this can happen? I think the memblock info should be the same
as numa_meminfo. Can we fix the caller so that it sets the nid correctly?
> Actually this behaviour (being able to have MOVABLE memory on nodes with kernel
> reserved memblocks) sort of matches the policy I requested in v2 :). But i
> suspect that is not your intent i.e. you want memblock_kernel_nodemask_bitmap to
> prevent movable reservations for the whole node where kernel has reserved
> memblocks.
I intended to mark the whole node in which the kernel resides as
un-hotpluggable.
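
To spell out that policy, here is a minimal user-space sketch, not the
kernel code. The names kernel_nodemask_t, mark_kernel_node() and
range_may_be_movable() are made up for illustration, and a 64-bit word
stands in for nodemask_t with MAX_NUMNODES = 64 as in your test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 64	/* CONFIG_NODES_SHIFT=6, as in the VM test */

/* Hypothetical stand-in for nodemask_t: one bit per node. */
typedef uint64_t kernel_nodemask_t;

/*
 * Record that the kernel has reserved memory on node nid.
 * An out-of-range nid (including the MAX_NUMNODES "unknown" value)
 * is rejected, so the caller can notice that nothing was marked.
 */
static bool mark_kernel_node(kernel_nodemask_t *mask, int nid)
{
	assert(mask != NULL);
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;
	*mask |= (uint64_t)1 << nid;
	return true;
}

/*
 * A range may be reserved as hotpluggable only if the firmware flags it
 * as hotpluggable AND the kernel holds no reserved memory on that node.
 */
static bool range_may_be_movable(kernel_nodemask_t mask, int nid,
				 bool srat_hotpluggable)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;
	if (mask & ((uint64_t)1 << nid))
		return false;
	return srat_hotpluggable;
}
```

In this model, once node 0 is marked, no range on node 0 can be movable,
whatever the firmware says about it.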
>
> Is there a way to get accurate nid information for memblocks at early boot? I
> suspect pfn_to_nid doesn't work yet at this stage (i got a panic when I
> attempted iirc)
At such an early stage, I think we can only get the nid from
numa_meminfo. So, as I said above, I'd like to fix this problem by
making memblock carry the correct nid.

And I read the patch below. I think if we get the nid from numa_meminfo,
then we don't need to call memblock_get_region_node().
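
Roughly something like the following. This is only a user-space sketch
with simplified stand-in structures: struct blk, struct region,
find_range_nid() and mark_kernel_nodes() are illustrative names, not the
real kernel definitions. It also assumes each numa_meminfo block covers
[start, end) with an exclusive end, so the containment check differs
slightly from the one in your hack:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 64

/* Simplified stand-in for one numa_meminfo block: [start, end) on nid. */
struct blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Simplified stand-in for one reserved memblock region. */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Return the node whose block fully contains [start, start + size), or -1. */
static int find_range_nid(const struct blk *blks, size_t nr_blks,
			  uint64_t start, uint64_t size)
{
	size_t i;

	for (i = 0; i < nr_blks; i++)
		if (start >= blks[i].start && start + size <= blks[i].end)
			return blks[i].nid;
	return -1;
}

/*
 * Build a bitmask of the nodes that hold kernel-reserved regions, taking
 * the nid from the NUMA block table instead of from the region itself.
 * Regions that match no block are skipped.
 */
static uint64_t mark_kernel_nodes(const struct blk *blks, size_t nr_blks,
				  const struct region *res, size_t nr_res)
{
	uint64_t mask = 0;
	size_t i;

	assert(blks != NULL && res != NULL);
	for (i = 0; i < nr_res; i++) {
		int nid = find_range_nid(blks, nr_blks,
					 res[i].base, res[i].size);

		if (nid >= 0 && nid < MAX_NUMNODES)
			mask |= (uint64_t)1 << nid;
	}
	return mask;
}
```

The region's own nid is never consulted here, so a region tagged with
MAX_NUMNODES would no longer matter in this model.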
Thanks. :)
>
> I used the hack below but it depends on CONFIG_NUMA, hopefully there is a
> cleaner general way:
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index cfd8c2f..af8ad2a 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -133,6 +133,19 @@ void __init setup_node_to_cpumask_map(void)
> pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> }
>
> +int __init numa_find_range_nid(u64 start, u64 size)
> +{
> +	unsigned int i;
> +	struct numa_meminfo *mi = &numa_meminfo;
> +
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		if (start >= mi->blk[i].start && start + size - 1 <= mi->blk[i].end)
> +			return mi->blk[i].nid;
> +	}
> +	return -1;
> +}
> +EXPORT_SYMBOL(numa_find_range_nid);
> +
> static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
> bool hotpluggable,
> struct numa_meminfo *mi)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77a71fb..194b7c7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1600,6 +1600,9 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
> unsigned long start, unsigned long end);
> #endif
>
> +#ifdef CONFIG_NUMA
> +int __init numa_find_range_nid(u64 start, u64 size);
> +#endif
> struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
> int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
> unsigned long pfn, unsigned long size, pgprot_t);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index a6b7845..284aced 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -834,15 +834,26 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
>
> void __init_memblock memblock_mark_kernel_nodes()
> {
> - int i, nid;
> + int i, nid, tmpnid;
> struct memblock_type *reserved =&memblock.reserved;
>
> for (i = 0; i< reserved->cnt; i++)
> if (reserved->regions[i].flags == MEMBLK_FLAGS_DEFAULT) {
> nid = memblock_get_region_node(&reserved->regions[i]);
> + if (nid == MAX_NUMNODES) {
> + tmpnid = numa_find_range_nid(reserved->regions[i].base,
> + reserved->regions[i].size);
> + if (tmpnid>= 0)
> + nid = tmpnid;
> + }
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index e862311..84d6e64 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -667,11 +667,7 @@ static bool srat_used __initdata;
> */
> static void __init early_x86_numa_init(void)
> {
> - /*
> - * Need to find out which nodes the kernel resides in, and arrange
> - * them as un-hotpluggable when parsing SRAT.
> - */
> - memblock_mark_kernel_nodes();
>
> if (!numa_off) {
> #ifdef CONFIG_X86_NUMAQ
> @@ -779,6 +775,12 @@ void __init early_initmem_init(void)
> load_cr3(swapper_pg_dir);
> __flush_tlb_all();
>
> + /*
> + * Need to find out which nodes the kernel resides in, and arrange
> + * them as un-hotpluggable when parsing SRAT.
> + */
> +
> + memblock_mark_kernel_nodes();
> early_mem_hotplug_init();
>
> early_memtest(0, max_pfn_mapped<<PAGE_SHIFT);