From: Tang Chen <tangchen@cn.fujitsu.com>
To: "gongzhaogang@inspur.com" <gongzhaogang@inspur.com>,
"tj@kernel.org" <tj@kernel.org>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
"hpa@zytor.com" <hpa@zytor.com>,
tangchen@cn.fujitsu.com,
"yasu.isimatu@gmail.com" <yasu.isimatu@gmail.com>,
"isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>,
"kamezawa.hiroyu@jp.fujitsu.com" <kamezawa.hiroyu@jp.fujitsu.com>,
"izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>,
"qiaonuohan@cn.fujitsu.com" <qiaonuohan@cn.fujitsu.com>,
"x86@kernel.org" <x86@kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
Date: Tue, 4 Aug 2015 16:53:48 +0800 [thread overview]
Message-ID: <55C07D9C.8070505@cn.fujitsu.com> (raw)
In-Reply-To: <201508041626380745999@inspur.com>
On 08/04/2015 04:26 PM, gongzhaogang@inspur.com wrote:
> Sorry, I am new.
> >But,
> >1) in cpu_up(), it will try to online a node, and it doesn't check if
> >the node has memory.
> >2) in try_offline_node(), it offlines CPUs first, and then the memory.
> >This behavior looks a little weird, or let's say it is ambiguous. It
> >seems that a NUMA node consists of CPUs and memory. So if the CPUs
> >are online, the node should be online.
> I suggest you try the patch offered by Liu Jiang.
>
> https://lkml.org/lkml/2014/9/11/1087
Well, I think Liu Jiang meant this patch set. :)
https://lkml.org/lkml/2014/7/11/75
>
> I have tried it; it is OK.
>
> >Unfortunately, since I don't have a machine with a memory-less node,
> >I cannot reproduce the problem right now.
>
> If you are not in a hurry, I can test your patches in our environment on weekends.
Thanks. But this version of my patch set is obviously problematic.
It would be very nice if you could help test the next version, but
that may take a few days.
Thanks. :)
>
> ------------------------------------------------------------------------
> gongzhaogang@inspur.com
>
> *From:* Tang Chen <tangchen@cn.fujitsu.com>
> *Date:* 2015-08-04 11:36
> *To:* Tejun Heo <tj@kernel.org>
> *CC:* mingo@redhat.com; akpm@linux-foundation.org; rjw@rjwysocki.net;
> hpa@zytor.com; laijs@cn.fujitsu.com; yasu.isimatu@gmail.com;
> isimatu.yasuaki@jp.fujitsu.com; kamezawa.hiroyu@jp.fujitsu.com;
> izumi.taku@jp.fujitsu.com; gongzhaogang@inspur.com;
> qiaonuohan@cn.fujitsu.com; x86@kernel.org; linux-acpi@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; tangchen@cn.fujitsu.com
> *Subject:* Re: [PATCH 1/5] x86, gfp: Cache best near node for
> memory allocation.
> Hi TJ,
> Sorry for the late reply.
> On 07/16/2015 05:48 AM, Tejun Heo wrote:
> > ......
> > so in initialization phase makes no sense any more. The best near
> > online node for each cpu should be cached somewhere.
> > I'm not really following. Is this because the now offline node can
> > later come online and we'd have to break the constant mapping
> > invariant if we update the mapping later? If so, it'd be nice to
> > spell that out.
> Yes. Will document this in the next version.
> >> ......
> >>
> >> +int get_near_online_node(int node)
> >> +{
> >> + return per_cpu(x86_cpu_to_near_online_node,
> >> + cpumask_first(&node_to_cpuid_mask_map[node]));
> >> +}
> >> +EXPORT_SYMBOL(get_near_online_node);
> > Umm... this function is sitting on a fairly hot path and scanning a
> > cpumask each time. Why not just build a numa node -> numa node
> > array?
> Indeed. Will avoid scanning the cpumask.
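Just to make the suggestion concrete for myself, a minimal sketch of such a
precomputed node -> node lookup (illustration only; the names
near_online_node_map and build_near_online_node_map are made up here and are
not from the patch):

#include <linux/kernel.h>
#include <linux/nodemask.h>
#include <linux/topology.h>

/* Hypothetical cache: nearest online node for every possible node. */
static int near_online_node_map[MAX_NUMNODES];

/* Rebuilt at NUMA init time (and again on node hotplug, see below). */
static void build_near_online_node_map(void)
{
	int nid, onid;

	for_each_node(nid) {
		int best = nid, best_dist = INT_MAX;

		for_each_online_node(onid) {
			int dist = node_distance(nid, onid);

			if (dist < best_dist) {
				best_dist = dist;
				best = onid;
			}
		}
		near_online_node_map[nid] = best;
	}
}

/* The hot-path lookup becomes a plain array read, no cpumask scan. */
int get_near_online_node(int node)
{
	return near_online_node_map[node];
}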
> > ......
> >
> >>
> >> static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
> >> unsigned int order)
> >> {
> >> - VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> >> + VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> >> +
> >> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
> >> + if (!node_online(nid))
> >> + nid = get_near_online_node(nid);
> >> +#endif
> >>
> >> return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> >> }
> > Ditto. Also, what are the synchronization rules for NUMA node
> > on/offlining? If you end up updating the mapping later, how would
> > that be synchronized against the above usages?
> I think the near online node map should be updated when node
> online/offline happens. But about this, I think the current NUMA
> code has a little problem.
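To sketch what that update could look like (illustration only, not part of
this patch set; it reuses the hypothetical build_near_online_node_map() from
the sketch above and glosses over locking against concurrent allocations,
which is exactly the synchronization question here):

#include <linux/init.h>
#include <linux/memory.h>
#include <linux/notifier.h>

/* Rebuild the cached map whenever a node gains or loses memory. */
static int near_node_memory_callback(struct notifier_block *nb,
				     unsigned long action, void *arg)
{
	switch (action) {
	case MEM_ONLINE:
	case MEM_OFFLINE:
		build_near_online_node_map();
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block near_node_memory_nb = {
	.notifier_call = near_node_memory_callback,
};

static int __init near_node_map_init(void)
{
	build_near_online_node_map();
	register_memory_notifier(&near_node_memory_nb);
	return 0;
}
subsys_initcall(near_node_map_init);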
> As you know, firmware info binds a set of CPUs and memory to a node.
> But at boot time, if the node has no memory (a memory-less node), it
> won't be onlined. The CPUs on that node are still available, though,
> and are bound to the nearest online node. (Here, I mean
> numa_set_node(cpu, node).)
> Why does the kernel do this? I think it is to ensure that memory
> allocation succeeds when calling functions like alloc_pages_node()
> and alloc_pages_exact_node(). For these two functions, every CPU
> should be bound to a node that has memory so that memory allocation
> can succeed.
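Roughly, what happens at boot is something like the following (a simplified
illustration, not the actual arch/x86 code; the function name
bind_cpu_to_memory_node() is made up here):

/* Bind a CPU on a memory-less node to a nearby node that has memory. */
static void __init bind_cpu_to_memory_node(int cpu)
{
	int nid = numa_cpu_node(cpu);		/* node reported by firmware */

	if (nid == NUMA_NO_NODE)
		return;

	if (!node_online(nid))			/* memory-less node was not onlined */
		nid = get_near_online_node(nid);

	numa_set_node(cpu, nid);		/* cpu_to_node(cpu) now has memory */
}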
> That means, for a memory-less node at boot time, the CPUs on the
> node are online, but the node itself is not.
> That also means "the node is online" is equivalent to "the node has
> memory". Actually, a lot of code in the kernel relies on this rule.
> But,
> 1) in cpu_up(), it will try to online a node, and it doesn't check
> if the node has memory.
> 2) in try_offline_node(), it offlines the CPUs first, and then the
> memory.
> This behavior looks a little weird, or let's say it is ambiguous. It
> seems that a NUMA node consists of CPUs and memory. So if the CPUs
> are online, the node should be online.
> And also,
> the main purpose of this patch-set is to make the cpuid <-> nodeid
> mapping persistent. After this patch-set, alloc_pages_node() and
> alloc_pages_exact_node() won't depend on the cpuid <-> nodeid
> mapping any more. So the node should be online if the CPUs on it
> are online. Otherwise, we cannot set up the interfaces of the CPUs
> under /sys.
> Unfortunately, since I don't have a machine with a memory-less node,
> I cannot reproduce the problem right now.
> How do you think the node online behavior should be changed?
> Thanks.
>
Thread overview: 24+ messages
2015-07-07 9:30 [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
2015-07-07 9:30 ` [PATCH 1/5] x86, gfp: Cache best near node for memory allocation Tang Chen
2015-07-15 21:48 ` Tejun Heo
2015-08-04 3:36 ` Tang Chen
2015-08-04 8:05 ` Jiang Liu
2015-08-04 8:24 ` Tang Chen
2015-08-09 6:15 ` Tang Chen
2015-08-12 1:53 ` Jiang Liu
2015-08-04 8:26 ` gongzhaogang
2015-08-04 8:53 ` Tang Chen [this message]
2015-08-04 8:58 ` Tang Chen
2015-07-07 9:30 ` [PATCH 2/5] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time Tang Chen
2015-07-07 9:30 ` [PATCH 3/5] x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store persistent cpuid <-> apicid mapping Tang Chen
2015-07-07 11:14 ` Mika Penttilä
2015-07-15 3:33 ` Tang Chen
2015-07-15 5:35 ` Jiang Liu
2015-07-15 6:26 ` Tang Chen
2015-07-15 22:02 ` Tejun Heo
2015-07-07 9:30 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
2015-07-15 22:06 ` Tejun Heo
2015-07-07 9:30 ` [PATCH 5/5] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Tang Chen
2015-07-15 22:13 ` [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tejun Heo
2015-07-23 4:44 ` Tang Chen
2015-07-23 18:32 ` Tejun Heo