From: Nish Aravamudan <nish.aravamudan@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Tony Luck <tony.luck@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>,
linux-ia64@vger.kernel.org,
Linux Memory Management List <linux-mm@kvack.org>,
linuxppc-dev@lists.ozlabs.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 0/2] Memoryless nodes and kworker
Date: Fri, 18 Jul 2014 10:42:29 -0700
Message-ID: <CAOhV88PyBK3WxDjG1H0hUbRhRYzPOzV8eim5DuOcgObe-FtFYg@mail.gmail.com>
In-Reply-To: <20140718112039.GA8383@htj.dyndns.org>
Hi Tejun,
On Fri, Jul 18, 2014 at 4:20 AM, Tejun Heo <tj@kernel.org> wrote:
>
> On Thu, Jul 17, 2014 at 04:09:23PM -0700, Nishanth Aravamudan wrote:
> > [Apologies for the large Cc list, but I believe we have the following
> > interested parties:
> >
> > x86 (recently posted memoryless node support)
> > ia64 (existing memoryless node support)
> > ppc (existing memoryless node support)
> > previous discussion of how to solve Anton's issue with slab usage
> > workqueue contributors/maintainers]
>
> Well, you forgot to cc me.
Ah I'm very sorry! That's what I get for editing e-mails... Thank you for
your reply!
> ...
> > It turns out we see this large slab usage due to using the wrong NUMA
> > information when creating kthreads.
> >
> > Two changes are required, one of which is in the workqueue code and one
> > of which is in the powerpc initialization. Note that ia64 may want to
> > consider something similar.
>
> Wasn't there a thread on this exact subject a few weeks ago? Was that
> someone else? Memory-less node detail leaking out of allocator proper
> isn't a good idea. Please allow allocator users to specify the nodes
> they're on and let the allocator layer deal with mapping that to
> whatever is appropriate. Please don't push that to everybody.
I didn't send anything for the workqueue logic anytime recently. Jiang sent
out a patchset for x86 memoryless node support, which may have touched
kernel/workqueue.c.
So, to be clear, this is not *necessarily* about memoryless nodes. It's
about the intended semantics. The workqueue code currently calls
cpu_to_node() in a few places, and passes that node into the core MM as a
hint about where the memory should come from. However, for a CPU on a
memoryless node, that hint is guaranteed to be wrong, as it's the nearest
NUMA node to the CPU (which happens to be the one it's on), not the nearest
NUMA node with memory. The hint is correctly specified as cpu_to_mem(),
which does the right thing in the presence or absence of memoryless nodes.
And I think it encapsulates the hint's semantics correctly -- please give me
memory from where I expect it, which is the closest NUMA node with memory.
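To make the distinction concrete, here is a toy userspace model, not kernel
code. The topology (four CPUs, node 0 memoryless, node 1 with memory), the
distance table, and every toy_* name are assumptions made up for
illustration; only the idea that cpu_to_node() names the CPU's own node while
cpu_to_mem() names the nearest node with memory comes from the discussion
above.

```c
#include <assert.h>

#define TOY_NR_CPUS  4
#define TOY_NR_NODES 2

/* Hypothetical topology: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical: node 0 is memoryless, node 1 has memory. */
static const int toy_node_has_memory[TOY_NR_NODES] = { 0, 1 };

/* Hypothetical NUMA distances (smaller means closer). */
static const int toy_node_distance[TOY_NR_NODES][TOY_NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Analogue of cpu_to_node(): the node the CPU itself belongs to. */
static int toy_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return toy_cpu_node[cpu];
}

/* Analogue of local_memory_node(): nearest node that actually has memory. */
static int toy_local_memory_node(int node)
{
	int best = -1;

	assert(node >= 0 && node < TOY_NR_NODES);
	for (int n = 0; n < TOY_NR_NODES; n++) {
		if (!toy_node_has_memory[n])
			continue;
		if (best < 0 ||
		    toy_node_distance[node][n] < toy_node_distance[node][best])
			best = n;
	}
	return best; /* -1 only if no node has memory at all */
}

/* Analogue of cpu_to_mem(): the node an allocation hint should name. */
static int toy_cpu_to_mem(int cpu)
{
	return toy_local_memory_node(toy_cpu_to_node(cpu));
}
```

In this model the two helpers agree for CPUs 2 and 3 (both give node 1) and
differ for CPUs 0 and 1, where toy_cpu_to_node() gives the memoryless node 0
and toy_cpu_to_mem() gives node 1.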
I guess we could also change tsk_fork_get_node() to return
local_memory_node(tsk->pref_node_fork), but that can be a bit expensive, as
it generates a new zonelist each time to determine the first fallback node.
We get the exact same semantics (because cpu_to_mem() caches the result of
local_memory_node()) by using cpu_to_mem() directly.
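The cost argument can be sketched the same way, again as a toy userspace
model and not the kernel implementation. All demo_* names, the topology, and
the search counter are hypothetical, and the fallback search is simplified to
"first node with memory" where the kernel would build a zonelist. The point
is only that the expensive lookup runs once per CPU at init time, after which
the cpu_to_mem() analogue is a plain array read.

```c
#include <assert.h>

#define DEMO_NR_CPUS  4
#define DEMO_NR_NODES 2

/* Hypothetical topology: node 0 is memoryless, node 1 has memory. */
static const int demo_cpu_node[DEMO_NR_CPUS] = { 0, 0, 1, 1 };
static const int demo_node_has_memory[DEMO_NR_NODES] = { 0, 1 };

/* Counts how often the expensive search runs (illustration only). */
static int demo_search_count;

/* Per-CPU cache of the nearest node with memory. */
static int demo_cached_mem_node[DEMO_NR_CPUS];

/*
 * Stand-in for local_memory_node(): the expensive path. Simplified to
 * return the node itself if it has memory, else the first node that does.
 */
static int demo_local_memory_node(int node)
{
	assert(node >= 0 && node < DEMO_NR_NODES);
	demo_search_count++;
	if (demo_node_has_memory[node])
		return node;
	for (int n = 0; n < DEMO_NR_NODES; n++)
		if (demo_node_has_memory[n])
			return n;
	return -1;
}

/* Run the expensive search once per CPU and remember the answer. */
static void demo_init_cpu_mem_cache(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_cached_mem_node[cpu] =
			demo_local_memory_node(demo_cpu_node[cpu]);
}

/* Stand-in for cpu_to_mem(): a cached read, no search. */
static int demo_cpu_to_mem(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return demo_cached_mem_node[cpu];
}
```

After demo_init_cpu_mem_cache() has run, any number of demo_cpu_to_mem()
calls leave demo_search_count unchanged, whereas every direct call to
demo_local_memory_node() increments it.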
Again, apologies for not Cc'ing you originally.
-Nish