From: Nish Aravamudan <nish.aravamudan@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Tony Luck <tony.luck@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>,
linux-ia64@vger.kernel.org,
Linux Memory Management List <linux-mm@kvack.org>,
linuxppc-dev@lists.ozlabs.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 0/2] Memoryless nodes and kworker
Date: Fri, 18 Jul 2014 11:12:01 -0700 [thread overview]
Message-ID: <CAOhV88O03zCsv_3eadEKNv1D1RoBmjWRFNhPjEHawF9s71U0JA@mail.gmail.com> (raw)
In-Reply-To: <20140718180008.GC13012@htj.dyndns.org>
Hi Tejun,
[I found the other thread where you made these points, thank you for
expressing them so clearly again!]
On Fri, Jul 18, 2014 at 11:00 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Jul 18, 2014 at 10:42:29AM -0700, Nish Aravamudan wrote:
> > So, to be clear, this is not *necessarily* about memoryless nodes. It's
> > about the semantics intended. The workqueue code currently calls
> > cpu_to_node() in a few places, and passes that node into the core MM as
> > a hint about where the memory should come from. However, when memoryless
> > nodes are present, that hint is guaranteed to be wrong, as it's the
> > nearest NUMA node to the CPU (which happens to be the one it's on), not
> > the nearest NUMA node with memory. The hint is correctly specified as
> > cpu_to_mem(),
>
> It's telling the allocator the node the CPU is on. Choosing and
> falling back the actual allocation is the allocator's job.
Ok, I agree with you then, if that's all the semantics are supposed to be.
But looking at the comment for kthread_create_on_node:
* If thread is going to be bound on a particular cpu, give its node
* in @node, to get NUMA affinity for kthread stack, or else give -1.
so the API treats it as a hint about the memory affinity itself, *not*
about the node the kthread will run on. Piddly, yes, but I actually have
another thought altogether, and in reviewing Jiang's patches this seems
like the right approach:
why aren't these callers using kthread_create_on_cpu()? That API was
already changed to use cpu_to_mem() [so one change, rather than changes
all over the kernel source]. We could change it back to cpu_to_node()
and push down the knowledge about the fallback.
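Roughly what I have in mind, as an untested sketch -- the thread
function, name format, and wrapper below are placeholders, not the
actual workqueue code:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/topology.h>

static struct task_struct *example_spawn(int (*fn)(void *), void *arg,
                                         unsigned int cpu)
{
        struct task_struct *tsk;

        /* today: the caller picks the node hint itself */
        tsk = kthread_create_on_node(fn, arg, cpu_to_node(cpu),
                                     "kworker/%u", cpu);
        if (!IS_ERR(tsk))
                kthread_bind(tsk, cpu);

        /*
         * alternatively: let kthread_create_on_cpu() derive the node
         * hint from the cpu internally, in one place:
         *
         *      tsk = kthread_create_on_cpu(fn, arg, cpu, "kworker/%u");
         */
        return tsk;
}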
> > which does the right thing in the presence or absence of memoryless
> > nodes. And I think encapsulates the hint's semantics correctly -- please
> > give me memory from where I expect it, which is the closest NUMA node.
>
> I don't think it does. It loses information at too high a layer.
> Workqueue here doesn't care how memory subsystem is structured, it's
> just telling the allocator where it's at and expecting it to do the
> right thing. Please consider the following scenario.
>
> A - B - C - D - E
>
> Let's say C is a memory-less node. If we map from C to either B or D
> from individual users and that node can't serve that memory request,
> the allocator would fall back to A or E respectively when the right
> thing to do would be falling back to D or B respectively, right?
Yes, this is a good point. But honestly, we're not really even to the
point of talking about fallback here; at least in my testing, going
off-node at all causes SLUB-configured slabs to deactivate, which then
leads to an explosion in unreclaimable slab usage.
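To make your scenario concrete for myself (distances made up, and this
is illustrative pseudo-caller code, not anything in workqueue.c):

#include <linux/slab.h>
#include <linux/topology.h>

static void *alloc_near_cpu(size_t size, unsigned int cpu)
{
        /*
         * Caller-side mapping: with C memoryless, cpu_to_mem() pre-maps
         * C -> B, so if B is exhausted the fallback follows B's
         * zonelist and tries A before D.
         */
        /* return kmalloc_node(size, GFP_KERNEL, cpu_to_mem(cpu)); */

        /*
         * Allocator-side handling: pass C itself and let the allocator
         * walk C's own zonelist -- B, then D, then A/E -- which is the
         * fallback order you describe as correct.
         */
        return kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
}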
> This isn't a huge issue but it shows that this is the wrong layer to
> deal with this issue. Let the allocator users express where they are.
> Choosing and falling back belong to the memory allocator. That's the
> only place which has all the information that's necessary and those
> details must be contained there. Please don't leak it to memory
> allocator users.
Ok, I will continue to work at that level of abstraction.
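For my own reference, the kind of allocator-side handling I understand
you to mean, as a very rough and untested sketch (the helper name is
mine, and something like this would live inside the allocator, not in
callers):

#include <linux/mmzone.h>
#include <linux/nodemask.h>
#include <linux/numa.h>

/*
 * Map a memoryless node to its nearest node that actually has memory,
 * so callers can keep passing plain cpu_to_node() and never see the
 * fallback details.
 */
static int effective_alloc_node(int node)
{
        if (node == NUMA_NO_NODE || node_state(node, N_MEMORY))
                return node;
        return local_memory_node(node);
}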
Thanks,
Nish