From: Pingfan Liu <kernelfans@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: x86@kernel.org, linux-mm@kvack.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andy Lutomirski <luto@kernel.org>,
Andi Kleen <ak@linux.intel.com>, Petr Tesarik <ptesarik@suse.cz>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Jonathan Corbet <corbet@lwn.net>,
Nicholas Piggin <npiggin@gmail.com>,
Daniel Vacek <neelx@redhat.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/6] make memblock allocator utilize the node's fallback info
Date: Tue, 5 Mar 2019 20:37:53 +0800 [thread overview]
Message-ID: <CAFgQCTs5uW9baypGbW5z=KyC7Vd9-QjTSKLFAJC5c2Jd6_ow_Q@mail.gmail.com> (raw)
In-Reply-To: <20190226120919.GY10588@dhcp22.suse.cz>
On Tue, Feb 26, 2019 at 8:09 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 26-02-19 13:47:37, Pingfan Liu wrote:
> > On Tue, Feb 26, 2019 at 12:04 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sun 24-02-19 20:34:03, Pingfan Liu wrote:
> > > > There are NUMA machines with memory-less node. At present page allocator builds the
> > > > full fallback info by build_zonelists(). But memblock allocator does not utilize
> > > > this info. And for memory-less node, memblock allocator just falls back "node 0",
> > > > without utilizing the nearest node. Unfortunately, the percpu section is allocated
> > > > by memblock, which is accessed frequently after bootup.
> > > >
> > > > This series aims to improve the performance of per cpu section on memory-less node
> > > > by feeding node's fallback info to memblock allocator on x86, like we do for page
> > > > allocator. On other archs, it requires independent effort to setup node to cpumask
> > > > map ahead.
> > >
> > > Do you have any numbers to tell us how much does this improve the
> > > situation?
> >
> > Not yet. At present just based on the fact that we prefer to allocate
> > per cpu area on local node.
>
> Yes, we _usually_ do. But the additional complexity should be worth it.
> And if we find out that the final improvement is not all that great and
> considering that memory-less setups are crippled anyway then it might
> turn out we just do not care all that much.
> --
I have finished some tests on a "Dell Inc. PowerEdge R7425/02MJ3T"
machine, which has 8 NUMA nodes. The topology is:
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 4096K
NUMA node0 CPU(s): 0,8,16,24
NUMA node1 CPU(s): 2,10,18,26
NUMA node2 CPU(s): 4,12,20,28
NUMA node3 CPU(s): 6,14,22,30
NUMA node4 CPU(s): 1,9,17,25
NUMA node5 CPU(s): 3,11,19,27
NUMA node6 CPU(s): 5,13,21,29
NUMA node7 CPU(s): 7,15,23,31
That is the basic info about the NUMA machine: CPUs 0 and 16 share the
same L3 cache, and only nodes 1 and 5 have local memory. Using the
local node as the baseline, memory write performance drops by 25% on
the nearest remote node (i.e. writing data from node 0 to node 1), and
by 78% on the farthest node (i.e. writing from node 0 to node 5).
I used a user-space test case to measure the performance difference
between the nearest node and the farthest. The case pins two tasks on
CPUs 0 and 16. It uses two memory chunks: A, which emulates the small
footprint of the per-cpu section, and B, which emulates a large
footprint. Chunk B is always allocated on the nearest node, while
chunk A switches between the nearest node and the farthest to produce
comparable results. To emulate roughly 2.5% of accesses hitting the
per-cpu area, the case interleaves two groups of writes: 1 pass over
chunk A, then 40 passes over chunk B.
For chunk B on the nearest node, I used a 4MB footprint, the same size
as the L3 cache, and varied chunk A's footprint from 2K -> 4K -> 8K to
emulate accesses to the per-cpu section. For 2K and 4K, the perf
results cannot tell the difference reliably, because it is smaller
than the run-to-run variance. For 8K there is a 1.8% improvement, and
the larger the footprint, the higher the improvement. But an 8K
footprint would mean a module allocating 4K per cpu in the section,
which does not happen in practice.
So the changes may not be needed.
Regards,
Pingfan
Thread overview: 23+ messages
2019-02-24 12:34 Pingfan Liu
2019-02-24 12:34 ` [PATCH 1/6] mm/numa: extract the code of building node fall back list Pingfan Liu
2019-02-24 12:34 ` [PATCH 2/6] mm/memblock: make full utilization of numa info Pingfan Liu
2019-02-25 7:07 ` kbuild test robot
2019-02-25 7:59 ` kbuild test robot
2019-02-25 15:34 ` Dave Hansen
2019-02-26 5:40 ` Pingfan Liu
2019-02-26 12:37 ` Dave Hansen
2019-02-26 11:58 ` Mike Rapoport
2019-02-27 9:23 ` Pingfan Liu
2019-02-24 12:34 ` [PATCH 3/6] x86/numa: define numa_init_array() conditional on CONFIG_NUMA Pingfan Liu
2019-02-25 15:23 ` Dave Hansen
2019-02-26 5:40 ` Pingfan Liu
2019-02-24 12:34 ` [PATCH 4/6] x86/numa: concentrate the code of setting cpu to node map Pingfan Liu
2019-02-25 15:25 ` Dave Hansen
2019-02-24 12:34 ` [PATCH 5/6] x86/numa: push forward the setup of node to cpumask map Pingfan Liu
2019-02-25 15:30 ` Dave Hansen
2019-02-26 5:40 ` Pingfan Liu
2019-02-24 12:34 ` [PATCH 6/6] x86/numa: build node fallback info after setting up " Pingfan Liu
2019-02-25 16:03 ` [PATCH 0/6] make memblock allocator utilize the node's fallback info Michal Hocko
2019-02-26 5:47 ` Pingfan Liu
2019-02-26 12:09 ` Michal Hocko
2019-03-05 12:37 ` Pingfan Liu [this message]