linux-mm.kvack.org archive mirror
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Feng Tang <feng.tang@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	 Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [RFC Patch 1/3] mm/slub: increase the maximum slab order to 4 for big systems
Date: Tue, 12 Sep 2023 13:52:19 +0900	[thread overview]
Message-ID: <CAB=+i9SP2j=VEDi52ga0WgPWSeDzdmTYisA4PAnR26Natp3swA@mail.gmail.com> (raw)
In-Reply-To: <20230905141348.32946-2-feng.tang@intel.com>

On Tue, Sep 5, 2023 at 11:07 PM Feng Tang <feng.tang@intel.com> wrote:
>
> There are reports about severe lock contention for slub's per-node
> 'list_lock' in 'hackbench' test, [1][2], on server systems. And
> similar contention is also seen when running 'mmap1' case of
> will-it-scale on big systems. As the trend is one processor (socket)
> will have more and more CPUs (100+, 200+), the contention could be
> much more severe and becomes a scalability issue.
>
> One way to help reduce the contention is to increase the maximum
> slab order from 3 to 4, for big systems.

Hello Feng,

Increasing the order with a higher number of CPUs (and thus more
memory) makes sense to me. IIUC the contention here becomes worse as
the number of slabs increases, so it makes sense to decrease the
number of slabs by increasing the order.

By the way, my silly question here is:
in the first place, is it worth taking 1/2 of s->cpu_partial_slabs in
the slowpath when the slab is frequently used? Wouldn't the per-CPU
partial slab list be refilled again by frees anyway, if free
operations are performed frequently?

> Unconditionally increasing the order could bring trouble to client
> devices with a very limited amount of memory, which may care more about
> memory footprint; also, allocating an order-4 page could be harder under
> memory pressure. So the increase will only be done for big systems
> like servers, which are usually equipped with plenty of memory and
> more likely to hit lock contention issues.

Also, does it make sense not to increase the order when PAGE_SIZE > 4096?

> Following is some performance data:
>
> will-it-scale/mmap1
> -------------------
> Run will-it-scale benchmark's 'mmap1' test case on a 2 socket Sapphire
> Rapids server (112 cores / 224 threads) with 256 GB DRAM, run 3
> configurations with parallel test threads of 25%, 50% and 100% of
> number of CPUs, and the data is (base is vanilla v6.5 kernel):
>
>                      base                      base+patch
> wis-mmap1-25%       223670           +33.3%     298205        per_process_ops
> wis-mmap1-50%       186020           +51.8%     282383        per_process_ops
> wis-mmap1-100%       89200           +65.0%     147139        per_process_ops
>
> Taking the perf-profile comparison of the 50% test case, the lock
> contention is greatly reduced:
>
>       43.80           -30.8       13.04       pp.self.native_queued_spin_lock_slowpath
>       0.85            -0.2        0.65        pp.self.___slab_alloc
>       0.41            -0.1        0.27        pp.self.__unfreeze_partials
>       0.20 ±  2%      -0.1        0.12 ±  4%  pp.self.get_any_partial
>
> hackbench
> ---------
>
> Run the same hackbench test case mentioned in [1], with the same HW/SW as will-it-scale:
>
>                      base                      base+patch
> hackbench           759951           +10.5%     839601        hackbench.throughput
>
> perf-profile diff:
>      22.20 ±  3%     -15.2        7.05        pp.self.native_queued_spin_lock_slowpath
>       0.82            -0.2        0.59        pp.self.___slab_alloc
>       0.33            -0.2        0.13        pp.self.__unfreeze_partials
>
> [1]. https://lore.kernel.org/all/202307172140.3b34825a-oliver.sang@intel.com/
> [2]. https://lore.kernel.org/lkml/ZORaUsd+So+tnyMV@chenyu5-mobl2/
> Signed-off-by: Feng Tang <feng.tang@intel.com>

> ---
>  mm/slub.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f7940048138c..09ae1ed642b7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4081,7 +4081,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
>   */
>  static unsigned int slub_min_order;
>  static unsigned int slub_max_order =
> -       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> +       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 4;
>  static unsigned int slub_min_objects;
>
>  /*
> @@ -4134,6 +4134,26 @@ static inline unsigned int calc_slab_order(unsigned int size,
>         return order;
>  }



Thread overview: 11+ messages
2023-09-05 14:13 [RFC Patch 0/3] mm/slub: reduce contention for per-node list_lock for large systems Feng Tang
2023-09-05 14:13 ` [RFC Patch 1/3] mm/slub: increase the maximum slab order to 4 for big systems Feng Tang
2023-09-12  4:52   ` Hyeonggon Yoo [this message]
2023-09-12 15:52     ` Feng Tang
2023-09-05 14:13 ` [RFC Patch 2/3] mm/slub: double per-cpu partial number for large systems Feng Tang
2023-09-05 14:13 ` [RFC Patch 3/3] mm/slub: setup maxim per-node partial according to cpu numbers Feng Tang
2023-09-12  4:48   ` Hyeonggon Yoo
2023-09-14  7:05     ` Feng Tang
2023-09-15  2:40       ` Lameter, Christopher
2023-09-15  5:05         ` Feng Tang
2023-09-15 16:13           ` Lameter, Christopher
