From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Tue, 12 Sep 2023 13:52:19 +0900
Subject: Re: [RFC Patch 1/3] mm/slub: increase the maximum slab order to 4 for big systems
To: Feng Tang
Cc: Vlastimil Babka, Andrew Morton, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Roman Gushchin, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20230905141348.32946-2-feng.tang@intel.com>
References: <20230905141348.32946-1-feng.tang@intel.com>
 <20230905141348.32946-2-feng.tang@intel.com>
On Tue, Sep 5, 2023 at 11:07 PM Feng Tang wrote:
>
> There are reports of severe lock contention on slub's per-node
> 'list_lock' in the 'hackbench' test [1][2] on server systems, and
> similar contention is also seen when running the 'mmap1' case of
> will-it-scale on big systems.
> As the trend is for one processor (socket) to have more and more
> CPUs (100+, 200+), the contention could become much more severe and
> turn into a scalability issue.
>
> One way to help reduce the contention is to increase the maximum
> slab order from 3 to 4 for big systems.

Hello Feng,

Increasing the order with a higher number of CPUs (and so with more
memory) makes sense to me. IIUC the contention here becomes worse as
the number of slabs increases, so it makes sense to decrease the
number of slabs by increasing the order.

By the way, my silly question here is: in the first place, is it
worth taking 1/2 of s->cpu_partial_slabs in the slowpath when the
slab is frequently used? Wouldn't the cpu partial slab list be
re-filled again by frees if free operations are performed frequently?

> Unconditionally increasing the order could bring trouble to client
> devices with a very limited amount of memory, which may care more
> about memory footprint; allocating an order-4 page is also harder
> under memory pressure. So the increase will only be done for big
> systems like servers, which are usually equipped with plenty of
> memory and are more likely to hit lock contention issues.

Also, does it make sense not to increase the order when PAGE_SIZE > 4096?
> Following is some performance data:
>
> will-it-scale/mmap1
> -------------------
> Run the will-it-scale benchmark's 'mmap1' test case on a 2-socket
> Sapphire Rapids server (112 cores / 224 threads) with 256 GB DRAM,
> in three configurations with parallel test threads at 25%, 50% and
> 100% of the number of CPUs (base is the vanilla v6.5 kernel):
>
>                        base            base+patch
> wis-mmap1-25%        223670    +33.3%      298205    per_process_ops
> wis-mmap1-50%        186020    +51.8%      282383    per_process_ops
> wis-mmap1-100%        89200    +65.0%      147139    per_process_ops
>
> Taking the perf-profile comparison of the 50% test case, the lock
> contention is greatly reduced:
>
>     43.80        -30.8      13.04        pp.self.native_queued_spin_lock_slowpath
>      0.85         -0.2       0.65        pp.self.___slab_alloc
>      0.41         -0.1       0.27        pp.self.__unfreeze_partials
>      0.20 ± 2%    -0.1       0.12 ± 4%   pp.self.get_any_partial
>
> hackbench
> ---------
>
> Run the same hackbench testcase mentioned in [1], using the same
> HW/SW as for will-it-scale:
>
>                   base            base+patch
> hackbench       759951    +10.5%      839601    hackbench.throughput
>
> perf-profile diff:
>     22.20 ± 3%   -15.2       7.05    pp.self.native_queued_spin_lock_slowpath
>      0.82         -0.2       0.59    pp.self.___slab_alloc
>      0.33         -0.2       0.13    pp.self.__unfreeze_partials
>
> [1] https://lore.kernel.org/all/202307172140.3b34825a-oliver.sang@intel.com/
> [2] https://lore.kernel.org/lkml/ZORaUsd+So+tnyMV@chenyu5-mobl2/
>
> Signed-off-by: Feng Tang
> ---
>  mm/slub.c | 51 ++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f7940048138c..09ae1ed642b7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4081,7 +4081,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
>   */
>  static unsigned int slub_min_order;
>  static unsigned int slub_max_order =
> -       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> +       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 4;
>  static unsigned int slub_min_objects;
>
>  /*
> @@ -4134,6 +4134,26 @@ static inline unsigned int calc_slab_order(unsigned int size,
>         return order;
>  }