From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org, Christoph Lameter, David Rientjes, Joonsoo Kim, Pekka Enberg, Jann Horn
Cc: linux-kernel@vger.kernel.org, Roman Gushchin, Vlastimil Babka
Subject: [PATCH v2 1/2] mm, slub: change percpu partial accounting from objects to pages
Date: Tue, 12 Oct 2021 15:46:50 +0200
Message-Id: <20211012134651.11258-1-vbabka@suse.cz>
X-Mailer: git-send-email 2.33.0
MIME-Version: 1.0

With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of partial
slabs that can be promoted to cpu slab when the previous one is depleted,
without accessing the shared partial list. A slab can be added to this list
by 1) refill of an empty list from get_partial_node() - once we really have
to access the shared partial list, we acquire multiple slabs to amortize the
cost of locking, and 2) first free to a previously full slab - instead of
putting the slab on a shared partial list, we can more cheaply freeze it and
put it on the per-cpu list.

To control how large a percpu partial list can grow for a kmem cache,
set_cpu_partial() calculates a target number of free objects on each cpu's
percpu partial list, and this can also be set by the sysfs file cpu_partial.
However, the tracking of the actual number of objects is imprecise, in order
to limit the overhead of cpu X freeing an object to a slab on the percpu
partial list of cpu Y. Basically, the percpu partial slabs form a single
linked list, and when we add a new slab to the list with current head
"oldpage", we set in the struct page of the slab we're adding:

	page->pages = oldpage->pages + 1; // this is precise
	page->pobjects = oldpage->pobjects + (page->objects - page->inuse);
	page->next = oldpage;

Thus the real number of free objects in the slab (objects - inuse) is only
determined at the moment of adding the slab to the percpu partial list, and
further freeing doesn't update the pobjects counter nor propagate it to the
current list head. As Jann reports [1], this can easily lead to large
inaccuracies, where the target number of objects (up to 30 by default) can
translate to the same number of (empty) slab pages on the list.
In case 2) above, we put a slab with 1 free object on the list, thus only
increasing page->pobjects by 1, even if there are subsequent frees on the
same slab. Jann has noticed this in practice and so did we [2] when
investigating a significant increase of kmemcg usage after switching from
SLAB to SLUB.

While this is no longer a problem in the kmemcg context thanks to the
accounting rewrite in 5.9, the memory waste is still not ideal and it's
questionable whether it makes sense to perform free object count based
control when object counts can easily become so inaccurate. So this patch
converts the accounting to be based on number of pages only (which is
precise) and removes the page->pobjects field completely. This is also
ultimately simpler.

To retain the existing set_cpu_partial() heuristic, first calculate the
target number of objects as previously, but then convert it to a target
number of pages by assuming the pages will be half-filled on average. This
assumption might obviously also be inaccurate in practice, but it cannot
degrade to the actual number of pages being equal to the target number of
objects. We could also skip the intermediate step with a target number of
objects and rewrite the heuristic in terms of pages. However, we still have
the sysfs file cpu_partial which uses number of objects and could break
existing users if it suddenly becomes number of pages, so this patch doesn't
do that.

In practice, after this patch the heuristics limit the size of the percpu
partial list to at most 2 pages. In case of a reported regression (which
would mean some workload has benefited from the previous imprecise object
based counting), we can tune the heuristics to get a better compromise
within the new scheme, while still avoiding unexpectedly long percpu partial
lists.

[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
[2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/

========== Evaluation ==========

Mel was kind enough to run v1 through the mmtests machinery for netperf
(localhost) and hackbench; the most significant results are below. There are
some apparent regressions, especially with hackbench, which I think
ultimately boils down to having shorter percpu partial lists on average,
with some benchmarks benefiting from longer ones. Monitoring slab usage also
indicated less memory used by slab. Based on that, the following patch will
bump the defaults to allow longer percpu partial lists than after this
patch.

However, the goal is certainly not to allow percpu partial lists of up to 30
pages just because a specific alloc/free pattern could previously make the
limit of 30 objects translate into a limit of 30 pages - that would make
little sense. This is a correctness patch, and if a workload benefits from
larger lists, the sysfs tuning knobs are still there to allow that.
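As a side note, the object-to-page conversion described above (implemented
by slub_set_cpu_partial() in the patch below) is easy to check in isolation.
The following is a minimal userspace sketch, not kernel code - DIV_ROUND_UP
is redefined locally and the objects-per-page values are made-up examples -
showing why the default targets end up as at most 2 pages:

	#include <stdio.h>

	/* same rounding as the kernel macro of this name */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	/*
	 * Convert a target number of free objects to a number of percpu
	 * partial pages, assuming the pages will on average be half-full.
	 */
	static unsigned int nr_partial_pages(unsigned int nr_objects,
					     unsigned int objects_per_page)
	{
		return DIV_ROUND_UP(nr_objects * 2, objects_per_page);
	}

	int main(void)
	{
		/* small objects: target 30 objects, e.g. 32 objects per page */
		printf("%u\n", nr_partial_pages(30, 32));	/* prints 2 */
		/* large objects: target 2 objects, e.g. 8 objects per page */
		printf("%u\n", nr_partial_pages(2, 8));		/* prints 1 */
		return 0;
	}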
Netperf

2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM

TCP-RR:
  hmean  before 127045.79  after 121092.94  ( -4.69%, worse)
  stddev before   2634.37  after   1254.08
UDP-RR:
  hmean  before 166985.45  after 160668.94  ( -3.78%, worse)
  stddev before   4059.69  after   1943.63

2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM

TCP-RR:
  hmean  before  84173.25  after  76914.72  ( -8.62%, worse)
UDP-RR:
  hmean  before  93571.12  after  96428.69  (  3.05%, better)
  stddev before  23118.54  after  16828.14

2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM

TCP-RR:
  hmean  before  49984.92  after  48922.27  ( -2.13%, worse)
  stddev before   6248.15  after   4740.51
UDP-RR:
  hmean  before  61854.31  after  68761.81  ( 11.17%, better)
  stddev before   4093.54  after   5898.91

other machines - within 2%

Hackbench
(results before and after the patch, negative % means worse)

2-socket AMD EPYC 7713 (64 cores, 128 threads per core), 256GB RAM

hackbench-process-sockets
Amean    1    0.5380    0.5583  (  -3.78%)
Amean    4    0.7510    0.8150  (  -8.52%)
Amean    7    0.7930    0.9533  ( -20.22%)
Amean   12    0.7853    1.1313  ( -44.06%)
Amean   21    1.1520    1.4993  ( -30.15%)
Amean   30    1.6223    1.9237  ( -18.57%)
Amean   48    2.6767    2.9903  ( -11.72%)
Amean   79    4.0257    5.1150  ( -27.06%)
Amean  110    5.5193    7.4720  ( -35.38%)
Amean  141    7.2207    9.9840  ( -38.27%)
Amean  172    8.4770   12.1963  ( -43.88%)
Amean  203    9.6473   14.3137  ( -48.37%)
Amean  234   11.3960   18.7917  ( -64.90%)
Amean  265   13.9627   22.4607  ( -60.86%)
Amean  296   14.9163   26.0483  ( -74.63%)

hackbench-thread-sockets
Amean    1    0.5597    0.5877  (  -5.00%)
Amean    4    0.7913    0.8960  ( -13.23%)
Amean    7    0.8190    1.0017  ( -22.30%)
Amean   12    0.9560    1.1727  ( -22.66%)
Amean   21    1.7587    1.5660  (  10.96%)
Amean   30    2.4477    1.9807  (  19.08%)
Amean   48    3.4573    3.0630  (  11.41%)
Amean   79    4.7903    5.1733  (  -8.00%)
Amean  110    6.1370    7.4220  ( -20.94%)
Amean  141    7.5777    9.2617  ( -22.22%)
Amean  172    9.2280   11.0907  ( -20.18%)
Amean  203   10.2793   13.3470  ( -29.84%)
Amean  234   11.2410   17.1070  ( -52.18%)
Amean  265   12.5970   23.3323  ( -85.22%)
Amean  296   17.1540   24.2857  ( -41.57%)

2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM

hackbench-process-sockets
Amean    1    0.5760    0.4793  (  16.78%)
Amean    4    0.9430    0.9707  (  -2.93%)
Amean    7    1.5517    1.8843  ( -21.44%)
Amean   12    2.4903    2.7267  (  -9.49%)
Amean   21    3.9560    4.2877  (  -8.38%)
Amean   30    5.4613    5.8343  (  -6.83%)
Amean   48    8.5337    9.2937  (  -8.91%)
Amean   79   14.0670   15.2630  (  -8.50%)
Amean  110   19.2253   21.2467  ( -10.51%)
Amean  141   23.7557   25.8550  (  -8.84%)
Amean  172   28.4407   29.7603  (  -4.64%)
Amean  203   33.3407   33.9927  (  -1.96%)
Amean  234   38.3633   39.1150  (  -1.96%)
Amean  265   43.4420   43.8470  (  -0.93%)
Amean  296   48.3680   48.9300  (  -1.16%)

hackbench-thread-sockets
Amean    1    0.6080    0.6493  (  -6.80%)
Amean    4    1.0000    1.0513  (  -5.13%)
Amean    7    1.6607    2.0260  ( -22.00%)
Amean   12    2.7637    2.9273  (  -5.92%)
Amean   21    5.0613    4.5153  (  10.79%)
Amean   30    6.3340    6.1140  (   3.47%)
Amean   48    9.0567    9.5577  (  -5.53%)
Amean   79   14.5657   15.7983  (  -8.46%)
Amean  110   19.6213   21.6333  ( -10.25%)
Amean  141   24.1563   26.2697  (  -8.75%)
Amean  172   28.9687   30.2187  (  -4.32%)
Amean  203   33.9763   34.6970  (  -2.12%)
Amean  234   38.8647   39.3207  (  -1.17%)
Amean  265   44.0813   44.1507  (  -0.16%)
Amean  296   49.2040   49.4330  (  -0.47%)

2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM

hackbench-process-sockets
Amean    1    0.5027    0.5017  (   0.20%)
Amean    4    1.1053    1.2033  (  -8.87%)
Amean    7    1.8760    2.1820  ( -16.31%)
Amean   12    2.9053    3.1810  (  -9.49%)
Amean   21    4.6777    4.9920  (  -6.72%)
Amean   30    6.5180    6.7827  (  -4.06%)
Amean   48   10.0710   10.5227  (  -4.48%)
Amean   79   16.4250   17.5053  (  -6.58%)
Amean  110   22.6203   24.4617  (  -8.14%)
Amean  141   28.0967   31.0363  ( -10.46%)
Amean  172   34.4030   36.9233  (  -7.33%)
Amean  203   40.5933   43.0850  (  -6.14%)
Amean  234   46.6477   48.7220  (  -4.45%)
Amean  265   53.0530   53.9597  (  -1.71%)
Amean  296   59.2760   59.9213  (  -1.09%)

hackbench-thread-sockets
Amean    1    0.5363    0.5330  (   0.62%)
Amean    4    1.1647    1.2157  (  -4.38%)
Amean    7    1.9237    2.2833  ( -18.70%)
Amean   12    2.9943    3.3110  ( -10.58%)
Amean   21    4.9987    5.1880  (  -3.79%)
Amean   30    6.7583    7.0043  (  -3.64%)
Amean   48   10.4547   10.8353  (  -3.64%)
Amean   79   16.6707   17.6790  (  -6.05%)
Amean  110   22.8207   24.4403  (  -7.10%)
Amean  141   28.7090   31.0533  (  -8.17%)
Amean  172   34.9387   36.8260  (  -5.40%)
Amean  203   41.1567   43.0450  (  -4.59%)
Amean  234   47.3790   48.5307  (  -2.43%)
Amean  265   53.9543   54.6987  (  -1.38%)
Amean  296   60.0820   60.2163  (  -0.22%)

1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads), 32 GB RAM

hackbench-process-sockets
Amean    1    1.4760    1.5773  (  -6.87%)
Amean    3    3.9370    4.0910  (  -3.91%)
Amean    5    6.6797    6.9357  (  -3.83%)
Amean    7    9.3367    9.7150  (  -4.05%)
Amean   12   15.7627   16.1400  (  -2.39%)
Amean   18   23.5360   23.6890  (  -0.65%)
Amean   24   31.0663   31.3137  (  -0.80%)
Amean   30   38.7283   39.0037  (  -0.71%)
Amean   32   41.3417   41.6097  (  -0.65%)

hackbench-thread-sockets
Amean    1    1.5250    1.6043  (  -5.20%)
Amean    3    4.0897    4.2603  (  -4.17%)
Amean    5    6.7760    7.0933  (  -4.68%)
Amean    7    9.4817    9.9157  (  -4.58%)
Amean   12   15.9610   16.3937  (  -2.71%)
Amean   18   23.9543   24.3417  (  -1.62%)
Amean   24   31.4400   31.7217  (  -0.90%)
Amean   30   39.2457   39.5467  (  -0.77%)
Amean   32   41.8267   42.1230  (  -0.71%)

2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM

hackbench-process-sockets
Amean    1    1.0347    1.0880  (  -5.15%)
Amean    4    1.7267    1.8527  (  -7.30%)
Amean    7    2.6707    2.8110  (  -5.25%)
Amean   12    4.1617    4.3383  (  -4.25%)
Amean   21    7.0070    7.2600  (  -3.61%)
Amean   30    9.9187   10.2397  (  -3.24%)
Amean   48   15.6710   16.3923  (  -4.60%)
Amean   79   24.7743   26.1247  (  -5.45%)
Amean  110   34.3000   35.9307  (  -4.75%)
Amean  141   44.2043   44.8010  (  -1.35%)
Amean  172   54.2430   54.7260  (  -0.89%)
Amean  192   60.6557   60.9777  (  -0.53%)

hackbench-thread-sockets
Amean    1    1.0610    1.1353  (  -7.01%)
Amean    4    1.7543    1.9140  (  -9.10%)
Amean    7    2.7840    2.9573  (  -6.23%)
Amean   12    4.3813    4.4937  (  -2.56%)
Amean   21    7.3460    7.5350  (  -2.57%)
Amean   30   10.2313   10.5190  (  -2.81%)
Amean   48   15.9700   16.5940  (  -3.91%)
Amean   79   25.3973   26.6637  (  -4.99%)
Amean  110   35.1087   36.4797  (  -3.91%)
Amean  141   45.8220   46.3053  (  -1.05%)
Amean  172   55.4917   55.7320  (  -0.43%)
Amean  192   62.7490   62.5410  (   0.33%)

Reported-by: Jann Horn
Signed-off-by: Vlastimil Babka
---
Changes in v2:
- added evaluation results to changelog
- added patch 2 bumping the defaults

 include/linux/mm_types.h |  2 -
 include/linux/slub_def.h | 13 +-----
 mm/slub.c                | 89 ++++++++++++++++++++++++++--------------
 3 files changed, 61 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7f8ee09c711f..68ffa064b7a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -124,10 +124,8 @@ struct page {
 			struct page *next;
 #ifdef CONFIG_64BIT
 			int pages;	/* Nr of pages left */
-			int pobjects;	/* Approximate count */
 #else
 			short int pages;
-			short int pobjects;
 #endif
 		};
 	};
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 85499f0586b0..0fa751b946fa 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,8 @@ struct kmem_cache {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	/* Number of per cpu partial objects to keep around */
 	unsigned int cpu_partial;
+	/* Number of per cpu partial pages to keep around */
+	unsigned int cpu_partial_pages;
 #endif
 	struct kmem_cache_order_objects oo;
 
@@ -141,17 +143,6 @@ struct kmem_cache {
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-#define slub_cpu_partial(s)		((s)->cpu_partial)
-#define slub_set_cpu_partial(s, n)	\
-({					\
-	slub_cpu_partial(s) = (n);	\
-})
-#else
-#define slub_cpu_partial(s)		(0)
-#define slub_set_cpu_partial(s, n)
-#endif /* CONFIG_SLUB_CPU_PARTIAL */
-
 #ifdef CONFIG_SYSFS
 #define SLAB_SUPPORTS_SYSFS
 void sysfs_slab_unlink(struct kmem_cache *);
diff --git a/mm/slub.c b/mm/slub.c
index 3d2025f7163b..3757f31c5d97 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -414,6 +414,29 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 	return x.x & OO_MASK;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
+{
+	unsigned int nr_pages;
+
+	s->cpu_partial = nr_objects;
+
+	/*
+	 * We take the number of objects but actually limit the number of
+	 * pages on the per cpu partial list, in order to limit excessive
+	 * growth of the list. For simplicity we assume that the pages will
+	 * be half-full.
+	 */
+	nr_pages = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
+	s->cpu_partial_pages = nr_pages;
+}
+#else
+static inline void
+slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
+{
+}
+#endif /* CONFIG_SLUB_CPU_PARTIAL */
+
 /*
  * Per slab locking using the pagelock
  */
@@ -2045,7 +2068,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
  */
 static inline void *acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page,
-		int mode, int *objects)
+		int mode)
 {
 	void *freelist;
 	unsigned long counters;
@@ -2061,7 +2084,6 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
-	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
 		new.freelist = NULL;
@@ -2099,9 +2121,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 {
 	struct page *page, *page2;
 	void *object = NULL;
-	unsigned int available = 0;
 	unsigned long flags;
-	int objects;
+	unsigned int partial_pages = 0;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -2119,11 +2140,10 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		if (!pfmemalloc_match(page, gfpflags))
 			continue;
 
-		t = acquire_slab(s, n, page, object == NULL, &objects);
+		t = acquire_slab(s, n, page, object == NULL);
 		if (!t)
 			break;
 
-		available += objects;
 		if (!object) {
 			*ret_page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
@@ -2131,10 +2151,15 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		} else {
 			put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
+			partial_pages++;
 		}
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		if (!kmem_cache_has_cpu_partial(s)
-			|| available > slub_cpu_partial(s) / 2)
+			|| partial_pages > s->cpu_partial_pages / 2)
 			break;
+#else
+		break;
+#endif
 
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
@@ -2539,14 +2564,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 	struct page *page_to_unfreeze = NULL;
 	unsigned long flags;
 	int pages = 0;
-	int pobjects = 0;
 
 	local_lock_irqsave(&s->cpu_slab->lock, flags);
 
 	oldpage = this_cpu_read(s->cpu_slab->partial);
 
 	if (oldpage) {
-		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+		if (drain && oldpage->pages >= s->cpu_partial_pages) {
 			/*
 			 * Partial array is full. Move the existing set to the
 			 * per node partial list. Postpone the actual unfreezing
@@ -2555,16 +2579,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 			page_to_unfreeze = oldpage;
 			oldpage = NULL;
 		} else {
-			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
 		}
 	}
 
 	pages++;
-	pobjects += page->objects - page->inuse;
 
 	page->pages = pages;
-	page->pobjects = pobjects;
 	page->next = oldpage;
 
 	this_cpu_write(s->cpu_slab->partial, page);
@@ -3980,6 +4001,8 @@ static void set_min_partial(struct kmem_cache *s, unsigned long min)
 static void set_cpu_partial(struct kmem_cache *s)
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
+	unsigned int nr_objects;
+
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3989,24 +4012,22 @@ static void set_cpu_partial(struct kmem_cache *s)
 	 * filled up again with minimal effort. The slab will never hit the
 	 * per node partial lists and therefore no locking will be required.
 	 *
-	 * This setting also determines
-	 *
-	 * A) The number of objects from per cpu partial slabs dumped to the
-	 *    per node list when we reach the limit.
-	 * B) The number of objects in cpu partial slabs to extract from the
-	 *    per node list when we run out of per cpu objects. We only fetch
-	 *    50% to keep some capacity around for frees.
+	 * For backwards compatibility reasons, this is determined as number
+	 * of objects, even though we now limit maximum number of pages, see
+	 * slub_set_cpu_partial()
 	 */
 	if (!kmem_cache_has_cpu_partial(s))
-		slub_set_cpu_partial(s, 0);
+		nr_objects = 0;
 	else if (s->size >= PAGE_SIZE)
-		slub_set_cpu_partial(s, 2);
+		nr_objects = 2;
 	else if (s->size >= 1024)
-		slub_set_cpu_partial(s, 6);
+		nr_objects = 6;
 	else if (s->size >= 256)
-		slub_set_cpu_partial(s, 13);
+		nr_objects = 13;
 	else
-		slub_set_cpu_partial(s, 30);
+		nr_objects = 30;
+
+	slub_set_cpu_partial(s, nr_objects);
 #endif
 }
 
@@ -5379,7 +5400,12 @@ SLAB_ATTR(min_partial);
 
 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
-	return sysfs_emit(buf, "%u\n", slub_cpu_partial(s));
+	unsigned int nr_partial = 0;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	nr_partial = s->cpu_partial;
+#endif
+
+	return sysfs_emit(buf, "%u\n", nr_partial);
 }
 
 static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
@@ -5450,12 +5476,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 
 		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
 
-		if (page) {
+		if (page)
 			pages += page->pages;
-			objects += page->pobjects;
-		}
 	}
 
+	/* Approximate half-full pages , see slub_set_cpu_partial() */
+	objects = (pages * oo_objects(s->oo)) / 2;
 	len += sysfs_emit_at(buf, len, "%d(%d)", objects, pages);
 
 #ifdef CONFIG_SMP
@@ -5463,9 +5489,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 		struct page *page;
 
 		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
-		if (page)
+		if (page) {
+			pages = READ_ONCE(page->pages);
+			objects = (pages * oo_objects(s->oo)) / 2;
 			len += sysfs_emit_at(buf, len, " C%d=%d(%d)",
-					cpu, page->pobjects, page->pages);
+					cpu, objects, pages);
+		}
 	}
 #endif
 	len += sysfs_emit_at(buf, len, "\n");
-- 
2.33.0
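
As a footnote to the changelog: the imprecision of the old pobjects
accounting is easy to reproduce outside the kernel. The following is a
small userspace simulation (illustrative only - fake_slab and its fields
are simplified stand-ins, not the kernel's struct page) of the case 2)
scenario, where every slab is added to the percpu partial list with a
single free object and is then emptied by later frees:

	#include <stdio.h>

	struct fake_slab {
		int objects;		/* total objects in the slab */
		int inuse;		/* allocated objects */
		int pages;		/* precise page count of the list */
		int pobjects;		/* old, imprecise free-object count */
		struct fake_slab *next;
	};

	/* mimics the old accounting quoted in the changelog */
	static void old_put_cpu_partial(struct fake_slab **head,
					struct fake_slab *slab)
	{
		struct fake_slab *oldpage = *head;

		slab->pages = (oldpage ? oldpage->pages : 0) + 1;
		slab->pobjects = (oldpage ? oldpage->pobjects : 0) +
				 (slab->objects - slab->inuse);
		slab->next = oldpage;
		*head = slab;
	}

	int main(void)
	{
		struct fake_slab slabs[30] = { 0 };
		struct fake_slab *head = NULL;
		int i;

		for (i = 0; i < 30; i++) {
			slabs[i].objects = 32;
			slabs[i].inuse = 31;	/* one free object at list-add time */
			old_put_cpu_partial(&head, &slabs[i]);
			slabs[i].inuse = 0;	/* later frees are never accounted */
		}

		/* prints "pages=30 pobjects=30": 30 empty pages counted as 30 objects */
		printf("pages=%d pobjects=%d\n", head->pages, head->pobjects);
		return 0;
	}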