From: Vlastimil Babka <vbabka@suse.cz>
To: linux-mm@kvack.org, Christoph Lameter, David Rientjes, Joonsoo Kim, Pekka Enberg, Jann Horn
Cc: linux-kernel@vger.kernel.org, Roman Gushchin, Vlastimil Babka
Subject: [PATCH v2 1/2] mm, slub: change percpu partial accounting from objects to pages
Date: Tue, 12 Oct 2021 15:46:50 +0200
Message-Id: <20211012134651.11258-1-vbabka@suse.cz>
X-Mailer: git-send-email 2.33.0
MIME-Version: 1.0

With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of partial
slabs that can be promoted to cpu slab when the previous one is depleted,
without accessing the shared partial list. A slab can be added to this list
by 1) refill of an empty list from get_partial_node() - once we really have
to access the shared partial list, we acquire multiple slabs to amortize the
cost of locking, and 2) first free to a previously full slab - instead of
putting the slab on a shared partial list, we can more cheaply freeze it and
put it on the per-cpu list.

To control how large a percpu partial list can grow for a kmem cache,
set_cpu_partial() calculates a target number of free objects on each cpu's
percpu partial list, and this can also be set by the sysfs file cpu_partial.
However, the tracking of the actual number of objects is imprecise, in order
to limit the overhead of cpu X freeing an object to a slab on the percpu
partial list of cpu Y. Basically, the percpu partial slabs form a single
linked list, and when we add a new slab to the list with current head
"oldpage", we set in the struct page of the slab we're adding:

	page->pages = oldpage->pages + 1; // this is precise
	page->pobjects = oldpage->pobjects + (page->objects - page->inuse);
	page->next = oldpage;

Thus the real number of free objects in the slab (objects - inuse) is only
determined at the moment of adding the slab to the percpu partial list, and
further freeing doesn't update the pobjects counter nor propagate it to the
current list head. As Jann reports [1], this can easily lead to large
inaccuracies, where the target number of objects (up to 30 by default) can
translate to the same number of (empty) slab pages on the list.
In case 2) above, we put a slab with 1 free object on the list, thus only
increasing page->pobjects by 1, even if there are subsequent frees on the
same slab. Jann has noticed this in practice and so did we [2] when
investigating a significant increase of kmemcg usage after switching from
SLAB to SLUB.

While this is no longer a problem in the kmemcg context thanks to the
accounting rewrite in 5.9, the memory waste is still not ideal and it's
questionable whether it makes sense to perform free object count based
control when object counts can easily become so inaccurate. So this patch
converts the accounting to be based on number of pages only (which is
precise) and removes the page->pobjects field completely. This is also
ultimately simpler.

To retain the existing set_cpu_partial() heuristic, first calculate the
target number of objects as previously, but then convert it to a target
number of pages by assuming the pages will be half-filled on average. This
assumption might obviously also be inaccurate in practice, but it cannot
degrade to the actual number of pages being equal to the target number of
objects. We could also skip the intermediate step with a target number of
objects and rewrite the heuristic in terms of pages. However, we still have
the sysfs file cpu_partial which uses number of objects and could break
existing users if it suddenly becomes number of pages, so this patch doesn't
do that.

In practice, after this patch the heuristics limit the size of the percpu
partial list to at most 2 pages. In case of a reported regression (which
would mean some workload has benefited from the previous imprecise object
based counting), we can tune the heuristics to get a better compromise
within the new scheme, while still avoiding unexpectedly long percpu partial
lists.

[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
[2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/

========== Evaluation ==========

Mel was kind enough to run v1 through the mmtests machinery for netperf
(localhost) and hackbench; the most significant results are below. There are
some apparent regressions, especially with hackbench, which I think
ultimately boils down to having shorter percpu partial lists on average,
with some benchmarks benefiting from longer ones. Monitoring slab usage also
indicated less memory used by slab. Based on that, the following patch will
bump the defaults to allow longer percpu partial lists than after this
patch.

However, the goal is certainly not to allow percpu partial lists of up to 30
pages just because a specific alloc/free pattern could previously make the
limit of 30 objects translate into a limit of 30 pages - that would make
little sense. This is a correctness patch, and if a workload benefits from
larger lists, the sysfs tuning knobs are still there to allow that.
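As a side note, the object-to-page conversion described above (implemented
by slub_set_cpu_partial() in the patch below) is easy to check in isolation.
The following is a minimal userspace sketch, not kernel code - DIV_ROUND_UP
is redefined locally and the objects-per-page values are made-up examples -
showing why the default targets end up as at most 2 pages:

	#include <stdio.h>

	/* same rounding as the kernel macro of this name */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	/*
	 * Convert a target number of free objects to a number of percpu
	 * partial pages, assuming the pages will on average be half-full.
	 */
	static unsigned int nr_partial_pages(unsigned int nr_objects,
					     unsigned int objects_per_page)
	{
		return DIV_ROUND_UP(nr_objects * 2, objects_per_page);
	}

	int main(void)
	{
		/* small objects: target 30 objects, e.g. 32 objects per page */
		printf("%u\n", nr_partial_pages(30, 32));	/* prints 2 */
		/* large objects: target 2 objects, e.g. 8 objects per page */
		printf("%u\n", nr_partial_pages(2, 8));		/* prints 1 */
		return 0;
	}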
Netperf

2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM

TCP-RR:
  hmean  before 127045.79  after 121092.94  ( -4.69%, worse)
  stddev before   2634.37  after   1254.08
UDP-RR:
  hmean  before 166985.45  after 160668.94  ( -3.78%, worse)
  stddev before   4059.69  after   1943.63

2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM

TCP-RR:
  hmean  before  84173.25  after  76914.72  ( -8.62%, worse)
UDP-RR:
  hmean  before  93571.12  after  96428.69  (  3.05%, better)
  stddev before  23118.54  after  16828.14

2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM

TCP-RR:
  hmean  before  49984.92  after  48922.27  ( -2.13%, worse)
  stddev before   6248.15  after   4740.51
UDP-RR:
  hmean  before  61854.31  after  68761.81  ( 11.17%, better)
  stddev before   4093.54  after   5898.91

other machines - within 2%

Hackbench
(results before and after the patch, negative % means worse)

2-socket AMD EPYC 7713 (64 cores, 128 threads per core), 256GB RAM

hackbench-process-sockets
Amean    1    0.5380    0.5583  (  -3.78%)
Amean    4    0.7510    0.8150  (  -8.52%)
Amean    7    0.7930    0.9533  ( -20.22%)
Amean   12    0.7853    1.1313  ( -44.06%)
Amean   21    1.1520    1.4993  ( -30.15%)
Amean   30    1.6223    1.9237  ( -18.57%)
Amean   48    2.6767    2.9903  ( -11.72%)
Amean   79    4.0257    5.1150  ( -27.06%)
Amean  110    5.5193    7.4720  ( -35.38%)
Amean  141    7.2207    9.9840  ( -38.27%)
Amean  172    8.4770   12.1963  ( -43.88%)
Amean  203    9.6473   14.3137  ( -48.37%)
Amean  234   11.3960   18.7917  ( -64.90%)
Amean  265   13.9627   22.4607  ( -60.86%)
Amean  296   14.9163   26.0483  ( -74.63%)

hackbench-thread-sockets
Amean    1    0.5597    0.5877  (  -5.00%)
Amean    4    0.7913    0.8960  ( -13.23%)
Amean    7    0.8190    1.0017  ( -22.30%)
Amean   12    0.9560    1.1727  ( -22.66%)
Amean   21    1.7587    1.5660  (  10.96%)
Amean   30    2.4477    1.9807  (  19.08%)
Amean   48    3.4573    3.0630  (  11.41%)
Amean   79    4.7903    5.1733  (  -8.00%)
Amean  110    6.1370    7.4220  ( -20.94%)
Amean  141    7.5777    9.2617  ( -22.22%)
Amean  172    9.2280   11.0907  ( -20.18%)
Amean  203   10.2793   13.3470  ( -29.84%)
Amean  234   11.2410   17.1070  ( -52.18%)
Amean  265   12.5970   23.3323  ( -85.22%)
Amean  296   17.1540   24.2857  ( -41.57%)

2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM

hackbench-process-sockets
Amean    1    0.5760    0.4793  (  16.78%)
Amean    4    0.9430    0.9707  (  -2.93%)
Amean    7    1.5517    1.8843  ( -21.44%)
Amean   12    2.4903    2.7267  (  -9.49%)
Amean   21    3.9560    4.2877  (  -8.38%)
Amean   30    5.4613    5.8343  (  -6.83%)
Amean   48    8.5337    9.2937  (  -8.91%)
Amean   79   14.0670   15.2630  (  -8.50%)
Amean  110   19.2253   21.2467  ( -10.51%)
Amean  141   23.7557   25.8550  (  -8.84%)
Amean  172   28.4407   29.7603  (  -4.64%)
Amean  203   33.3407   33.9927  (  -1.96%)
Amean  234   38.3633   39.1150  (  -1.96%)
Amean  265   43.4420   43.8470  (  -0.93%)
Amean  296   48.3680   48.9300  (  -1.16%)

hackbench-thread-sockets
Amean    1    0.6080    0.6493  (  -6.80%)
Amean    4    1.0000    1.0513  (  -5.13%)
Amean    7    1.6607    2.0260  ( -22.00%)
Amean   12    2.7637    2.9273  (  -5.92%)
Amean   21    5.0613    4.5153  (  10.79%)
Amean   30    6.3340    6.1140  (   3.47%)
Amean   48    9.0567    9.5577  (  -5.53%)
Amean   79   14.5657   15.7983  (  -8.46%)
Amean  110   19.6213   21.6333  ( -10.25%)
Amean  141   24.1563   26.2697  (  -8.75%)
Amean  172   28.9687   30.2187  (  -4.32%)
Amean  203   33.9763   34.6970  (  -2.12%)
Amean  234   38.8647   39.3207  (  -1.17%)
Amean  265   44.0813   44.1507  (  -0.16%)
Amean  296   49.2040   49.4330  (  -0.47%)

2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM

hackbench-process-sockets
Amean    1    0.5027    0.5017  (   0.20%)
Amean    4    1.1053    1.2033  (  -8.87%)
Amean    7    1.8760    2.1820  ( -16.31%)
Amean   12    2.9053    3.1810  (  -9.49%)
Amean   21    4.6777    4.9920  (  -6.72%)
Amean   30    6.5180    6.7827  (  -4.06%)
Amean   48   10.0710   10.5227  (  -4.48%)
Amean   79   16.4250   17.5053  (  -6.58%)
Amean  110   22.6203   24.4617  (  -8.14%)
Amean  141   28.0967   31.0363  ( -10.46%)
Amean  172   34.4030   36.9233  (  -7.33%)
Amean  203   40.5933   43.0850  (  -6.14%)
Amean  234   46.6477   48.7220  (  -4.45%)
Amean  265   53.0530   53.9597  (  -1.71%)
Amean  296   59.2760   59.9213  (  -1.09%)

hackbench-thread-sockets
Amean    1    0.5363    0.5330  (   0.62%)
Amean    4    1.1647    1.2157  (  -4.38%)
Amean    7    1.9237    2.2833  ( -18.70%)
Amean   12    2.9943    3.3110  ( -10.58%)
Amean   21    4.9987    5.1880  (  -3.79%)
Amean   30    6.7583    7.0043  (  -3.64%)
Amean   48   10.4547   10.8353  (  -3.64%)
Amean   79   16.6707   17.6790  (  -6.05%)
Amean  110   22.8207   24.4403  (  -7.10%)
Amean  141   28.7090   31.0533  (  -8.17%)
Amean  172   34.9387   36.8260  (  -5.40%)
Amean  203   41.1567   43.0450  (  -4.59%)
Amean  234   47.3790   48.5307  (  -2.43%)
Amean  265   53.9543   54.6987  (  -1.38%)
Amean  296   60.0820   60.2163  (  -0.22%)

1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads), 32 GB RAM

hackbench-process-sockets
Amean    1    1.4760    1.5773  (  -6.87%)
Amean    3    3.9370    4.0910  (  -3.91%)
Amean    5    6.6797    6.9357  (  -3.83%)
Amean    7    9.3367    9.7150  (  -4.05%)
Amean   12   15.7627   16.1400  (  -2.39%)
Amean   18   23.5360   23.6890  (  -0.65%)
Amean   24   31.0663   31.3137  (  -0.80%)
Amean   30   38.7283   39.0037  (  -0.71%)
Amean   32   41.3417   41.6097  (  -0.65%)

hackbench-thread-sockets
Amean    1    1.5250    1.6043  (  -5.20%)
Amean    3    4.0897    4.2603  (  -4.17%)
Amean    5    6.7760    7.0933  (  -4.68%)
Amean    7    9.4817    9.9157  (  -4.58%)
Amean   12   15.9610   16.3937  (  -2.71%)
Amean   18   23.9543   24.3417  (  -1.62%)
Amean   24   31.4400   31.7217  (  -0.90%)
Amean   30   39.2457   39.5467  (  -0.77%)
Amean   32   41.8267   42.1230  (  -0.71%)

2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM

hackbench-process-sockets
Amean    1    1.0347    1.0880  (  -5.15%)
Amean    4    1.7267    1.8527  (  -7.30%)
Amean    7    2.6707    2.8110  (  -5.25%)
Amean   12    4.1617    4.3383  (  -4.25%)
Amean   21    7.0070    7.2600  (  -3.61%)
Amean   30    9.9187   10.2397  (  -3.24%)
Amean   48   15.6710   16.3923  (  -4.60%)
Amean   79   24.7743   26.1247  (  -5.45%)
Amean  110   34.3000   35.9307  (  -4.75%)
Amean  141   44.2043   44.8010  (  -1.35%)
Amean  172   54.2430   54.7260  (  -0.89%)
Amean  192   60.6557   60.9777  (  -0.53%)

hackbench-thread-sockets
Amean    1    1.0610    1.1353  (  -7.01%)
Amean    4    1.7543    1.9140  (  -9.10%)
Amean    7    2.7840    2.9573  (  -6.23%)
Amean   12    4.3813    4.4937  (  -2.56%)
Amean   21    7.3460    7.5350  (  -2.57%)
Amean   30   10.2313   10.5190  (  -2.81%)
Amean   48   15.9700   16.5940  (  -3.91%)
Amean   79   25.3973   26.6637  (  -4.99%)
Amean  110   35.1087   36.4797  (  -3.91%)
Amean  141   45.8220   46.3053  (  -1.05%)
Amean  172   55.4917   55.7320  (  -0.43%)
Amean  192   62.7490   62.5410  (   0.33%)

Reported-by: Jann Horn
Signed-off-by: Vlastimil Babka
---
Changes in v2:
- added evaluation results to changelog
- added patch 2 bumping the defaults

 include/linux/mm_types.h |  2 -
 include/linux/slub_def.h | 13 +-----
 mm/slub.c                | 89 ++++++++++++++++++++++++++--------------
 3 files changed, 61 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7f8ee09c711f..68ffa064b7a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -124,10 +124,8 @@ struct page {
 			struct page *next;
 #ifdef CONFIG_64BIT
 			int pages;	/* Nr of pages left */
-			int pobjects;	/* Approximate count */
 #else
 			short int pages;
-			short int pobjects;
 #endif
 		};
 	};
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 85499f0586b0..0fa751b946fa 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,8 @@ struct kmem_cache {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 	/* Number of per cpu partial objects to keep around */
 	unsigned int cpu_partial;
+	/* Number of per cpu partial pages to keep around */
+	unsigned int cpu_partial_pages;
 #endif
 	struct kmem_cache_order_objects oo;
 
@@ -141,17 +143,6 @@ struct kmem_cache {
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-#define slub_cpu_partial(s)		((s)->cpu_partial)
-#define slub_set_cpu_partial(s, n)	\
-({					\
-	slub_cpu_partial(s) = (n);	\
-})
-#else
-#define slub_cpu_partial(s)		(0)
-#define slub_set_cpu_partial(s, n)
-#endif /* CONFIG_SLUB_CPU_PARTIAL */
-
 #ifdef CONFIG_SYSFS
 #define SLAB_SUPPORTS_SYSFS
 void sysfs_slab_unlink(struct kmem_cache *);
diff --git a/mm/slub.c b/mm/slub.c
index 3d2025f7163b..3757f31c5d97 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -414,6 +414,29 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
 	return x.x & OO_MASK;
 }
 
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
+{
+	unsigned int nr_pages;
+
+	s->cpu_partial = nr_objects;
+
+	/*
+	 * We take the number of objects but actually limit the number of
+	 * pages on the per cpu partial list, in order to limit excessive
+	 * growth of the list. For simplicity we assume that the pages will
+	 * be half-full.
+	 */
+	nr_pages = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
+	s->cpu_partial_pages = nr_pages;
+}
+#else
+static inline void
+slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
+{
+}
+#endif /* CONFIG_SLUB_CPU_PARTIAL */
+
 /*
  * Per slab locking using the pagelock
  */
@@ -2045,7 +2068,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
  */
 static inline void *acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page,
-		int mode, int *objects)
+		int mode)
 {
 	void *freelist;
 	unsigned long counters;
@@ -2061,7 +2084,6 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
-	*objects = new.objects - new.inuse;
 	if (mode) {
 		new.inuse = page->objects;
 		new.freelist = NULL;
@@ -2099,9 +2121,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 {
 	struct page *page, *page2;
 	void *object = NULL;
-	unsigned int available = 0;
 	unsigned long flags;
-	int objects;
+	unsigned int partial_pages = 0;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -2119,11 +2140,10 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		if (!pfmemalloc_match(page, gfpflags))
 			continue;
 
-		t = acquire_slab(s, n, page, object == NULL, &objects);
+		t = acquire_slab(s, n, page, object == NULL);
 		if (!t)
 			break;
 
-		available += objects;
 		if (!object) {
 			*ret_page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
@@ -2131,10 +2151,15 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		} else {
 			put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
+			partial_pages++;
 		}
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		if (!kmem_cache_has_cpu_partial(s)
-			|| available > slub_cpu_partial(s) / 2)
+			|| partial_pages > s->cpu_partial_pages / 2)
 			break;
+#else
+		break;
+#endif
 
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
@@ -2539,14 +2564,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 	struct page *page_to_unfreeze = NULL;
 	unsigned long flags;
 	int pages = 0;
-	int pobjects = 0;
 
 	local_lock_irqsave(&s->cpu_slab->lock, flags);
 
 	oldpage = this_cpu_read(s->cpu_slab->partial);
 
 	if (oldpage) {
-		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
+		if (drain && oldpage->pages >= s->cpu_partial_pages) {
 			/*
 			 * Partial array is full. Move the existing set to the
 			 * per node partial list. Postpone the actual unfreezing
@@ -2555,16 +2579,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 			page_to_unfreeze = oldpage;
 			oldpage = NULL;
 		} else {
-			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
 		}
 	}
 
 	pages++;
-	pobjects += page->objects - page->inuse;
 
 	page->pages = pages;
-	page->pobjects = pobjects;
 	page->next = oldpage;
 
 	this_cpu_write(s->cpu_slab->partial, page);
@@ -3980,6 +4001,8 @@ static void set_min_partial(struct kmem_cache *s, unsigned long min)
 static void set_cpu_partial(struct kmem_cache *s)
 {
 #ifdef CONFIG_SLUB_CPU_PARTIAL
+	unsigned int nr_objects;
+
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3989,24 +4012,22 @@ static void set_cpu_partial(struct kmem_cache *s)
 	 * filled up again with minimal effort. The slab will never hit the
 	 * per node partial lists and therefore no locking will be required.
 	 *
-	 * This setting also determines
-	 *
-	 * A) The number of objects from per cpu partial slabs dumped to the
-	 *    per node list when we reach the limit.
-	 * B) The number of objects in cpu partial slabs to extract from the
-	 *    per node list when we run out of per cpu objects. We only fetch
-	 *    50% to keep some capacity around for frees.
+	 * For backwards compatibility reasons, this is determined as number
+	 * of objects, even though we now limit maximum number of pages, see
+	 * slub_set_cpu_partial()
 	 */
 	if (!kmem_cache_has_cpu_partial(s))
-		slub_set_cpu_partial(s, 0);
+		nr_objects = 0;
 	else if (s->size >= PAGE_SIZE)
-		slub_set_cpu_partial(s, 2);
+		nr_objects = 2;
 	else if (s->size >= 1024)
-		slub_set_cpu_partial(s, 6);
+		nr_objects = 6;
 	else if (s->size >= 256)
-		slub_set_cpu_partial(s, 13);
+		nr_objects = 13;
 	else
-		slub_set_cpu_partial(s, 30);
+		nr_objects = 30;
+
+	slub_set_cpu_partial(s, nr_objects);
 #endif
 }
 
@@ -5379,7 +5400,12 @@ SLAB_ATTR(min_partial);
 
 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
-	return sysfs_emit(buf, "%u\n", slub_cpu_partial(s));
+	unsigned int nr_partial = 0;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	nr_partial = s->cpu_partial;
+#endif
+
+	return sysfs_emit(buf, "%u\n", nr_partial);
 }
 
 static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
@@ -5450,12 +5476,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 
 		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
 
-		if (page) {
+		if (page)
 			pages += page->pages;
-			objects += page->pobjects;
-		}
 	}
 
+	/* Approximate half-full pages , see slub_set_cpu_partial() */
+	objects = (pages * oo_objects(s->oo)) / 2;
 	len += sysfs_emit_at(buf, len, "%d(%d)", objects, pages);
 
 #ifdef CONFIG_SMP
@@ -5463,9 +5489,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 		struct page *page;
 
 		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
-		if (page)
+		if (page) {
+			pages = READ_ONCE(page->pages);
+			objects = (pages * oo_objects(s->oo)) / 2;
 			len += sysfs_emit_at(buf, len, " C%d=%d(%d)",
-					cpu, page->pobjects, page->pages);
+					cpu, objects, pages);
+		}
 	}
 #endif
 	len += sysfs_emit_at(buf, len, "\n");
-- 
2.33.0
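
As a footnote to the changelog: the imprecision of the old pobjects
accounting is easy to reproduce outside the kernel. The following is a
small userspace simulation (illustrative only - fake_slab and its fields
are simplified stand-ins, not the kernel's struct page) of the case 2)
scenario, where every slab is added to the percpu partial list with a
single free object and is then emptied by later frees:

	#include <stdio.h>

	struct fake_slab {
		int objects;		/* total objects in the slab */
		int inuse;		/* allocated objects */
		int pages;		/* precise page count of the list */
		int pobjects;		/* old, imprecise free-object count */
		struct fake_slab *next;
	};

	/* mimics the old accounting quoted in the changelog */
	static void old_put_cpu_partial(struct fake_slab **head,
					struct fake_slab *slab)
	{
		struct fake_slab *oldpage = *head;

		slab->pages = (oldpage ? oldpage->pages : 0) + 1;
		slab->pobjects = (oldpage ? oldpage->pobjects : 0) +
				 (slab->objects - slab->inuse);
		slab->next = oldpage;
		*head = slab;
	}

	int main(void)
	{
		struct fake_slab slabs[30] = { 0 };
		struct fake_slab *head = NULL;
		int i;

		for (i = 0; i < 30; i++) {
			slabs[i].objects = 32;
			slabs[i].inuse = 31;	/* one free object at list-add time */
			old_put_cpu_partial(&head, &slabs[i]);
			slabs[i].inuse = 0;	/* later frees are never accounted */
		}

		/* prints "pages=30 pobjects=30": 30 empty pages counted as 30 objects */
		printf("pages=%d pobjects=%d\n", head->pages, head->pobjects);
		return 0;
	}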