From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 26 Jul 2023 12:34:52 +0200
Message-ID: <7a94996f-b6f0-c427-eb1e-126bcb97930c@suse.cz>
Subject: Re: [RFC 1/2] Revert "mm, slub: change percpu partial accounting from objects to pages"
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Christoph Lameter, Pekka Enberg,
 Joonsoo Kim, David Rientjes, Andrew Morton
Cc: Roman Gushchin, Feng Tang, "Sang, Oliver", Jay Patel, Binder Makin,
 aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com,
 fengwei.yin@intel.com, ying.huang@intel.com, lkp, oe-lkp@lists.linux.dev,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230723190906.4082646-2-42.hyeyoo@gmail.com>
References: <20230723190906.4082646-1-42.hyeyoo@gmail.com>
 <20230723190906.4082646-2-42.hyeyoo@gmail.com>

Nit: I would change the subject from "Revert: " as it's not exactly a
revert. If we can come up with a good subject that's not very long :)

On 7/23/23 21:09, Hyeonggon Yoo wrote:
> This is a partial revert of commit b47291ef02b0 ("mm, slub: change percpu
> partial accounting from objects to pages") and a full revert of commit
> 662188c3a20e ("mm/slub: Simplify struct slab slabs field definition").
>
> While b47291ef02b0 prevents the percpu partial slab list from becoming
> too long, it assumes that the order of slabs is always oo_order(s->oo).

I think I've considered this possibility, but decided it's not important,
because if the system becomes memory pressured in a way that it can't
allocate at oo_order() and has to fall back, we no longer care about
accurate percpu caching, as we're unlikely to have optimum performance
anyway.

> The current approach can surprisingly lower the number of objects cached
> per cpu when it fails to allocate high-order slabs. Instead of accounting
> the number of slabs, change it back to accounting objects, but keep
> the assumption that the slab is always half-full.

That's a nice solution as it avoids converting the sysfs variable, so I
wouldn't mind going that way even if I doubt the performance benefits in a
memory pressured system. But maybe there's a concern that if the system is
really memory pressured and has to fall back to smaller orders, it would
keep more percpu partial slabs after this patch than before, which would
increase the pressure further and thus be counter-productive?

> With this change, the number of cached objects per cpu is no longer
> unexpectedly decreased when it fails to allocate high-order slabs. It
> still prevents large inaccuracy because it does not account based on the
> actual number of free objects when taking slabs.
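To put rough numbers on the undershoot described above, here's a trivial
userspace sketch (all values are made up for illustration, not measured
from any real cache; DIV_ROUND_UP is open-coded to keep it standalone):

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int cpu_partial = 120;	/* hypothetical object limit */
	unsigned int oo_objs = 32;	/* objects in an order-3 slab */
	unsigned int fallback = 4;	/* objects in an order-0 slab */

	/* current slab-based limit, assuming half-full oo_order() slabs */
	unsigned int slab_limit = DIV_ROUND_UP(cpu_partial * 2, oo_objs);

	/*
	 * Objects actually cached per cpu when only order-0 slabs are
	 * available, still assuming each one is half-full.
	 */
	printf("slab limit: %u, cached objects on fallback: ~%u\n",
	       slab_limit, slab_limit * fallback / 2);
	return 0;
}

With these numbers the limit works out to 8 slabs, so on fallback the cpu
caches only ~16 objects instead of the intended 120, which is the
"surprisingly lower" behaviour the description complains about.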
> ---
>  include/linux/slub_def.h |  2 --
>  mm/slab.h                |  6 ++++++
>  mm/slub.c                | 31 ++++++++++++-------------------
>  3 files changed, 18 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..589ff6a2a23f 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -109,8 +109,6 @@ struct kmem_cache {
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/* Number of per cpu partial objects to keep around */
>  	unsigned int cpu_partial;
> -	/* Number of per cpu partial slabs to keep around */
> -	unsigned int cpu_partial_slabs;
>  #endif
>  	struct kmem_cache_order_objects oo;
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 799a315695c6..be38a264df16 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -65,7 +65,13 @@ struct slab {
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  		struct {
>  			struct slab *next;
> +#ifdef CONFIG_64BIT
>  			int slabs;	/* Nr of slabs left */
> +			int pobjects;	/* Approximate count */
> +#else
> +			short int slabs;
> +			short int pobjects;
> +#endif
>  		};
>  #endif
>  	};
> diff --git a/mm/slub.c b/mm/slub.c
> index f7940048138c..199d3d03d5b9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -486,18 +486,7 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
>  {
> -	unsigned int nr_slabs;
> -
>  	s->cpu_partial = nr_objects;
> -
> -	/*
> -	 * We take the number of objects but actually limit the number of
> -	 * slabs on the per cpu partial list, in order to limit excessive
> -	 * growth of the list. For simplicity we assume that the slabs will
> -	 * be half-full.
> -	 */
> -	nr_slabs = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
> -	s->cpu_partial_slabs = nr_slabs;
>  }
>  #else
>  static inline void
> @@ -2275,7 +2264,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  	struct slab *slab, *slab2;
>  	void *object = NULL;
>  	unsigned long flags;
> -	unsigned int partial_slabs = 0;
> +	int objects_taken = 0;
>
>  	/*
>  	 * Racy check. If we mistakenly see no partial slabs then we
> @@ -2312,11 +2301,11 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  		} else {
>  			put_cpu_partial(s, slab, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> -			partial_slabs++;
> +			objects_taken += slab->objects / 2;
>  		}
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  		if (!kmem_cache_has_cpu_partial(s)
> -			|| partial_slabs > s->cpu_partial_slabs / 2)
> +			|| objects_taken > s->cpu_partial / 2)
>  			break;
>  #else
>  		break;
> @@ -2699,13 +2688,14 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
>  	struct slab *slab_to_unfreeze = NULL;
>  	unsigned long flags;
>  	int slabs = 0;
> +	int pobjects = 0;
>
>  	local_lock_irqsave(&s->cpu_slab->lock, flags);
>
>  	oldslab = this_cpu_read(s->cpu_slab->partial);
>
>  	if (oldslab) {
> -		if (drain && oldslab->slabs >= s->cpu_partial_slabs) {
> +		if (drain && oldslab->pobjects >= s->cpu_partial) {
>  			/*
>  			 * Partial array is full. Move the existing set to the
>  			 * per node partial list. Postpone the actual unfreezing
> @@ -2714,14 +2704,17 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
>  			slab_to_unfreeze = oldslab;
>  			oldslab = NULL;
>  		} else {
> +			pobjects = oldslab->pobjects;
>  			slabs = oldslab->slabs;
>  		}
>  	}
>
>  	slabs++;
> +	pobjects += slab->objects / 2;
>
>  	slab->slabs = slabs;
>  	slab->next = oldslab;
> +	slab->pobjects = pobjects;
>
>  	this_cpu_write(s->cpu_slab->partial, slab);
>
> @@ -5653,13 +5646,13 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>
>  		slab = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
>
> -		if (slab)
> +		if (slab) {
>  			slabs += slab->slabs;
> +			objects += slab->objects;
> +		}
>  	}
>  #endif
>
> -	/* Approximate half-full slabs, see slub_set_cpu_partial() */
> -	objects = (slabs * oo_objects(s->oo)) / 2;
>  	len += sysfs_emit_at(buf, len, "%d(%d)", objects, slabs);
>
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> @@ -5669,7 +5662,7 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  		slab = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
>  		if (slab) {
>  			slabs = READ_ONCE(slab->slabs);
> -			objects = (slabs * oo_objects(s->oo)) / 2;
> +			objects = READ_ONCE(slab->pobjects);
>  			len += sysfs_emit_at(buf, len, " C%d=%d(%d)",
>  					     cpu, objects, slabs);
>  		}
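
And to put a number on the concern I raised above (same made-up values as
in the earlier sketch, so again just an illustration): with the
object-based limit, put_cpu_partial() only drains once pobjects reaches
cpu_partial, and each slab only contributes slab->objects / 2, so under
order-0 fallback the percpu partial list can grow much longer:

#include <stdio.h>

int main(void)
{
	unsigned int cpu_partial = 120;	/* hypothetical object limit */
	unsigned int fallback = 4;	/* objects in an order-0 slab */

	/* each slab adds slab->objects / 2 to pobjects */
	unsigned int per_slab = fallback / 2;

	/* drain triggers once pobjects >= cpu_partial */
	printf("order-0 slabs kept per cpu: ~%u\n", cpu_partial / per_slab);
	return 0;
}

That's ~60 order-0 slabs (pages) per cpu instead of the ~8 the current
slab-based limit would allow, which is the extra memory I worry could be
held back exactly when the system is already struggling to allocate.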