From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 23 Oct 2023 18:00:10 +0200
Subject: Re: [RFC PATCH v2 3/6] slub: Don't freeze slabs for cpu partial
To: chengming.zhou@linux.dev, cl@linux.com, penberg@kernel.org
Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
 roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, willy@infradead.org,
 pcc@google.com, tytso@mit.edu, maz@kernel.org, ruansy.fnst@fujitsu.com,
 vishal.moola@gmail.com, lrh2000@pku.edu.cn, hughd@google.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chengming Zhou
Message-ID: <79a879cc-f5f8-08ef-0afa-3688d433a756@suse.cz>
In-Reply-To: <20231021144317.3400916-4-chengming.zhou@linux.dev>
References: <20231021144317.3400916-1-chengming.zhou@linux.dev>
 <20231021144317.3400916-4-chengming.zhou@linux.dev>

On 10/21/23 16:43, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou
>
> Now we will freeze slabs when moving them out of the node partial list
> to the cpu partial list; this method needs two cmpxchg_double operations:
>
> 1. freeze the slab (acquire_slab()) under the node list_lock
> 2. get_freelist() when the slab is picked for use in ___slab_alloc()
>
> Actually we don't need to freeze when moving slabs out of the node
> partial list; we can delay freezing until the slab freelist is used in
> ___slab_alloc(), so we can save one cmpxchg_double().
>
> And there are other good points:
>
> 1. The moving of slabs between the node partial list and the cpu
>    partial list becomes simpler, since we don't need to freeze or
>    unfreeze at all.
>
> 2. The node list_lock contention would be less, since we only need to
>    freeze one slab under the node list_lock. (In fact, we can first
>    move slabs out of the node partial list without freezing any slab
>    at all, so the contention on the slab won't transfer to the node
>    list_lock contention.)
>
> We can achieve this because no concurrent path manipulates the partial
> slab list except the __slab_free() path, which is now serialized.
>
> Note this patch just changes the parts that move the partial slabs, for
> easier code review; we will fix the other parts in the following
> patches. Specifically this patch changes three paths:
> 1. get partial slab from node: get_partial_node()
> 2. put partial slab to node: __unfreeze_partials()
> 3. cache partial slab on cpu in __slab_free()

So the problem with this approach, where one patch breaks things and
another one fixes them up, is that git bisect becomes problematic, so we
shouldn't do that and should instead bite the bullet and deal with a
potentially large patch. At some level it's unavoidable, as one has to
grasp all the moving pieces anyway to see e.g. whether the changes in
the allocation path are compatible with the changes in freeing. When
possible, we can do preparatory work that doesn't break things, as in
patches 1 and 2; maybe get_cpu_partial() could also be introduced (even
if unused) before switching the world over to the new scheme in a single
patch (with possible cleanups afterwards). So would it be possible to
redo it that way, please?
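As a side note for anyone reading along without the SLUB code open: the
cost being discussed is the cmpxchg_double-style retry loop, one per
freeze and one per unfreeze. Below is a rough standalone sketch (my own
illustration, not kernel code; it models only a counters word with a
frozen bit, not the real freelist+counters pair that
__slab_update_freelist() updates as one unit):

	/*
	 * Toy model of the retry loop that each freeze/unfreeze costs.
	 * Build with: cc -std=c11 toy.c
	 */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	#define SLAB_FROZEN	(1ull << 63)	/* stand-in for the frozen bit */

	struct toy_slab {
		_Atomic uint64_t counters;	/* inuse count + frozen bit */
	};

	/* One retry loop here stands for one cmpxchg_double in the real code. */
	static void toy_freeze(struct toy_slab *slab)
	{
		uint64_t old = atomic_load(&slab->counters);

		/* on failure, 'old' is reloaded with the current value */
		while (!atomic_compare_exchange_weak(&slab->counters, &old,
						     old | SLAB_FROZEN))
			;
	}

	static void toy_unfreeze(struct toy_slab *slab)
	{
		uint64_t old = atomic_load(&slab->counters);

		while (!atomic_compare_exchange_weak(&slab->counters, &old,
						     old & ~SLAB_FROZEN))
			;
	}

	int main(void)
	{
		struct toy_slab slab = { .counters = 3 };	/* 3 objects in use */

		toy_freeze(&slab);
		printf("frozen counters:   %#llx\n",
		       (unsigned long long)atomic_load(&slab.counters));
		toy_unfreeze(&slab);
		printf("unfrozen counters: %#llx\n",
		       (unsigned long long)atomic_load(&slab.counters));
		return 0;
	}

The point of the series is that moving a slab to or from the cpu partial
list no longer needs such a loop at all; the single remaining one runs
when ___slab_alloc() actually takes the freelist.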
> @@ -2621,23 +2622,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
>  			spin_lock_irqsave(&n->list_lock, flags);
>  		}
>  
> -		do {
> -
> -			old.freelist = slab->freelist;
> -			old.counters = slab->counters;
> -			VM_BUG_ON(!old.frozen);
> -
> -			new.counters = old.counters;
> -			new.freelist = old.freelist;
> -
> -			new.frozen = 0;
> -
> -		} while (!__slab_update_freelist(s, slab,
> -				old.freelist, old.counters,
> -				new.freelist, new.counters,
> -				"unfreezing slab"));
> -
> -		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
> +		if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
>  			slab->next = slab_to_discard;
>  			slab_to_discard = slab;
>  		} else {
> @@ -3634,18 +3619,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  		was_frozen = new.frozen;
>  		new.inuse -= cnt;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> -
> -			if (kmem_cache_has_cpu_partial(s) && !prior) {
> -
> -				/*
> -				 * Slab was on no list before and will be
> -				 * partially empty
> -				 * We can defer the list move and instead
> -				 * freeze it.
> -				 */
> -				new.frozen = 1;
> -
> -			} else { /* Needs to be taken off a list */
> +			/* Needs to be taken off a list */
> +			if (!kmem_cache_has_cpu_partial(s) || prior) {
>  
>  				n = get_node(s, slab_nid(slab));
>  				/*
> @@ -3675,7 +3650,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  		 * activity can be necessary.
>  		 */
>  		stat(s, FREE_FROZEN);
> -	} else if (new.frozen) {
> +	} else if (kmem_cache_has_cpu_partial(s) && !prior) {
>  		/*
>  		 * If we just froze the slab then put it onto the
>  		 * per cpu partial list.

I think this comment is now misleading; we didn't freeze the slab, so it
should now say something like "if we started with a full slab ...".
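Maybe something like this (just a wording suggestion; the condition is
the one from your hunk above and the rest of the branch stays as it is):

	} else if (kmem_cache_has_cpu_partial(s) && !prior) {
		/*
		 * The slab was full before the free (it was on no list),
		 * so put it onto the per cpu partial list now.
		 */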