Message-ID: <1724b59a-0c3a-482c-b141-b5611665d1f4@suse.cz>
Date: Tue, 13 May 2025 18:08:30 +0200
Subject: Re: [PATCH v4 1/9] slab: add opt-in caching layer of percpu sheaves
From: Vlastimil Babka
To: Harry Yoo
Cc: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes, Roman Gushchin, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org
References: <20250425-slub-percpu-caches-v4-0-8a636982b4a4@suse.cz> <20250425-slub-percpu-caches-v4-1-8a636982b4a4@suse.cz>

On 4/29/25 03:08, Harry Yoo wrote:
> On Fri, Apr 25, 2025 at 10:27:21AM +0200, Vlastimil Babka wrote:
>> Specifying a non-zero value for a new struct kmem_cache_args field
>> sheaf_capacity will setup a caching layer of percpu arrays called
>> sheaves of given capacity for the created cache.
>>
>> Allocations from the cache will allocate via the percpu sheaves (main or
>> spare) as long as they have no NUMA node preference. Frees will also
>> put the object back into one of the sheaves.
>>
>> When both percpu sheaves are found empty during an allocation, an empty
>> sheaf may be replaced with a full one from the per-node barn. If none
>> are available and the allocation is allowed to block, an empty sheaf is
>> refilled from slab(s) by an internal bulk alloc operation. When both
>> percpu sheaves are full during freeing, the barn can replace a full one
>> with an empty one, unless over a full sheaves limit. In that case a
>> sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
>> sheaves and barns is also wired to the existing cpu flushing and cache
>> shrinking operations.
>>
>> The sheaves do not distinguish NUMA locality of the cached objects. If
>> an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
>> with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
>> the sheaves are bypassed.
>>
>> The bulk operations exposed to slab users also try to utilize the
>> sheaves as long as the necessary (full or empty) sheaves are available
>> on the cpu or in the barn. Once depleted, they will fallback to bulk
>> alloc/free to slabs directly to avoid double copying.
>>
>> The sheaf_capacity value is exported in sysfs for observability.
>>
>> Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
>> count objects allocated or freed using the sheaves (and thus not
>> counting towards the other alloc/free path counters). Counters
>> sheaf_refill and sheaf_flush count objects filled or flushed from or to
>> slab pages, and can be used to assess how effective the caching is. The
>> refill and flush operations will also count towards the usual
>> alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
>> the backing slabs. For barn operations, barn_get and barn_put count how
>> many full sheaves were get from or put to the barn, the _fail variants
>> count how many such requests could not be satisfied mainly because the
>> barn was either empty or full.
>
>> While the barn also holds empty sheaves
>> to make some operations easier, these are not as critical to mandate own
>> counters. Finally, there are sheaf_alloc/sheaf_free counters.
>
> I initially thought we need counters for empty sheaves to see how many times
> it grabs empty sheaves from the barn, but looks like barn_put
> ("put full sheaves to the barn") is effectively a proxy for that, right?

Mostly yes. The empty sheaves in the barn are mainly there to make the
"replace full with empty" operation easy. If that fails because there are
no empty sheaves, the fallback of allocating an empty sheaf should still
succeed often enough that tracking it in detail doesn't seem that useful.

>> Access to the percpu sheaves is protected by local_trylock() when
>> potential callers include irq context, and local_lock() otherwise (such
>> as when we already know the gfp flags allow blocking). The trylock
>> failures should be rare and we can easily fallback. Each per-NUMA-node
>> barn has a spin_lock.
>>
>> When slub_debug is enabled for a cache with sheaf_capacity also
>> specified, the latter is ignored so that allocations and frees reach the
>> slow path where debugging hooks are processed.
>>
>> Signed-off-by: Vlastimil Babka
>> ---
>
> Reviewed-by: Harry Yoo

Thanks!

> LGTM, with a few nits:

I've applied them, thanks.
Responding only to one that needs it:

>> +static __fastpath_inline
>> +bool free_to_pcs(struct kmem_cache *s, void *object)
>> +{
>> +	struct slub_percpu_sheaves *pcs;
>> +
>> +restart:
>> +	if (!local_trylock(&s->cpu_sheaves->lock))
>> +		return false;
>> +
>> +	pcs = this_cpu_ptr(s->cpu_sheaves);
>> +
>> +	if (unlikely(pcs->main->size == s->sheaf_capacity)) {
>> +
>> +		struct slab_sheaf *empty;
>> +
>> +		if (!pcs->spare) {
>> +			empty = barn_get_empty_sheaf(pcs->barn);
>> +			if (empty) {
>> +				pcs->spare = pcs->main;
>> +				pcs->main = empty;
>> +				goto do_free;
>> +			}
>> +			goto alloc_empty;
>> +		}
>> +
>> +		if (pcs->spare->size < s->sheaf_capacity) {
>> +			swap(pcs->main, pcs->spare);
>> +			goto do_free;
>> +		}
>> +
>> +		empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
>> +
>> +		if (!IS_ERR(empty)) {
>> +			stat(s, BARN_PUT);
>> +			pcs->main = empty;
>> +			goto do_free;
>> +		}
>
> nit: stat(s, BARN_PUT_FAIL); should probably be here instead?

Hm, no, that was intentional: when PTR_ERR(empty) == -ENOMEM, we try
alloc_empty_sheaf(), which will likely succeed, and then
__pcs_install_empty_sheaf() will just force-put the full sheaf into the
barn (and record a BARN_PUT), because we already saw that we're not over
capacity. But now I see I didn't describe that scenario in the function's
comment, so I will add that.

But technically we should also record stat(s, BARN_PUT_FAIL) when that
alloc_empty_sheaf() fails, though not when we "goto alloc_empty" from the
"no spare" case above. Bit icky but I'll add that too.

>> +
>> +		if (PTR_ERR(empty) == -E2BIG) {
>> +			/* Since we got here, spare exists and is full */
>> +			struct slab_sheaf *to_flush = pcs->spare;
>> +
>> +			stat(s, BARN_PUT_FAIL);
>> +
>> +			pcs->spare = NULL;
>> +			local_unlock(&s->cpu_sheaves->lock);
>> +
>> +			sheaf_flush_unused(s, to_flush);
>> +			empty = to_flush;
>> +			goto got_empty;
>> +		}
>
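
To make the BARN_PUT / BARN_PUT_FAIL accounting discussed above concrete,
here is a minimal userspace sketch in C. It is only a model under stated
assumptions, not the patch's code: enum replace_result, struct barn_stats
and account_full_sheaf_put() are hypothetical names invented for
illustration, and the alloc_empty_ok flag stands in for whether
alloc_empty_sheaf() succeeds. It covers only the path where both main and
spare are full. The "no spare" path that jumps straight to alloc_empty is
left out, since a failure there is not meant to count as BARN_PUT_FAIL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum replace_result {
	REPLACE_OK,	/* barn swapped the full sheaf for an empty one */
	REPLACE_ENOMEM,	/* barn had no empty sheaf to hand out */
	REPLACE_E2BIG,	/* barn is over its full-sheaves limit */
};

struct barn_stats {
	unsigned long barn_put;
	unsigned long barn_put_fail;
};

/*
 * Account one attempt to hand a full main sheaf to the barn.
 * Returns true if the full sheaf ends up in the barn.
 */
bool account_full_sheaf_put(struct barn_stats *st,
			    enum replace_result res,
			    bool alloc_empty_ok)
{
	assert(st);

	switch (res) {
	case REPLACE_OK:
		st->barn_put++;
		return true;
	case REPLACE_E2BIG:
		/* The spare gets flushed to slabs instead. */
		st->barn_put_fail++;
		return false;
	case REPLACE_ENOMEM:
		if (alloc_empty_ok) {
			/*
			 * Installing the freshly allocated empty sheaf
			 * forces the full one into the barn.
			 */
			st->barn_put++;
			return true;
		}
		/* Assumed per the reply: only this failure is counted. */
		st->barn_put_fail++;
		return false;
	}
	return false;
}
```

In this model a -ENOMEM result from the barn only shows up as a failed put
if the fallback allocation fails too, which is why the counter is not
bumped right after the IS_ERR() check.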