From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marco Elver
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Matthew Wilcox, "Liam R. Howlett", Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko,
	Dmitry Vyukov, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	maple-tree@lists.infradead.org, kasan-dev@googlegroups.com
Subject: Re: [PATCH RFC v3 5/9] mm/slub: add opt-in percpu array cache of objects
Date: Wed, 29 Nov 2023 11:35:15 +0100
In-Reply-To: <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>
References: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz>
 <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>

On Wed, 29 Nov 2023 at 10:53, Vlastimil Babka wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with the SLAB_NO_MERGE flag.
>
> When empty, half of the array is filled by an internal bulk alloc
> operation. When full, half of the array is flushed by an internal bulk
> free operation.
>
> The array does not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with a numa node
> not equal to NUMA_NO_NODE, the array is bypassed.
>
> The bulk operations exposed to slab users also try to utilize the array
> when possible, but leave the array empty or full and use the bulk
> alloc/free only to finish the operation itself. If kmemcg is enabled and
> active, bulk freeing skips the array completely as it would be less
> efficient to use it.
>
> The locking scheme is copied from the page allocator's pcplists, based
> on embedded spin locks. Interrupts are not disabled, only preemption
> (cpu migration on RT). A trylock is attempted to avoid deadlock due to
> an interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count objects
> allocated or freed using the percpu array; counters cpu_cache_refill and
> cpu_cache_flush count objects refilled or flushed from the array.
>
> kmem_cache_prefill_percpu_array() can be called to fill the array on
> the current cpu to at least the given number of objects. However, this
> is only opportunistic as there's no cpu pinning between the prefill and
> usage, and trylocks may fail when the usage is in an irq handler.
> Therefore allocations cannot rely on the array for success even after
> the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the refill.
>
> When slub_debug is enabled for a cache with a percpu array, the objects
> in the array are considered allocated from the slub_debug perspective,
> and the alloc/free debugging hooks occur when moving the objects between
> the array and slab pages. This means that e.g. a use-after-free that
> occurs for an object cached in the array is undetected. Collected
> alloc/free stacktraces might also be less useful. This limitation could
> be changed in the future.
>
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  12 ++
>  mm/Kconfig               |   1 +
>  mm/slub.c                | 457 ++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 468 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index d6d6ffeeb9a2..fe0c0981be59 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -197,6 +197,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -512,6 +514,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>  	kmem_cache_free_bulk(NULL, size, p);
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..2083aa849766 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include
>
>  enum stat_item {
> +	ALLOC_PCA,		/* Allocation from percpu array cache */
>  	ALLOC_FASTPATH,		/* Allocation from cpu slab */
>  	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
> +	FREE_PCA,		/* Free to percpu array cache */
>  	FREE_FASTPATH,		/* Free to cpu slab */
>  	FREE_SLOWPATH,		/* Freeing not to cpu slab */
>  	FREE_FROZEN,		/* Freeing to frozen slab */
> @@ -39,6 +41,8 @@ enum stat_item {
>  	CPU_PARTIAL_FREE,	/* Refill cpu partial on free */
>  	CPU_PARTIAL_NODE,	/* Refill cpu partial from node partial */
>  	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
> +	PCA_REFILL,		/* Refilling empty percpu array cache */
> +	PCA_FLUSH,		/* Flushing full percpu array cache */
>  	NR_SLUB_STAT_ITEMS
>  };
>
> @@ -66,6 +70,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +	spinlock_t lock;
> +	unsigned int count;
> +	unsigned int used;
> +	void * objects[];

checkpatch complains: "foo * bar" should be "foo *bar"
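
Also, to double-check my understanding of the intended usage, a minimal
sketch of a cache user opting in, based only on the API this patch
exposes; the "foo" cache and the counts 32 (array size) and 8 (prefill
target) are made-up values for illustration:

#include <linux/slab.h>

struct foo {
	unsigned long data[4];
};

static struct kmem_cache *foo_cache;

static int __init foo_cache_init(void)
{
	/* A cache with a percpu array must be created with SLAB_NO_MERGE. */
	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
				      SLAB_NO_MERGE, NULL);
	if (!foo_cache)
		return -ENOMEM;

	/* Opt in: cache up to 32 objects per cpu. */
	if (kmem_cache_setup_percpu_array(foo_cache, 32)) {
		kmem_cache_destroy(foo_cache);
		return -ENOMEM;
	}

	return 0;
}

/*
 * Opportunistically fill the current cpu's array to at least 8 objects
 * while sleeping is still allowed. Per the commit message, a later
 * GFP_ATOMIC allocation may still miss the array (no cpu pinning
 * between prefill and use, and the trylock can fail), but misses
 * should be rare.
 */
static int foo_prepare(void)
{
	return kmem_cache_prefill_percpu_array(foo_cache, 8, GFP_KERNEL);
}

static struct foo *foo_alloc_atomic(void)
{
	/* Served from the percpu array when it has cached objects. */
	return kmem_cache_alloc(foo_cache, GFP_ATOMIC);
}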