From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marco Elver
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Matthew Wilcox, "Liam R. Howlett", Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko,
	Dmitry Vyukov, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	maple-tree@lists.infradead.org, kasan-dev@googlegroups.com
Subject: Re: [PATCH RFC v3 5/9] mm/slub: add opt-in percpu array cache of objects
Date: Wed, 29 Nov 2023 11:35:15 +0100
In-Reply-To: <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>
References: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz>
 <20231129-slub-percpu-caches-v3-5-6bcf536772bc@suse.cz>

On Wed, 29 Nov 2023 at 10:53, Vlastimil Babka wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with the SLAB_NO_MERGE flag.
>
> When empty, half of the array is filled by an internal bulk alloc
> operation. When full, half of the array is flushed by an internal bulk
> free operation.
>
> The array does not distinguish NUMA locality of the cached objects. If
> an allocation is requested with kmem_cache_alloc_node() with a numa node
> not equal to NUMA_NO_NODE, the array is bypassed.
>
> The bulk operations exposed to slab users also try to utilize the array
> when possible, but leave the array empty or full and use the bulk
> alloc/free only to finish the operation itself. If kmemcg is enabled and
> active, bulk freeing skips the array completely as it would be less
> efficient to use it.
>
> The locking scheme is copied from the page allocator's pcplists, based
> on embedded spin locks. Interrupts are not disabled, only preemption
> (cpu migration on RT). A trylock is attempted to avoid deadlock due to
> an interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count objects
> allocated or freed using the percpu array; counters cpu_cache_refill and
> cpu_cache_flush count objects refilled or flushed from the array.
>
> kmem_cache_prefill_percpu_array() can be called to fill the array on
> the current cpu to at least the given number of objects. However, this
> is only opportunistic as there's no cpu pinning between the prefill and
> usage, and trylocks may fail when the usage is in an irq handler.
> Therefore allocations cannot rely on the array for success even after
> the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the refill.
>
> When slub_debug is enabled for a cache with a percpu array, the objects
> in the array are considered allocated from the slub_debug perspective,
> and the alloc/free debugging hooks occur when moving the objects between
> the array and slab pages. This means that e.g. a use-after-free that
> occurs for an object cached in the array is undetected. Collected
> alloc/free stacktraces might also be less useful. This limitation could
> be changed in the future.
>
> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> allocations and frees by kmem_cache users even if those use the array,
> so their debugging or accounting accuracy should be unaffected.
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  12 ++
>  mm/Kconfig               |   1 +
>  mm/slub.c                | 457 ++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 468 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index d6d6ffeeb9a2..fe0c0981be59 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -197,6 +197,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -512,6 +514,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>  	kmem_cache_free_bulk(NULL, size, p);
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..2083aa849766 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include
>
>  enum stat_item {
> +	ALLOC_PCA,		/* Allocation from percpu array cache */
>  	ALLOC_FASTPATH,		/* Allocation from cpu slab */
>  	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
> +	FREE_PCA,		/* Free to percpu array cache */
>  	FREE_FASTPATH,		/* Free to cpu slab */
>  	FREE_SLOWPATH,		/* Freeing not to cpu slab */
>  	FREE_FROZEN,		/* Freeing to frozen slab */
> @@ -39,6 +41,8 @@ enum stat_item {
>  	CPU_PARTIAL_FREE,	/* Refill cpu partial on free */
>  	CPU_PARTIAL_NODE,	/* Refill cpu partial from node partial */
>  	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
> +	PCA_REFILL,		/* Refilling empty percpu array cache */
> +	PCA_FLUSH,		/* Flushing full percpu array cache */
>  	NR_SLUB_STAT_ITEMS
>  };
>
> @@ -66,6 +70,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +	spinlock_t lock;
> +	unsigned int count;
> +	unsigned int used;
> +	void * objects[];

checkpatch complains: "foo * bar" should be "foo *bar"
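
Also, to double-check my understanding of the intended usage, a minimal
sketch of a cache user opting in, based only on the API this patch
exposes; the "foo" cache and the counts 32 (array size) and 8 (prefill
target) are made-up values for illustration:

#include <linux/slab.h>

struct foo {
	unsigned long data[4];
};

static struct kmem_cache *foo_cache;

static int __init foo_cache_init(void)
{
	/* A cache with a percpu array must be created with SLAB_NO_MERGE. */
	foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
				      SLAB_NO_MERGE, NULL);
	if (!foo_cache)
		return -ENOMEM;

	/* Opt in: cache up to 32 objects per cpu. */
	if (kmem_cache_setup_percpu_array(foo_cache, 32)) {
		kmem_cache_destroy(foo_cache);
		return -ENOMEM;
	}

	return 0;
}

/*
 * Opportunistically fill the current cpu's array to at least 8 objects
 * while sleeping is still allowed. Per the commit message, a later
 * GFP_ATOMIC allocation may still miss the array (no cpu pinning
 * between prefill and use, and the trylock can fail), but misses
 * should be rare.
 */
static int foo_prepare(void)
{
	return kmem_cache_prefill_percpu_array(foo_cache, 8, GFP_KERNEL);
}

static struct foo *foo_alloc_atomic(void)
{
	/* Served from the percpu array when it has cached objects. */
	return kmem_cache_alloc(foo_cache, GFP_ATOMIC);
}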