From: Pedro Falcato <pedro.falcato@gmail.com>
Date: Tue, 8 Aug 2023 13:06:04 +0100
Subject: Re: [RFC v1 2/5] mm, slub: add opt-in slub_percpu_array
To: Vlastimil Babka
Cc: "Liam R. Howlett", Matthew Wilcox, Christoph Lameter, David Rientjes,
 Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 Roman Gushchin, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
In-Reply-To: <20230808095342.12637-9-vbabka@suse.cz>
References: <20230808095342.12637-7-vbabka@suse.cz> <20230808095342.12637-9-vbabka@suse.cz>

On Tue, Aug 8, 2023 at 10:54 AM Vlastimil Babka wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with the SLAB_NO_MERGE flag.
>
> The array is filled by freeing. When empty for alloc or full for
> freeing, it's simply bypassed by the operation, there's currently no
> batch freeing/allocations.
>
> The locking is copied from the page allocator's pcplists, based on
> embedded spin locks. Interrupts are not disabled, only preemption (cpu
> migration on RT). Trylock is attempted to avoid deadlock due to
> an interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count operations
> that used the percpu array.
>
> Bulk allocation bypasses the array, bulk freeing does not.
>
> kmem_cache_prefill_percpu_array() can be called to ensure the array on
> the current cpu has at least the given number of objects. However this is
> only opportunistic as there's no cpu pinning and the trylocks may always
> fail. Therefore allocations cannot rely on the array for success even
> after the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the refill.
> The operation is currently not optimized.

As I asked on IRC, I'm curious about three questions:

1) How does this affect SLUB's anti-queueing ideas?
2) Since this is so similar to SLAB's caching, is it realistic to make
this opt-out instead?
3) What performance difference do you expect/see from benchmarks?

> More TODO/FIXMEs:
>
> - NUMA awareness - preferred node currently ignored, __GFP_THISNODE not
>   honored
> - slub_debug - will not work for allocations from the array. Normally in
>   SLUB implementation the slub_debug kills all fast paths, but that
>   could lead to depleting the reserves if we ignore the prefill and use
>   GFP_ATOMIC. Needs more thought.
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  10 ++
>  mm/slub.c                | 210 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 223 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 848c7c82ad5a..f6c91cbc1544 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -196,6 +196,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -494,6 +496,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>         kmem_cache_free_bulk(NULL, size, p);
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..c85434668419 100644
> --- a/include/linux/slub_def.h
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include
>
>  enum stat_item {
> +       ALLOC_PERCPU_CACHE,     /* Allocation from percpu array cache */
>         ALLOC_FASTPATH,         /* Allocation from cpu slab */
>         ALLOC_SLOWPATH,         /* Allocation by getting a new cpu slab */
> +       FREE_PERCPU_CACHE,      /* Free to percpu array cache */
>         FREE_FASTPATH,          /* Free to cpu slab */
>         FREE_SLOWPATH,          /* Freeing not to cpu slab */
>         FREE_FROZEN,            /* Freeing to frozen slab */
> @@ -66,6 +68,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +       spinlock_t lock;

Since this is a percpu array, you probably want to avoid a lock here.
An idea would be to have some sort of "accessing" bool, and then do:

    preempt_disable();
    WRITE_ONCE(accessing, 1);
    /* doing pcpu array stuff */
    WRITE_ONCE(accessing, 0);
    preempt_enable();

which would avoid the atomic in a fast path while still giving you
safety on IRQ paths.

Although reclamation gets harder, as you stop being able to reclaim
these pcpu arrays from other CPUs.

--
Pedro
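
[For illustration, a rough and untested sketch of the "accessing" idea
described above. The struct layout, field names, and helpers here are
invented for the example; they are not from the patch.]

    #include <linux/compiler.h>
    #include <linux/percpu.h>
    #include <linux/preempt.h>
    #include <linux/types.h>

    struct pcpu_array_example {
            bool accessing;         /* only ever written by the local CPU */
            unsigned int count;
            void *objects[];
    };

    /* Returns true if the caller now owns the array on this CPU. */
    static inline bool pcpu_array_enter(struct pcpu_array_example *pca)
    {
            if (READ_ONCE(pca->accessing))
                    return false;   /* we interrupted an ongoing access; bypass */
            WRITE_ONCE(pca->accessing, true);
            barrier();              /* keep array accesses after the flag is set */
            return true;
    }

    static inline void pcpu_array_exit(struct pcpu_array_example *pca)
    {
            barrier();              /* keep array accesses before the flag is cleared */
            WRITE_ONCE(pca->accessing, false);
    }

    /*
     * Caller pattern: with preemption disabled, the only interleaving
     * possible on this CPU is an interrupt, and an interrupt handler
     * calling pcpu_array_enter() will see accessing == true and fall
     * back to the regular slab paths.
     *
     *      preempt_disable();
     *      pca = this_cpu_ptr(cache_array);   // hypothetical per-cpu pointer
     *      if (pcpu_array_enter(pca)) {
     *              ... push/pop objects ...
     *              pcpu_array_exit(pca);
     *      }
     *      preempt_enable();
     */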
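
[And, for reference, a sketch of how a caller might use the API proposed
in the quoted hunks above. The cache name, object type, counts, and the
lock are made up; error handling is elided, and the exact return-value
conventions are an assumption.]

    struct kmem_cache *cache;
    struct example *obj;

    /* The commit message says the cache must be created with SLAB_NO_MERGE. */
    cache = kmem_cache_create("example_cache", sizeof(struct example),
                              0, SLAB_NO_MERGE, NULL);

    /* Set up the per-cpu array (here sized for 32 cached objects). */
    kmem_cache_setup_percpu_array(cache, 32);

    /* Before a section that cannot sleep, opportunistically prefill. */
    kmem_cache_prefill_percpu_array(cache, 8, GFP_KERNEL);

    spin_lock(&lock);
    obj = kmem_cache_alloc(cache, GFP_ATOMIC);  /* likely served from the array */
    spin_unlock(&lock);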