From: Pedro Falcato <pedro.falcato@gmail.com>
Date: Tue, 8 Aug 2023 13:06:04 +0100
Subject: Re: [RFC v1 2/5] mm, slub: add opt-in slub_percpu_array
To: Vlastimil Babka
Cc: "Liam R. Howlett", Matthew Wilcox, Christoph Lameter, David Rientjes,
 Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 Roman Gushchin, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
In-Reply-To: <20230808095342.12637-9-vbabka@suse.cz>
References: <20230808095342.12637-7-vbabka@suse.cz> <20230808095342.12637-9-vbabka@suse.cz>

On Tue, Aug 8, 2023 at 10:54 AM Vlastimil Babka wrote:
>
> kmem_cache_setup_percpu_array() will allocate a per-cpu array for
> caching alloc/free objects of given size for the cache. The cache
> has to be created with the SLAB_NO_MERGE flag.
>
> The array is filled by freeing. When empty for alloc or full for
> freeing, it's simply bypassed by the operation, there's currently no
> batch freeing/allocations.
>
> The locking is copied from the page allocator's pcplists, based on
> embedded spin locks. Interrupts are not disabled, only preemption (cpu
> migration on RT). Trylock is attempted to avoid deadlock due to
> an interrupt; trylock failure means the array is bypassed.
>
> Sysfs stat counters alloc_cpu_cache and free_cpu_cache count operations
> that used the percpu array.
>
> Bulk allocation bypasses the array, bulk freeing does not.
>
> kmem_cache_prefill_percpu_array() can be called to ensure the array on
> the current cpu has at least the given number of objects. However this is
> only opportunistic as there's no cpu pinning and the trylocks may always
> fail. Therefore allocations cannot rely on the array for success even
> after the prefill. But misses should be rare enough that e.g. GFP_ATOMIC
> allocations should be acceptable after the refill.
> The operation is currently not optimized.

As I asked on IRC, I'm curious about three questions:

1) How does this affect SLUB's anti-queueing ideas?
2) Since this is so similar to SLAB's caching, is it realistic to make
this opt-out instead?
3) What performance difference do you expect/see from benchmarks?

> More TODO/FIXMEs:
>
> - NUMA awareness - preferred node currently ignored, __GFP_THISNODE not
>   honored
> - slub_debug - will not work for allocations from the array. Normally in
>   SLUB implementation the slub_debug kills all fast paths, but that
>   could lead to depleting the reserves if we ignore the prefill and use
>   GFP_ATOMIC. Needs more thought.
> ---
>  include/linux/slab.h     |   4 +
>  include/linux/slub_def.h |  10 ++
>  mm/slub.c                | 210 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 223 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 848c7c82ad5a..f6c91cbc1544 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -196,6 +196,8 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>
> +int kmem_cache_setup_percpu_array(struct kmem_cache *s, unsigned int count);
> +
>  /*
>   * Please use this macro to create slab caches. Simply specify the
>   * name of the structure and maybe some flags that are listed above.
> @@ -494,6 +496,8 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
>  void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
>  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
>
> +int kmem_cache_prefill_percpu_array(struct kmem_cache *s, unsigned int count, gfp_t gfp);
> +
>  static __always_inline void kfree_bulk(size_t size, void **p)
>  {
>         kmem_cache_free_bulk(NULL, size, p);
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index deb90cf4bffb..c85434668419 100644
> --- a/include/linux/slub_def.h
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,8 +13,10 @@
>  #include
>
>  enum stat_item {
> +       ALLOC_PERCPU_CACHE,     /* Allocation from percpu array cache */
>         ALLOC_FASTPATH,         /* Allocation from cpu slab */
>         ALLOC_SLOWPATH,         /* Allocation by getting a new cpu slab */
> +       FREE_PERCPU_CACHE,      /* Free to percpu array cache */
>         FREE_FASTPATH,          /* Free to cpu slab */
>         FREE_SLOWPATH,          /* Freeing not to cpu slab */
>         FREE_FROZEN,            /* Freeing to frozen slab */
> @@ -66,6 +68,13 @@ struct kmem_cache_cpu {
>  };
>  #endif /* CONFIG_SLUB_TINY */
>
> +struct slub_percpu_array {
> +       spinlock_t lock;

Since this is a percpu array, you probably want to avoid a lock here.
An idea would be to have some sort of "accessing" bool, and then do:

    preempt_disable();
    WRITE_ONCE(accessing, 1);
    /* doing pcpu array stuff */
    WRITE_ONCE(accessing, 0);
    preempt_enable();

which would avoid the atomic in a fast path while still giving you
safety on IRQ paths.

Although reclamation gets harder, as you stop being able to reclaim
these pcpu arrays from other CPUs.

--
Pedro
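
[For illustration, a rough and untested sketch of the "accessing" idea
described above. The struct layout, field names, and helpers here are
invented for the example; they are not from the patch.]

    #include <linux/compiler.h>
    #include <linux/percpu.h>
    #include <linux/preempt.h>
    #include <linux/types.h>

    struct pcpu_array_example {
            bool accessing;         /* only ever written by the local CPU */
            unsigned int count;
            void *objects[];
    };

    /* Returns true if the caller now owns the array on this CPU. */
    static inline bool pcpu_array_enter(struct pcpu_array_example *pca)
    {
            if (READ_ONCE(pca->accessing))
                    return false;   /* we interrupted an ongoing access; bypass */
            WRITE_ONCE(pca->accessing, true);
            barrier();              /* keep array accesses after the flag is set */
            return true;
    }

    static inline void pcpu_array_exit(struct pcpu_array_example *pca)
    {
            barrier();              /* keep array accesses before the flag is cleared */
            WRITE_ONCE(pca->accessing, false);
    }

    /*
     * Caller pattern: with preemption disabled, the only interleaving
     * possible on this CPU is an interrupt, and an interrupt handler
     * calling pcpu_array_enter() will see accessing == true and fall
     * back to the regular slab paths.
     *
     *      preempt_disable();
     *      pca = this_cpu_ptr(cache_array);   // hypothetical per-cpu pointer
     *      if (pcpu_array_enter(pca)) {
     *              ... push/pop objects ...
     *              pcpu_array_exit(pca);
     *      }
     *      preempt_enable();
     */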
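
[And, for reference, a sketch of how a caller might use the API proposed
in the quoted hunks above. The cache name, object type, counts, and the
lock are made up; error handling is elided, and the exact return-value
conventions are an assumption.]

    struct kmem_cache *cache;
    struct example *obj;

    /* The commit message says the cache must be created with SLAB_NO_MERGE. */
    cache = kmem_cache_create("example_cache", sizeof(struct example),
                              0, SLAB_NO_MERGE, NULL);

    /* Set up the per-cpu array (here sized for 32 cached objects). */
    kmem_cache_setup_percpu_array(cache, 32);

    /* Before a section that cannot sleep, opportunistically prefill. */
    kmem_cache_prefill_percpu_array(cache, 8, GFP_KERNEL);

    spin_lock(&lock);
    obj = kmem_cache_alloc(cache, GFP_ATOMIC);  /* likely served from the array */
    spin_unlock(&lock);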