From: Christoph Hellwig <hch@infradead.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Harry Yoo <harry.yoo@oracle.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, maple-tree@lists.infradead.org,
	Alexei Starovoitov <ast@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Venkat Rao Bagalkote <venkat88@linux.ibm.com>,
	Qianfeng Rong <rongqianfeng@vivo.com>,
	Wei Yang <richard.weiyang@gmail.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	WangYuli <wangyuli@uniontech.com>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>
Subject: Re: [PATCH v8 00/23] SLUB percpu sheaves
Date: Tue, 21 Oct 2025 23:47:06 -0700
Message-ID: <aPh96jn2NcqXY4IC@infradead.org>
In-Reply-To: <c0eb4acf-47ff-4e39-8d32-4e5f3857a851@suse.cz>

On Wed, Oct 15, 2025 at 10:32:44AM +0200, Vlastimil Babka wrote:
> Yeah, not a replacement for mempools which have their special semantics.
> 
> > to implement a mempool_alloc_batch to allow grabbing multiple objects
> > out of a mempool safely for something I'm working on.
> 
> I can imagine allocating multiple objects can be difficult to achieve with
> the mempool's guaranteed progress semantics. Maybe the mempool could serve
> prefilled sheaves?

It doesn't look too bad, but I'd be happy to see even better versions.

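For reference, the calling convention I have in mind is roughly this
(sketch only, my_pool / my_free are placeholders; note that
mempool_free_bulk only takes back as many elements as the pool needs
to refill and leaves the rest to the caller):

	void *objs[4] = { };	/* NULL slots get filled by the allocator */
	unsigned int n;

	if (mempool_alloc_bulk(my_pool, objs, ARRAY_SIZE(objs), GFP_NOIO) < 0)
		return -ENOMEM;

	/* ... use the objects ... */

	n = mempool_free_bulk(my_pool, objs, ARRAY_SIZE(objs));
	while (n < ARRAY_SIZE(objs))
		my_free(objs[n++]);
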
This is what I have:

---
From 9d25a3ce6cff11b7853381921c53a51e51f27689 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 8 Sep 2025 18:22:12 +0200
Subject: mempool: add mempool_{alloc,free}_bulk

Add a version of the mempool allocator that works for batch allocations
of multiple objects.  Calling mempool_alloc in a loop is not safe
because it could deadlock if multiple threads are attempting such an
allocation at the same time.

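The classic anti-pattern, for illustration: with min_nr == 2, two
threads each holding one element from the loop below can sleep
forever waiting for the other to return one:

	for (i = 0; i < count; i++)
		elem[i] = mempool_alloc(pool, GFP_NOIO);
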
As an extra benefit, the interface is built so that the same array
can be used for alloc_pages_bulk / release_pages, so that at least
for page-backed mempools the fast path can use a nice batch optimization.
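
I.e. the fast path for a page backed pool could eventually do
something like this (sketch only, not part of this patch):

	nr = alloc_pages_bulk(gfp_temp, count, (struct page **)elem);
	/* ... fall back to the pool for the count - nr missing pages ... */
	release_pages((struct page **)elem, count);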

Still WIP: this needs proper documentation, and mempool also seems to
lack error injection support to make it easy to test the pool code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/mempool.h |   6 ++
 mm/mempool.c            | 135 ++++++++++++++++++++++++----------------
 2 files changed, 90 insertions(+), 51 deletions(-)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 34941a4b9026..59f14e94596f 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -66,9 +66,15 @@ extern void mempool_destroy(mempool_t *pool);
 extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
 #define mempool_alloc(...)						\
 	alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
+		unsigned int count, gfp_t gfp_mask);
+#define mempool_alloc_bulk(...)						\
+	alloc_hooks(mempool_alloc_bulk_noprof(__VA_ARGS__))
 
 extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
 extern void mempool_free(void *element, mempool_t *pool);
+unsigned int mempool_free_bulk(mempool_t *pool, void **elem,
+		unsigned int count);
 
 /*
  * A mempool_alloc_t and mempool_free_t that get the memory from
diff --git a/mm/mempool.c b/mm/mempool.c
index 1c38e873e546..d8884aef2666 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -371,26 +371,13 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
 }
 EXPORT_SYMBOL(mempool_resize);
 
-/**
- * mempool_alloc - allocate an element from a specific memory pool
- * @pool:      pointer to the memory pool which was allocated via
- *             mempool_create().
- * @gfp_mask:  the usual allocation bitmask.
- *
- * this function only sleeps if the alloc_fn() function sleeps or
- * returns NULL. Note that due to preallocation, this function
- * *never* fails when called from process contexts. (it might
- * fail if called from an IRQ context.)
- * Note: using __GFP_ZERO is not supported.
- *
- * Return: pointer to the allocated element or %NULL on error.
- */
-void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
+int mempool_alloc_bulk_noprof(mempool_t *pool, void **elem,
+		unsigned int count, gfp_t gfp_mask)
 {
-	void *element;
 	unsigned long flags;
 	wait_queue_entry_t wait;
 	gfp_t gfp_temp;
+	unsigned int i;
 
 	VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
 	might_alloc(gfp_mask);
@@ -401,15 +388,24 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 
 	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
+	i = 0;
 repeat_alloc:
+	for (; i < count; i++) {
+		if (!elem[i])
+			elem[i] = pool->alloc(gfp_temp, pool->pool_data);
+		if (unlikely(!elem[i]))
+			goto use_pool;
+	}
 
-	element = pool->alloc(gfp_temp, pool->pool_data);
-	if (likely(element != NULL))
-		return element;
+	return 0;
 
+use_pool:
 	spin_lock_irqsave(&pool->lock, flags);
-	if (likely(pool->curr_nr)) {
-		element = remove_element(pool);
+	if (likely(pool->curr_nr >= count - i)) {
+		for (; i < count; i++) {
+			if (!elem[i])
+				elem[i] = remove_element(pool);
+		}
 		spin_unlock_irqrestore(&pool->lock, flags);
 		/* paired with rmb in mempool_free(), read comment there */
 		smp_wmb();
@@ -417,8 +413,9 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 		 * Update the allocation stack trace as this is more useful
 		 * for debugging.
 		 */
-		kmemleak_update_trace(element);
-		return element;
+		for (i = 0; i < count; i++)
+			kmemleak_update_trace(elem[i]);
+		return 0;
 	}
 
 	/*
@@ -434,10 +431,16 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 	/* We must not sleep if !__GFP_DIRECT_RECLAIM */
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
-		return NULL;
+		if (i > 0) {
+			unsigned int done = mempool_free_bulk(pool, elem, i);
+
+			while (done < i)
+				pool->free(elem[done++], pool->pool_data);
+		}
+		return -ENOMEM;
 	}
 
-	/* Let's wait for someone else to return an element to @pool */
+	/* Let's wait for someone else to return elements to @pool */
 	init_wait(&wait);
 	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
 
@@ -452,6 +451,30 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 	finish_wait(&pool->wait, &wait);
 	goto repeat_alloc;
 }
+EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
+
+/**
+ * mempool_alloc - allocate an element from a specific memory pool
+ * @pool:      pointer to the memory pool which was allocated via
+ *             mempool_create().
+ * @gfp_mask:  the usual allocation bitmask.
+ *
+ * this function only sleeps if the alloc_fn() function sleeps or
+ * returns NULL. Note that due to preallocation, this function
+ * *never* fails when called from process contexts. (it might
+ * fail if called from an IRQ context.)
+ * Note: using __GFP_ZERO is not supported.
+ *
+ * Return: pointer to the allocated element or %NULL on error.
+ */
+void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
+{
+	void *elem[1] = { };
+
+	if (mempool_alloc_bulk_noprof(pool, elem, 1, gfp_mask) < 0)
+		return NULL;
+	return elem[0];
+}
 EXPORT_SYMBOL(mempool_alloc_noprof);
 
 /**
@@ -491,20 +514,11 @@ void *mempool_alloc_preallocated(mempool_t *pool)
 }
 EXPORT_SYMBOL(mempool_alloc_preallocated);
 
-/**
- * mempool_free - return an element to the pool.
- * @element:   pool element pointer.
- * @pool:      pointer to the memory pool which was allocated via
- *             mempool_create().
- *
- * this function only sleeps if the free_fn() function sleeps.
- */
-void mempool_free(void *element, mempool_t *pool)
+unsigned int mempool_free_bulk(mempool_t *pool, void **elem, unsigned int count)
 {
 	unsigned long flags;
-
-	if (unlikely(element == NULL))
-		return;
+	bool added = false;
+	unsigned int freed = 0;
 
 	/*
 	 * Paired with the wmb in mempool_alloc().  The preceding read is
@@ -541,15 +555,11 @@ void mempool_free(void *element, mempool_t *pool)
 	 */
 	if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
 		spin_lock_irqsave(&pool->lock, flags);
-		if (likely(pool->curr_nr < pool->min_nr)) {
-			add_element(pool, element);
-			spin_unlock_irqrestore(&pool->lock, flags);
-			if (wq_has_sleeper(&pool->wait))
-				wake_up(&pool->wait);
-			return;
+		while (pool->curr_nr < pool->min_nr && freed < count) {
+			add_element(pool, elem[freed++]);
+			added = true;
 		}
 		spin_unlock_irqrestore(&pool->lock, flags);
-	}
 
 	/*
 	 * Handle the min_nr = 0 edge case:
@@ -560,20 +570,39 @@ void mempool_free(void *element, mempool_t *pool)
 	 * allocation of element when both min_nr and curr_nr are 0, and
 	 * any active waiters are properly awakened.
 	 */
-	if (unlikely(pool->min_nr == 0 &&
+	} else if (unlikely(pool->min_nr == 0 &&
 		     READ_ONCE(pool->curr_nr) == 0)) {
 		spin_lock_irqsave(&pool->lock, flags);
 		if (likely(pool->curr_nr == 0)) {
-			add_element(pool, element);
-			spin_unlock_irqrestore(&pool->lock, flags);
-			if (wq_has_sleeper(&pool->wait))
-				wake_up(&pool->wait);
-			return;
+			add_element(pool, elem[freed++]);
+			added = true;
 		}
 		spin_unlock_irqrestore(&pool->lock, flags);
 	}
 
-	pool->free(element, pool->pool_data);
+	if (unlikely(added) && wq_has_sleeper(&pool->wait))
+		wake_up(&pool->wait);
+
+	return freed;
+}
+EXPORT_SYMBOL_GPL(mempool_free_bulk);
+
+/**
+ * mempool_free - return an element to the pool.
+ * @element:   pool element pointer.
+ * @pool:      pointer to the memory pool which was allocated via
+ *             mempool_create().
+ *
+ * this function only sleeps if the free_fn() function sleeps.
+ */
+void mempool_free(void *element, mempool_t *pool)
+{
+	if (likely(element)) {
+		void *elem[1] = { element };
+
+		if (!mempool_free_bulk(pool, elem, 1))
+			pool->free(element, pool->pool_data);
+	}
 }
 EXPORT_SYMBOL(mempool_free);
 
-- 
2.47.3


