* [PATCH 01/11] fault-inject: make enum fault_flags available unconditionally
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 02/11] mm: improve kerneldoc comments for __alloc_pages_bulk Christoph Hellwig
` (10 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
This will allow using should_fail_ex from code without having to
make it conditional on CONFIG_FAULT_INJECTION.
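For illustration, a user can now do something like the following without any
CONFIG_FAULT_INJECTION guards (hypothetical sketch: fail_foo and foo_alloc are
made-up names mirroring the mempool usage later in this series, and this
assumes the usual no-op stubs when fault injection is disabled):

#include <linux/fault-inject.h>
#include <linux/slab.h>

static DECLARE_FAULT_ATTR(fail_foo);

static void *foo_alloc(gfp_t gfp)
{
	/*
	 * FAULT_NOWARN is now visible even with CONFIG_FAULT_INJECTION
	 * disabled, where should_fail_ex() is expected to compile away
	 * to "return false".
	 */
	if (should_fail_ex(&fail_foo, 1, FAULT_NOWARN))
		return NULL;
	return kmalloc(32, gfp);
}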
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/fault-inject.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 8c829d28dcf3..58fd14c82270 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -8,6 +8,10 @@
struct dentry;
struct kmem_cache;
+enum fault_flags {
+ FAULT_NOWARN = 1 << 0,
+};
+
#ifdef CONFIG_FAULT_INJECTION
#include <linux/atomic.h>
@@ -36,10 +40,6 @@ struct fault_attr {
struct dentry *dname;
};
-enum fault_flags {
- FAULT_NOWARN = 1 << 0,
-};
-
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
--
2.47.3
* [PATCH 02/11] mm: improve kerneldoc comments for __alloc_pages_bulk
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
2025-11-13 8:39 ` [PATCH 01/11] fault-inject: make enum fault_flags available unconditionally Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 03/11] mempool: improve kerneldoc comments Christoph Hellwig
` (9 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Describe the semantics in more detail, as the scheme of filling empty
slots in an array is not quite obvious.
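As a caller-side sketch of that contract (fill_page_array is a hypothetical
helper, not part of this patch):

#include <linux/gfp.h>
#include <linux/sched.h>

/*
 * Populate every NULL slot in pages[0..nr-1]. Slots that already hold a
 * page are left untouched, so on partial progress the call can simply be
 * repeated until the whole array is populated.
 */
static void fill_page_array(struct page **pages, unsigned int nr)
{
	while (alloc_pages_bulk(GFP_KERNEL, nr, pages) < nr)
		cond_resched();
}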
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/page_alloc.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..b3d37169a553 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4982,13 +4982,18 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
* @nr_pages: The number of pages desired in the array
* @page_array: Array to store the pages
*
- * This is a batched version of the page allocator that attempts to
- * allocate nr_pages quickly. Pages are added to the page_array.
+ * This is a batched version of the page allocator that attempts to allocate
+ * @nr_pages quickly. Pages are added to @page_array.
*
- * Note that only NULL elements are populated with pages and nr_pages
- * is the maximum number of pages that will be stored in the array.
+ * Note that only the elements in @page_array that were cleared to %NULL on
+ * entry are populated with newly allocated pages. @nr_pages is the maximum
+ * number of pages that will be stored in the array.
*
- * Returns the number of pages in the array.
+ * Returns the number of pages in @page_array, including ones already
+ * allocated on entry. This can be less than the number requested in @nr_pages,
+ * but all empty slots are filled from the beginning. I.e., if all slots in
+ * @page_array were set to %NULL on entry, the slots from 0 to the return value
+ * - 1 will be filled.
*/
unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
--
2.47.3
* [PATCH 03/11] mempool: improve kerneldoc comments
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
2025-11-13 8:39 ` [PATCH 01/11] fault-inject: make enum fault_flags available unconditionally Christoph Hellwig
2025-11-13 8:39 ` [PATCH 02/11] mm: improve kerneldoc comments for __alloc_pages_bulk Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 04/11] mempool: add error injection support Christoph Hellwig
` (8 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Use proper formatting, use full sentences and reduce some verbosity in
function parameter descriptions.
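For reference, the guarantees being documented look like this from the caller
side (hypothetical example: foo_cache and foo_pool are made-up names, and
foo_cache is assumed to be created elsewhere):

#include <linux/mempool.h>
#include <linux/slab.h>

static struct kmem_cache *foo_cache;	/* assumed to be created elsewhere */
static mempool_t *foo_pool;

static int foo_init(void)
{
	/* Keep at least 16 preallocated elements in reserve. */
	foo_pool = mempool_create_slab_pool(16, foo_cache);
	if (!foo_pool)
		return -ENOMEM;
	return 0;
}

static void foo_do_io(void)
{
	/*
	 * GFP_NOIO includes __GFP_DIRECT_RECLAIM, so this cannot return
	 * NULL; it may sleep instead.
	 */
	void *elem = mempool_alloc(foo_pool, GFP_NOIO);

	/* ... use elem for the I/O ... */

	mempool_free(elem, foo_pool);
}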
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 41 ++++++++++++++++++++++-------------------
1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1c38e873e546..1f4701713203 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -372,18 +372,20 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
EXPORT_SYMBOL(mempool_resize);
/**
- * mempool_alloc - allocate an element from a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
- * @gfp_mask: the usual allocation bitmask.
+ * mempool_alloc - allocate an element from a memory pool
+ * @pool: pointer to the memory pool
+ * @gfp_mask: GFP_* flags. %__GFP_ZERO is not supported.
*
- * this function only sleeps if the alloc_fn() function sleeps or
- * returns NULL. Note that due to preallocation, this function
- * *never* fails when called from process contexts. (it might
- * fail if called from an IRQ context.)
- * Note: using __GFP_ZERO is not supported.
+ * Allocate an element from @pool. This is done by first calling into the
+ * alloc_fn supplied at pool initialization time, and dipping into the reserved
+ * pool when alloc_fn fails to allocate an element.
*
- * Return: pointer to the allocated element or %NULL on error.
+ * This function only sleeps if the alloc_fn callback sleeps, or when waiting
+ * for elements to become available in the pool.
+ *
+ * Return: pointer to the allocated element or %NULL when failing to allocate
+ * an element. Allocation failure can only happen when @gfp_mask does not
+ * include %__GFP_DIRECT_RECLAIM.
*/
void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
@@ -456,11 +458,10 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
/**
* mempool_alloc_preallocated - allocate an element from preallocated elements
- * belonging to a specific memory pool
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * belonging to a memory pool
+ * @pool: pointer to the memory pool
*
- * This function is similar to mempool_alloc, but it only attempts allocating
+ * This function is similar to mempool_alloc(), but it only attempts allocating
* an element from the preallocated elements. It does not sleep and immediately
* returns if no preallocated elements are available.
*
@@ -492,12 +493,14 @@ void *mempool_alloc_preallocated(mempool_t *pool)
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to the pool.
- * @element: pool element pointer.
- * @pool: pointer to the memory pool which was allocated via
- * mempool_create().
+ * mempool_free - return an element to a mempool
+ * @element: pointer to element
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
*
- * this function only sleeps if the free_fn() function sleeps.
+ * This function only sleeps if the free_fn callback sleeps.
*/
void mempool_free(void *element, mempool_t *pool)
{
--
2.47.3
* [PATCH 04/11] mempool: add error injection support
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (2 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 03/11] mempool: improve kerneldoc comments Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 05/11] mempool: factor out a mempool_adjust_gfp helper Christoph Hellwig
` (7 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Add a call to should_fail_ex that forces mempool to actually allocate
from the pool to stress the mempool implementation when enabled through
debugfs. By default should_fail{,_ex} prints a very verbose stack trace
that clutters the kernel log, slows down execution and triggers the
kernel bug detection in xfstests. Pass FAULT_NOWARN and print a
single-line message noting the caller instead so that full tests
can be run with fault injection.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mempool.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1f4701713203..5cf59779cc3d 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -9,7 +9,7 @@
* started by Ingo Molnar, Copyright (C) 2001
* debugging by David Rientjes, Copyright (C) 2015
*/
-
+#include <linux/fault-inject.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/highmem.h>
@@ -20,6 +20,15 @@
#include <linux/writeback.h>
#include "slab.h"
+static DECLARE_FAULT_ATTR(fail_mempool_alloc);
+
+static int __init mempool_fault_inject_init(void)
+{
+ return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
+ NULL, &fail_mempool_alloc));
+}
+late_initcall(mempool_fault_inject_init);
+
#ifdef CONFIG_SLUB_DEBUG_ON
static void poison_error(mempool_t *pool, void *element, size_t size,
size_t byte)
@@ -404,9 +413,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
repeat_alloc:
+ if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
+ pr_info("forcing mempool usage for %pS\n",
+ (void *)_RET_IP_);
+ element = NULL;
+ } else {
+ element = pool->alloc(gfp_temp, pool->pool_data);
+ }
- element = pool->alloc(gfp_temp, pool->pool_data);
- if (likely(element != NULL))
+ if (likely(element))
return element;
spin_lock_irqsave(&pool->lock, flags);
--
2.47.3
* [PATCH 05/11] mempool: factor out a mempool_adjust_gfp helper
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (3 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 04/11] mempool: add error injection support Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper Christoph Hellwig
` (6 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Add a helper to better isolate and document the gfp flags adjustments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 5cf59779cc3d..a0718a35c34f 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -380,6 +380,19 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
+/*
+ * Adjust the gfp flags for mempool allocations, as we never want to dip into
+ * the global emergency reserves or retry in the page allocator.
+ *
+ * The first pass also doesn't want to go into reclaim, but the next passes
+ * do, so return a separate subset for that first iteration.
+ */
+static inline gfp_t mempool_adjust_gfp(gfp_t *gfp_mask)
+{
+ *gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
+ return *gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
+}
+
/**
* mempool_alloc - allocate an element from a memory pool
* @pool: pointer to the memory pool
@@ -398,20 +411,14 @@ EXPORT_SYMBOL(mempool_resize);
*/
void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
+ gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
void *element;
unsigned long flags;
wait_queue_entry_t wait;
- gfp_t gfp_temp;
VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
might_alloc(gfp_mask);
- gfp_mask |= __GFP_NOMEMALLOC; /* don't allocate emergency reserves */
- gfp_mask |= __GFP_NORETRY; /* don't loop in __alloc_pages */
- gfp_mask |= __GFP_NOWARN; /* failures are OK */
-
- gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
-
repeat_alloc:
if (should_fail_ex(&fail_mempool_alloc, 1, FAULT_NOWARN)) {
pr_info("forcing mempool usage for %pS\n",
--
2.47.3
* [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (4 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 05/11] mempool: factor out a mempool_adjust_gfp helper Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-23 3:42 ` Hugh Dickins
2025-11-13 8:39 ` [PATCH 07/11] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
` (5 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Add a helper for the mempool_alloc slowpath to better separate it from the
fast path, and also use it to implement mempool_alloc_preallocated which
shares the same logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 121 ++++++++++++++++++++++++---------------------------
1 file changed, 57 insertions(+), 64 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index a0718a35c34f..c28087a3b8a9 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -380,6 +380,50 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
+static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
+{
+ unsigned long flags;
+ void *element;
+
+ spin_lock_irqsave(&pool->lock, flags);
+ if (unlikely(!pool->curr_nr))
+ goto fail;
+ element = remove_element(pool);
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ /* Paired with rmb in mempool_free(), read comment there. */
+ smp_wmb();
+
+ /*
+ * Update the allocation stack trace as this is more useful for
+ * debugging.
+ */
+ kmemleak_update_trace(element);
+ return element;
+
+fail:
+ if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+ DEFINE_WAIT(wait);
+
+ prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ /*
+ * Wait for someone else to return an element to @pool.
+ *
+ * FIXME: this should be io_schedule(). The timeout is there as
+ * a workaround for some DM problems in 2.6.18.
+ */
+ io_schedule_timeout(5 * HZ);
+ finish_wait(&pool->wait, &wait);
+ } else {
+ /* We must not sleep if __GFP_DIRECT_RECLAIM is not set. */
+ spin_unlock_irqrestore(&pool->lock, flags);
+ }
+
+ return NULL;
+}
+
/*
* Adjust the gfp flags for mempool allocations, as we never want to dip into
* the global emergency reserves or retry in the page allocator.
@@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
void *element;
- unsigned long flags;
- wait_queue_entry_t wait;
VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
might_alloc(gfp_mask);
@@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
element = pool->alloc(gfp_temp, pool->pool_data);
}
- if (likely(element))
- return element;
-
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr)) {
- element = remove_element(pool);
- spin_unlock_irqrestore(&pool->lock, flags);
- /* paired with rmb in mempool_free(), read comment there */
- smp_wmb();
+ if (unlikely(!element)) {
/*
- * Update the allocation stack trace as this is more useful
- * for debugging.
+ * Try to allocate an element from the pool.
+ *
+ * The first pass won't have __GFP_DIRECT_RECLAIM and won't
+ * sleep in mempool_alloc_from_pool. Retry the allocation
+ * with all flags set in that case.
*/
- kmemleak_update_trace(element);
- return element;
- }
-
- /*
- * We use gfp mask w/o direct reclaim or IO for the first round. If
- * alloc failed with that and @pool was empty, retry immediately.
- */
- if (gfp_temp != gfp_mask) {
- spin_unlock_irqrestore(&pool->lock, flags);
- gfp_temp = gfp_mask;
- goto repeat_alloc;
- }
-
- /* We must not sleep if !__GFP_DIRECT_RECLAIM */
- if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
- spin_unlock_irqrestore(&pool->lock, flags);
- return NULL;
+ element = mempool_alloc_from_pool(pool, gfp_mask);
+ if (!element && gfp_temp != gfp_mask) {
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+ }
}
- /* Let's wait for someone else to return an element to @pool */
- init_wait(&wait);
- prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
-
- spin_unlock_irqrestore(&pool->lock, flags);
-
- /*
- * FIXME: this should be io_schedule(). The timeout is there as a
- * workaround for some DM problems in 2.6.18.
- */
- io_schedule_timeout(5*HZ);
-
- finish_wait(&pool->wait, &wait);
- goto repeat_alloc;
+ return element;
}
EXPORT_SYMBOL(mempool_alloc_noprof);
@@ -492,25 +503,7 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
*/
void *mempool_alloc_preallocated(mempool_t *pool)
{
- void *element;
- unsigned long flags;
-
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr)) {
- element = remove_element(pool);
- spin_unlock_irqrestore(&pool->lock, flags);
- /* paired with rmb in mempool_free(), read comment there */
- smp_wmb();
- /*
- * Update the allocation stack trace as this is more useful
- * for debugging.
- */
- kmemleak_update_trace(element);
- return element;
- }
- spin_unlock_irqrestore(&pool->lock, flags);
-
- return NULL;
+ return mempool_alloc_from_pool(pool, GFP_NOWAIT);
}
EXPORT_SYMBOL(mempool_alloc_preallocated);
--
2.47.3
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-13 8:39 ` [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper Christoph Hellwig
@ 2025-11-23 3:42 ` Hugh Dickins
2025-11-23 11:34 ` Vlastimil Babka
2025-11-24 6:14 ` Christoph Hellwig
0 siblings, 2 replies; 23+ messages in thread
From: Hugh Dickins @ 2025-11-23 3:42 UTC (permalink / raw)
To: Christoph Hellwig, Vlastimil Babka
Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
Harry Yoo, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Zi Yan, Eric Biggers, linux-mm, linux-kernel
On Thu, 13 Nov 2025, Christoph Hellwig wrote:
> Add a helper for the mempool_alloc slowpath to better separate it from the
> fast path, and also use it to implement mempool_alloc_preallocated which
> shares the same logic.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
...
> @@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> {
> gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
> void *element;
> - unsigned long flags;
> - wait_queue_entry_t wait;
>
> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
> might_alloc(gfp_mask);
> @@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> element = pool->alloc(gfp_temp, pool->pool_data);
> }
>
> - if (likely(element))
> - return element;
> -
> - spin_lock_irqsave(&pool->lock, flags);
> - if (likely(pool->curr_nr)) {
> - element = remove_element(pool);
> - spin_unlock_irqrestore(&pool->lock, flags);
> - /* paired with rmb in mempool_free(), read comment there */
> - smp_wmb();
> + if (unlikely(!element)) {
> /*
> - * Update the allocation stack trace as this is more useful
> - * for debugging.
> + * Try to allocate an element from the pool.
> + *
> + * The first pass won't have __GFP_DIRECT_RECLAIM and won't
> + * sleep in mempool_alloc_from_pool. Retry the allocation
> + * with all flags set in that case.
> */
> - kmemleak_update_trace(element);
> - return element;
> - }
> -
> - /*
> - * We use gfp mask w/o direct reclaim or IO for the first round. If
> - * alloc failed with that and @pool was empty, retry immediately.
> - */
> - if (gfp_temp != gfp_mask) {
> - spin_unlock_irqrestore(&pool->lock, flags);
> - gfp_temp = gfp_mask;
> - goto repeat_alloc;
> - }
> -
> - /* We must not sleep if !__GFP_DIRECT_RECLAIM */
> - if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
> - spin_unlock_irqrestore(&pool->lock, flags);
> - return NULL;
> + element = mempool_alloc_from_pool(pool, gfp_mask);
> + if (!element && gfp_temp != gfp_mask) {
No, that is wrong, it breaks the mempool promise: linux-next oopses
in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
to return a good bio.
The refactoring makes it hard to see, but the old version always used
to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
whereas here it only does so the first time, when gfp_temp != gfp_mask.
After bisecting to here, I changed that "gfp_temp != gfp_mask" to
"(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
have come in on top, so below is a patch to the final mm/mempool.c...
> + gfp_temp = gfp_mask;
> + goto repeat_alloc;
> + }
> }
>
> - /* Let's wait for someone else to return an element to @pool */
> - init_wait(&wait);
> - prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
> -
> - spin_unlock_irqrestore(&pool->lock, flags);
> -
> - /*
> - * FIXME: this should be io_schedule(). The timeout is there as a
> - * workaround for some DM problems in 2.6.18.
> - */
> - io_schedule_timeout(5*HZ);
> -
> - finish_wait(&pool->wait, &wait);
> - goto repeat_alloc;
> + return element;
> }
> EXPORT_SYMBOL(mempool_alloc_noprof);
...
[PATCH] mempool: fix NULL from mempool_alloc_noprof()
mempool_alloc_noprof() with __GFP_DIRECT_RECLAIM used to loop until it
had allocated an element, but recently regressed to returning NULL when
pool->alloc and mempool_alloc_from_pool() have both failed twice, causing
oops in __swap_writepage() (and presumably others relying on mempool).
Fixes: 1d091d2c5bf3 ("mempool: factor out a mempool_alloc_from_pool helper")
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/mempool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 1558601600ba..dc9f2a1dc35f 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -576,7 +576,7 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
* with all flags set in that case.
*/
if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
- gfp_temp != gfp_mask) {
+ (gfp_mask & __GFP_DIRECT_RECLAIM)) {
gfp_temp = gfp_mask;
goto repeat_alloc;
}
--
2.51.0
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 3:42 ` Hugh Dickins
@ 2025-11-23 11:34 ` Vlastimil Babka
2025-11-23 17:49 ` Hugh Dickins
2025-11-24 6:15 ` Christoph Hellwig
2025-11-24 6:14 ` Christoph Hellwig
1 sibling, 2 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-11-23 11:34 UTC (permalink / raw)
To: Hugh Dickins, Christoph Hellwig
Cc: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
Harry Yoo, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Zi Yan, Eric Biggers, linux-mm, linux-kernel
On 11/23/25 04:42, Hugh Dickins wrote:
> On Thu, 13 Nov 2025, Christoph Hellwig wrote:
>
>> Add a helper for the mempool_alloc slowpath to better separate it from the
>> fast path, and also use it to implement mempool_alloc_preallocated which
>> shares the same logic.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
> ...
>> @@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> {
>> gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
>> void *element;
>> - unsigned long flags;
>> - wait_queue_entry_t wait;
>>
>> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
>> might_alloc(gfp_mask);
>> @@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> element = pool->alloc(gfp_temp, pool->pool_data);
>> }
>>
>> - if (likely(element))
>> - return element;
>> -
>> - spin_lock_irqsave(&pool->lock, flags);
>> - if (likely(pool->curr_nr)) {
>> - element = remove_element(pool);
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - /* paired with rmb in mempool_free(), read comment there */
>> - smp_wmb();
>> + if (unlikely(!element)) {
>> /*
>> - * Update the allocation stack trace as this is more useful
>> - * for debugging.
>> + * Try to allocate an element from the pool.
>> + *
>> + * The first pass won't have __GFP_DIRECT_RECLAIM and won't
>> + * sleep in mempool_alloc_from_pool. Retry the allocation
>> + * with all flags set in that case.
>> */
>> - kmemleak_update_trace(element);
>> - return element;
>> - }
>> -
>> - /*
>> - * We use gfp mask w/o direct reclaim or IO for the first round. If
>> - * alloc failed with that and @pool was empty, retry immediately.
>> - */
>> - if (gfp_temp != gfp_mask) {
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - gfp_temp = gfp_mask;
>> - goto repeat_alloc;
>> - }
>> -
>> - /* We must not sleep if !__GFP_DIRECT_RECLAIM */
>> - if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - return NULL;
>> + element = mempool_alloc_from_pool(pool, gfp_mask);
>> + if (!element && gfp_temp != gfp_mask) {
>
> No, that is wrong, it breaks the mempool promise: linux-next oopses
> in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
> to return a good bio.
>
> The refactoring makes it hard to see, but the old version always used
> to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
> whereas here it only does so the first time, when gfp_temp != gfp_mask.
>
> After bisecting to here, I changed that "gfp_temp != gfp_mask" to
> "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
> have come in on top, so below is a patch to the final mm/mempool.c...
Thanks a lot Hugh and sorry for the trouble.
Looking closer I noticed we're also not doing as the comment says about
passing the limited flags to mempool_alloc_from_pool() on the first attempt.
I would also rather keep distinguishing the "retry with full flags" and
"retry because we can sleep" for now, in case there are callers that can't
sleep, but can benefit from memalloc context. It's hypothetical and I haven't
made an audit, but we can clean that up deliberately later and not as part
of a refactor patch.
So I'd amend this patch with:
diff --git a/mm/mempool.c b/mm/mempool.c
index c28087a3b8a9..224a4dead239 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
* sleep in mempool_alloc_from_pool. Retry the allocation
* with all flags set in that case.
*/
- element = mempool_alloc_from_pool(pool, gfp_mask);
- if (!element && gfp_temp != gfp_mask) {
- gfp_temp = gfp_mask;
- goto repeat_alloc;
+ element = mempool_alloc_from_pool(pool, gfp_temp);
+ if (!element) {
+ if (gfp_temp != gfp_mask) {
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+ }
+ if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+ goto repeat_alloc;
+ }
}
}
With the followup commit fixed up during rebase, the diff of the whole
branch before/after is:
diff --git a/mm/mempool.c b/mm/mempool.c
index 5953fe801395..bb596cac57ff 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
* sleep in mempool_alloc_from_pool. Retry the allocation
* with all flags set in that case.
*/
- if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
- gfp_temp != gfp_mask) {
- gfp_temp = gfp_mask;
- goto repeat_alloc;
+ if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
+ if (gfp_temp != gfp_mask) {
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+ }
+ if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+ goto repeat_alloc;
+ }
}
}
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 11:34 ` Vlastimil Babka
@ 2025-11-23 17:49 ` Hugh Dickins
2025-11-23 21:22 ` Vlastimil Babka
2025-11-24 6:17 ` Christoph Hellwig
2025-11-24 6:15 ` Christoph Hellwig
1 sibling, 2 replies; 23+ messages in thread
From: Hugh Dickins @ 2025-11-23 17:49 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Hugh Dickins, Christoph Hellwig, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
On Sun, 23 Nov 2025, Vlastimil Babka wrote:
> On 11/23/25 04:42, Hugh Dickins wrote:
> > On Thu, 13 Nov 2025, Christoph Hellwig wrote:
> >
> >> Add a helper for the mempool_alloc slowpath to better separate it from the
> >> fast path, and also use it to implement mempool_alloc_preallocated which
> >> shares the same logic.
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> ---
> > ...
> >> @@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> >> {
> >> gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
> >> void *element;
> >> - unsigned long flags;
> >> - wait_queue_entry_t wait;
> >>
> >> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
> >> might_alloc(gfp_mask);
> >> @@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> >> element = pool->alloc(gfp_temp, pool->pool_data);
> >> }
> >>
> >> - if (likely(element))
> >> - return element;
> >> -
> >> - spin_lock_irqsave(&pool->lock, flags);
> >> - if (likely(pool->curr_nr)) {
> >> - element = remove_element(pool);
> >> - spin_unlock_irqrestore(&pool->lock, flags);
> >> - /* paired with rmb in mempool_free(), read comment there */
> >> - smp_wmb();
> >> + if (unlikely(!element)) {
> >> /*
> >> - * Update the allocation stack trace as this is more useful
> >> - * for debugging.
> >> + * Try to allocate an element from the pool.
> >> + *
> >> + * The first pass won't have __GFP_DIRECT_RECLAIM and won't
> >> + * sleep in mempool_alloc_from_pool. Retry the allocation
> >> + * with all flags set in that case.
> >> */
> >> - kmemleak_update_trace(element);
> >> - return element;
> >> - }
> >> -
> >> - /*
> >> - * We use gfp mask w/o direct reclaim or IO for the first round. If
> >> - * alloc failed with that and @pool was empty, retry immediately.
> >> - */
> >> - if (gfp_temp != gfp_mask) {
> >> - spin_unlock_irqrestore(&pool->lock, flags);
> >> - gfp_temp = gfp_mask;
> >> - goto repeat_alloc;
> >> - }
> >> -
> >> - /* We must not sleep if !__GFP_DIRECT_RECLAIM */
> >> - if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
> >> - spin_unlock_irqrestore(&pool->lock, flags);
> >> - return NULL;
> >> + element = mempool_alloc_from_pool(pool, gfp_mask);
> >> + if (!element && gfp_temp != gfp_mask) {
> >
> > No, that is wrong, it breaks the mempool promise: linux-next oopses
> > in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
> > to return a good bio.
> >
> > The refactoring makes it hard to see, but the old version always used
> > to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
> > whereas here it only does so the first time, when gfp_temp != gfp_mask.
> >
> > After bisecting to here, I changed that "gfp_temp != gfp_mask" to
> > "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
> > have come in on top, so below is a patch to the final mm/mempool.c...
>
> Thanks a lot Hugh and sorry for the trouble.
>
> Looking closer I noticed we're also not doing as the comment says about
> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
>
> I would also rather keep distinguishing the "retry with full flags" and
> "retry because we can sleep" for now, in case there are callers that can't
> sleep, but can benefit from memalloc context. It's hypothetical and I haven't
> made an audit, but we can clean that up deliberately later and not as part
> of a refactor patch.
>
> So I'd amend this patch with:
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index c28087a3b8a9..224a4dead239 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> * sleep in mempool_alloc_from_pool. Retry the allocation
> * with all flags set in that case.
> */
> - element = mempool_alloc_from_pool(pool, gfp_mask);
> - if (!element && gfp_temp != gfp_mask) {
> - gfp_temp = gfp_mask;
> - goto repeat_alloc;
> + element = mempool_alloc_from_pool(pool, gfp_temp);
Haha, no.
I had got excited when I too thought that should be gfp_temp not gfp_mask,
but (a) it didn't fix the bug and (b) I then came to see that gfp_mask
there is correct.
It's looking ahead to what will be tried next: mempool_alloc_from_pool()
is trying to alloc from mempool, and then, if it will be allowed to wait,
waiting a suitable length of time, before letting the caller try again.
If you substitute gfp_temp there, then it just does the same pool->alloc,
alloc from mempool sequence twice in a row with no delay between (because
gfp_temp does not at first allow waiting).
I agree it's confusing, and calls into question whether that was a good
refactoring. Maybe there's a form of words for the comment above which
will make it clearer. Perhaps mempool_alloc_from_pool() is better split
into two functions. Maybe gfp_temp could be named better. Etc etc: I
preferred not to mess around further with how Christoph did it, not now.
(I also wondered if it's right to pool->alloc before alloc from mempool
after the wait was for a mempool element to be freed: but that's how it
was before, and I expect it's been proved in the past that a strict
pool->alloc before alloc from mempool is the best strategy.)
> + if (!element) {
> + if (gfp_temp != gfp_mask) {
> + gfp_temp = gfp_mask;
> + goto repeat_alloc;
> + }
> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
> + goto repeat_alloc;
> + }
I still prefer what I posted.
Hugh
> }
> }
>
>
> With the followup commit fixed up during rebase, the diff of the whole
> branch before/after is:
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 5953fe801395..bb596cac57ff 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
> * sleep in mempool_alloc_from_pool. Retry the allocation
> * with all flags set in that case.
> */
> - if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
> - gfp_temp != gfp_mask) {
> - gfp_temp = gfp_mask;
> - goto repeat_alloc;
> + if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
> + if (gfp_temp != gfp_mask) {
> + gfp_temp = gfp_mask;
> + goto repeat_alloc;
> + }
> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
> + goto repeat_alloc;
> + }
> }
> }
>
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 17:49 ` Hugh Dickins
@ 2025-11-23 21:22 ` Vlastimil Babka
2025-11-23 23:04 ` Hugh Dickins
2025-11-24 6:17 ` Christoph Hellwig
1 sibling, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-11-23 21:22 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Zi Yan, Eric Biggers, linux-mm,
linux-kernel
On 11/23/25 18:49, Hugh Dickins wrote:
> On Sun, 23 Nov 2025, Vlastimil Babka wrote:
>> On 11/23/25 04:42, Hugh Dickins wrote:
>> > On Thu, 13 Nov 2025, Christoph Hellwig wrote:
>> >
>> >
>> > No, that is wrong, it breaks the mempool promise: linux-next oopses
>> > in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
>> > to return a good bio.
>> >
>> > The refactoring makes it hard to see, but the old version always used
>> > to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
>> > whereas here it only does so the first time, when gfp_temp != gfp_mask.
>> >
>> > After bisecting to here, I changed that "gfp_temp != gfp_mask" to
>> > "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
>> > have come in on top, so below is a patch to the final mm/mempool.c...
>>
>> Thanks a lot Hugh and sorry for the trouble.
>>
>> Looking closer I noticed we're also not doing as the comment says about
>> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
>>
>> I would also rather keep distinguishing the "retry with full flags" and
>> "retry because we can sleep" for now, in case there are callers that can't
>> sleep, but can benefit from memalloc context. It's hypothetical and I haven't
>> made an audit, but we can clean that up deliberately later and not as part
>> of a refactor patch.
>>
>> So I'd amend this patch with:
>>
>> diff --git a/mm/mempool.c b/mm/mempool.c
>> index c28087a3b8a9..224a4dead239 100644
>> --- a/mm/mempool.c
>> +++ b/mm/mempool.c
>> @@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> * sleep in mempool_alloc_from_pool. Retry the allocation
>> * with all flags set in that case.
>> */
>> - element = mempool_alloc_from_pool(pool, gfp_mask);
>> - if (!element && gfp_temp != gfp_mask) {
>> - gfp_temp = gfp_mask;
>> - goto repeat_alloc;
>> + element = mempool_alloc_from_pool(pool, gfp_temp);
>
> Haha, no.
>
> I had got excited when I too thought that should be gfp_temp not gfp_mask,
> but (a) it didn't fix the bug and (b) I then came to see that gfp_mask
> there is correct.
>
> It's looking ahead to what will be tried next: mempool_alloc_from_pool()
> is trying to alloc from mempool, and then, if it will be allowed to wait,
> waiting a suitable length of time, before letting the caller try again.
> If you substitute gfp_temp there, then it just does the same pool->alloc,
> alloc from mempool sequence twice in a row with no delay between (because
> gfp_temp does not at first allow waiting).
But it's not exactly the same sequence, because in the second pass the
pool->alloc() has the original gfp flags restored (by gfp_temp = gfp_mask)
so it can now e.g. reclaim there. It's preferred to try that first before
waiting on a mempool refill. AFAIU the idea is to try succeeding quickly if
objects to allocate are either cheaply available to alloc() or in the pool,
and if that fails, go for the more expensive allocations or wait for a refill.
AFAICS both the code before Christoph's changes, and after the changes with
my fixup do this:
1. pool->alloc(limited gfp)
2. allocate from pool, but don't wait if there's nothing
3. pool->alloc(full gfp)
4. allocate from pool, wait if there's nothing
5. goto 3
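In code form the above is roughly (an illustrative sketch only, not the
actual mempool.c; alloc() and pool_take_or_wait() stand in for pool->alloc()
and mempool_alloc_from_pool()):

#include <linux/gfp.h>

void *alloc(gfp_t gfp);			/* stand-in for pool->alloc() */
void *pool_take_or_wait(gfp_t gfp);	/* stand-in for mempool_alloc_from_pool() */

static void *sketch_mempool_alloc(gfp_t gfp_mask)
{
	gfp_t gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
	void *element;

repeat_alloc:
	element = alloc(gfp_temp);		/* steps 1 and 3 */
	if (element)
		return element;
	element = pool_take_or_wait(gfp_temp);	/* steps 2 and 4 */
	if (element)
		return element;
	if (gfp_temp != gfp_mask) {
		gfp_temp = gfp_mask;		/* escalate to the full flags */
		goto repeat_alloc;
	}
	if (gfp_mask & __GFP_DIRECT_RECLAIM)	/* step 5 */
		goto repeat_alloc;
	return NULL;				/* atomic callers may fail */
}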
Am I missing something?
> I agree it's confusing, and calls into question whether that was a good
> refactoring. Maybe there's a form of words for the comment above which
I'd say it was intended to be good, apart from the bugs.
> will make it clearer. Perhaps mempool_alloc_from_pool() is better split
> into two functions. Maybe gfp_temp could be named better. Etc etc: I
> preferred not to mess around further with how Christoph did it, not now.
>
> (I also wondered if it's right to pool->alloc before alloc from mempool
> after the wait was for a mempool element to be freed: but that's how it
> was before, and I expect it's been proved in the past that a strict
> pool->alloc before alloc from mempool is the best strategy.)
I'd think we better do it that way, otherwise we might be recovering more
slowly from a temporary memory shortage that caused a number of tasks to wait
in the mempool, which would then have to wait for mempool refills even
though new objects might be available to allocate thanks to the shortage
being resolved.
>> + if (!element) {
>> + if (gfp_temp != gfp_mask) {
>> + gfp_temp = gfp_mask;
>> + goto repeat_alloc;
>> + }
>> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
>> + goto repeat_alloc;
>> + }
>
> I still prefer what I posted.
>
> Hugh
>
>> }
>> }
>>
>>
>> With the followup commit fixed up during rebase, the diff of the whole
>> branch before/after is:
>>
>> diff --git a/mm/mempool.c b/mm/mempool.c
>> index 5953fe801395..bb596cac57ff 100644
>> --- a/mm/mempool.c
>> +++ b/mm/mempool.c
>> @@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
>> * sleep in mempool_alloc_from_pool. Retry the allocation
>> * with all flags set in that case.
>> */
>> - if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
>> - gfp_temp != gfp_mask) {
>> - gfp_temp = gfp_mask;
>> - goto repeat_alloc;
>> + if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
>> + if (gfp_temp != gfp_mask) {
>> + gfp_temp = gfp_mask;
>> + goto repeat_alloc;
>> + }
>> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
>> + goto repeat_alloc;
>> + }
>> }
>> }
>>
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 21:22 ` Vlastimil Babka
@ 2025-11-23 23:04 ` Hugh Dickins
2025-11-25 11:32 ` Vlastimil Babka
0 siblings, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2025-11-23 23:04 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Hugh Dickins, Christoph Hellwig, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
On Sun, 23 Nov 2025, Vlastimil Babka wrote:
> On 11/23/25 18:49, Hugh Dickins wrote:
> > On Sun, 23 Nov 2025, Vlastimil Babka wrote:
> >> On 11/23/25 04:42, Hugh Dickins wrote:
> >> > On Thu, 13 Nov 2025, Christoph Hellwig wrote:
> >> >
> >> >
> >> > No, that is wrong, it breaks the mempool promise: linux-next oopses
> >> > in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
> >> > to return a good bio.
> >> >
> >> > The refactoring makes it hard to see, but the old version always used
> >> > to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
> >> > whereas here it only does so the first time, when gfp_temp != gfp_mask.
> >> >
> >> > After bisecting to here, I changed that "gfp_temp != gfp_mask" to
> >> > "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
> >> > have come in on top, so below is a patch to the final mm/mempool.c...
> >>
> >> Thanks a lot Hugh and sorry for the trouble.
> >>
> >> Looking closer I noticed we're also not doing as the comment says about
> >> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
> >>
> >> I would also rather keep distinguishing the "retry with full flags" and
> >> "retry because we can sleep" for now, in case there are callers that can't
> >> sleep, but can benefit from memalloc context. It's hypothetical and I haven't
> >> made an audit, but we can clean that up deliberately later and not as part
> >> of a refactor patch.
> >>
> >> So I'd amend this patch with:
> >>
> >> diff --git a/mm/mempool.c b/mm/mempool.c
> >> index c28087a3b8a9..224a4dead239 100644
> >> --- a/mm/mempool.c
> >> +++ b/mm/mempool.c
> >> @@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
> >> * sleep in mempool_alloc_from_pool. Retry the allocation
> >> * with all flags set in that case.
> >> */
> >> - element = mempool_alloc_from_pool(pool, gfp_mask);
> >> - if (!element && gfp_temp != gfp_mask) {
> >> - gfp_temp = gfp_mask;
> >> - goto repeat_alloc;
> >> + element = mempool_alloc_from_pool(pool, gfp_temp);
> >
> > Haha, no.
> >
> > I had got excited when I too thought that should be gfp_temp not gfp_mask,
> > but (a) it didn't fix the bug and (b) I then came to see that gfp_mask
> > there is correct.
> >
> > It's looking ahead to what will be tried next: mempool_alloc_from_pool()
> > is trying to alloc from mempool, and then, if it will be allowed to wait,
> > waiting a suitable length of time, before letting the caller try again.
> > If you substitute gfp_temp there, then it just does the same pool->alloc,
> > alloc from mempool sequence twice in a row with no delay between (because
> > gfp_temp does not at first allow waiting).
>
> But it's not exactly the same sequence, because in the second pass the
> pool->alloc() has the original gfp flags restored (by gfp_temp = gfp_mask)
> so it can now e.g. reclaim there. It's preferred to try that first before
> waiting on a mempool refill. AFAIU the idea is to try succeeding quickly if
> objects to allocate are either cheaply available to alloc() or in the pool,
> and if that fails, go for the more expensive allocations or wait for a refill.
>
> AFAICS both the code before Christoph's changes, and after the changes with
> my fixup do this:
>
> 1. pool->alloc(limited gfp)
> 2. allocate from pool, but don't wait if there's nothing
> 3. pool->alloc(full gfp)
> 4. allocate from pool, wait if there's nothing
> 5. goto 3
>
> Am I missing something?
You are completely right: it was me who was missing that the second
pool->alloc is with the full gfp, so not an identical repeat of the
first; and so it is correct not to wait after the first attempt,
so you are right to call with gfp_temp instead of gfp_mask there.
(And this also addresses my slight concern, of whether it was
appropriate to be doing a pool->alloc after waiting for a mempool
free: your way, there is no such wait, at least not that most
important first->second time: the wait would instead be inside
the pool->alloc probably, reclaiming.)
Yes, your fixup is better in all ways than mine, please go ahead.
Thanks,
Hugh
>
> > I agree it's confusing, and calls into question whether that was a good
> > refactoring. Maybe there's a form of words for the comment above which
>
> I'd say it was intended to be good, apart from the bugs.
>
> > will make it clearer. Perhaps mempool_alloc_from_pool() is better split
> > into two functions. Maybe gfp_temp could be named better. Etc etc: I
> > preferred not to mess around further with how Christoph did it, not now.
> >
> > (I also wondered if it's right to pool->alloc before alloc from mempool
> > after the wait was for a mempool element to be freed: but that's how it
> > was before, and I expect it's been proved in the past that a strict
> > pool->alloc before alloc from mempool is the best strategy.)
>
> I'd think we better do it that way, otherwise se might be recovering more
> slowly from a temporary memory shortage that cause a number of tasks to wait
> in the mempool, which would then have to wait for mempool refills even
> though new objects might be available to allocate thanks to the shortage
> resolved.
>
> >> + if (!element) {
> >> + if (gfp_temp != gfp_mask) {
> >> + gfp_temp = gfp_mask;
> >> + goto repeat_alloc;
> >> + }
> >> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
> >> + goto repeat_alloc;
> >> + }
> >
> > I still prefer what I posted.
> >
> > Hugh
> >
> >> }
> >> }
> >>
> >>
> >> With the followup commit fixed up during rebase, the diff of the whole
> >> branch before/after is:
> >>
> >> diff --git a/mm/mempool.c b/mm/mempool.c
> >> index 5953fe801395..bb596cac57ff 100644
> >> --- a/mm/mempool.c
> >> +++ b/mm/mempool.c
> >> @@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
> >> * sleep in mempool_alloc_from_pool. Retry the allocation
> >> * with all flags set in that case.
> >> */
> >> - if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
> >> - gfp_temp != gfp_mask) {
> >> - gfp_temp = gfp_mask;
> >> - goto repeat_alloc;
> >> + if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
> >> + if (gfp_temp != gfp_mask) {
> >> + gfp_temp = gfp_mask;
> >> + goto repeat_alloc;
> >> + }
> >> + if (gfp_mask & __GFP_DIRECT_RECLAIM) {
> >> + goto repeat_alloc;
> >> + }
> >> }
> >> }
> >>
>
>
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 23:04 ` Hugh Dickins
@ 2025-11-25 11:32 ` Vlastimil Babka
0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-11-25 11:32 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christoph Hellwig, Andrew Morton, Christoph Lameter,
David Rientjes, Roman Gushchin, Harry Yoo, Suren Baghdasaryan,
Michal Hocko, Brendan Jackman, Zi Yan, Eric Biggers, linux-mm,
linux-kernel
On 11/24/25 00:04, Hugh Dickins wrote:
>
> You are completely right: it was me who was missing that the second
> pool->alloc is with the full gfp, so not an identical repeat of the
> the first; and so it is correct not to wait after the first attempt,
> so you are right to call with gfp_temp instead of gfp_mask there.
>
> (And this also addresses my slight concern, of whether it was
> appropriate to be doing a pool->alloc after waiting for a mempool
> free: your way, there is no such wait, at least not that most
> important first->second time: the wait would instead be inside
> the pool->alloc probably, reclaiming.)
>
> Yes, your fixup is better in all ways than mine, please go ahead.
>
> Thanks,
> Hugh
Thanks, Hugh!
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 17:49 ` Hugh Dickins
2025-11-23 21:22 ` Vlastimil Babka
@ 2025-11-24 6:17 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-24 6:17 UTC (permalink / raw)
To: Hugh Dickins
Cc: Vlastimil Babka, Christoph Hellwig, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
On Sun, Nov 23, 2025 at 09:49:08AM -0800, Hugh Dickins wrote:
> I agree it's confusing, and calls into question whether that was a good
> refactoring.
What part is confusing?
> (I also wondered if it's right to pool->alloc before alloc from mempool
> after the wait was for a mempool element to be freed: but that's how it
> was before, and I expect it's been proved in the past that a strict
> pool->alloc before alloc from mempool is the best strategy.)
In general, given how good the allocator is at satisfying small requests,
you might get something from the general pool quicker than the
reserved lists, especially if the io_schedule timeout hits.
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 11:34 ` Vlastimil Babka
2025-11-23 17:49 ` Hugh Dickins
@ 2025-11-24 6:15 ` Christoph Hellwig
2025-11-25 11:34 ` Vlastimil Babka
1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-24 6:15 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Hugh Dickins, Christoph Hellwig, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
On Sun, Nov 23, 2025 at 12:34:40PM +0100, Vlastimil Babka wrote:
> Looking closer I noticed we're also not doing as the comment says about
> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
It does, that's gfp_temp.
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-24 6:15 ` Christoph Hellwig
@ 2025-11-25 11:34 ` Vlastimil Babka
0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-11-25 11:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Hugh Dickins, Andrew Morton, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Zi Yan, Eric Biggers, linux-mm, linux-kernel
On 11/24/25 07:15, Christoph Hellwig wrote:
> On Sun, Nov 23, 2025 at 12:34:40PM +0100, Vlastimil Babka wrote:
>> Looking closer I noticed we're also not doing as the comment says about
>> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
>
> It does, that's gfp_temp.
It's gfp_temp now after my fixup, but you were passing gfp_mask by mistake.
Thanks for the review!
* Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper
2025-11-23 3:42 ` Hugh Dickins
2025-11-23 11:34 ` Vlastimil Babka
@ 2025-11-24 6:14 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-24 6:14 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christoph Hellwig, Vlastimil Babka, Andrew Morton,
Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
This looks good, thanks:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* [PATCH 07/11] mempool: add mempool_{alloc,free}_bulk
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (5 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 08/11] mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool Christoph Hellwig
` (4 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Add a version of the mempool allocator that works for batch allocations
of multiple objects. Calling mempool_alloc in a loop is not safe because
it could deadlock if multiple threads are performing such an allocation
at the same time.
As an extra benefit, the interface is built so that the same array can be
used for alloc_pages_bulk / release_pages so that at least for page
backed mempools the fast path can use a nice batch optimization.
Note that mempool_alloc_bulk does not take a gfp_mask argument as it
must always be able to sleep and doesn't support any non-trivial
modifiers. NOFS or NOIO constraints must be set through the scoped API.
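A caller-side sketch of the intended pattern (hypothetical: assumes a page
backed pool so the same NULL-initialized array can first be handed to the
bulk page allocator, and count must not exceed the pool's min_nr):

#include <linux/gfp.h>
#include <linux/mempool.h>

static void get_pages(struct mempool *pool, struct page **pages,
		unsigned int count)
{
	/* Opportunistic fast path straight from the page allocator. */
	unsigned int allocated = alloc_pages_bulk(GFP_KERNEL, count, pages);

	/* Let the mempool fill the remaining NULL slots, sleeping if needed. */
	if (allocated < count)
		mempool_alloc_bulk(pool, (void **)pages, count, allocated);
}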
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mempool.h | 6 ++
mm/mempool.c | 178 ++++++++++++++++++++++++++++++----------
2 files changed, 142 insertions(+), 42 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 34941a4b9026..e914fec0e119 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -66,9 +66,15 @@ extern void mempool_destroy(mempool_t *pool);
extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
#define mempool_alloc(...) \
alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
+ unsigned int count, unsigned int allocated);
+#define mempool_alloc_bulk(...) \
+ alloc_hooks(mempool_alloc_bulk_noprof(__VA_ARGS__))
extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
extern void mempool_free(void *element, mempool_t *pool);
+unsigned int mempool_free_bulk(struct mempool *pool, void **elem,
+ unsigned int count);
/*
* A mempool_alloc_t and mempool_free_t that get the memory from
diff --git a/mm/mempool.c b/mm/mempool.c
index c28087a3b8a9..88b9a8476d31 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -21,11 +21,21 @@
#include "slab.h"
static DECLARE_FAULT_ATTR(fail_mempool_alloc);
+static DECLARE_FAULT_ATTR(fail_mempool_alloc_bulk);
static int __init mempool_fault_inject_init(void)
{
- return PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
+ int error;
+
+ error = PTR_ERR_OR_ZERO(fault_create_debugfs_attr("fail_mempool_alloc",
NULL, &fail_mempool_alloc));
+ if (error)
+ return error;
+
+ /* booting will fail on error return here, don't bother to cleanup */
+ return PTR_ERR_OR_ZERO(
+ fault_create_debugfs_attr("fail_mempool_alloc_bulk", NULL,
+ &fail_mempool_alloc_bulk));
}
late_initcall(mempool_fault_inject_init);
@@ -380,15 +390,22 @@ int mempool_resize(mempool_t *pool, int new_min_nr)
}
EXPORT_SYMBOL(mempool_resize);
-static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
+static unsigned int mempool_alloc_from_pool(struct mempool *pool, void **elems,
+ unsigned int count, unsigned int allocated,
+ gfp_t gfp_mask)
{
unsigned long flags;
- void *element;
+ unsigned int i;
spin_lock_irqsave(&pool->lock, flags);
- if (unlikely(!pool->curr_nr))
+ if (unlikely(pool->curr_nr < count - allocated))
goto fail;
- element = remove_element(pool);
+ for (i = 0; i < count; i++) {
+ if (!elems[i]) {
+ elems[i] = remove_element(pool);
+ allocated++;
+ }
+ }
spin_unlock_irqrestore(&pool->lock, flags);
/* Paired with rmb in mempool_free(), read comment there. */
@@ -398,8 +415,9 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
* Update the allocation stack trace as this is more useful for
* debugging.
*/
- kmemleak_update_trace(element);
- return element;
+ for (i = 0; i < count; i++)
+ kmemleak_update_trace(elems[i]);
+ return allocated;
fail:
if (gfp_mask & __GFP_DIRECT_RECLAIM) {
@@ -421,7 +439,7 @@ static void *mempool_alloc_from_pool(struct mempool *pool, gfp_t gfp_mask)
spin_unlock_irqrestore(&pool->lock, flags);
}
- return NULL;
+ return allocated;
}
/*
@@ -437,6 +455,65 @@ static inline gfp_t mempool_adjust_gfp(gfp_t *gfp_mask)
return *gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
}
+/**
+ * mempool_alloc_bulk - allocate multiple elements from a memory pool
+ * @pool: pointer to the memory pool
+ * @elems: partially or fully populated elements array
+ * @count: total number of entries in @elems
+ * @allocated: number of entries in @elems already allocated
+ *
+ * Allocate an element for each slot in @elems that is %NULL. This is done by
+ * first calling into the alloc_fn supplied at pool initialization time, and
+ * dipping into the reserved pool when alloc_fn fails to allocate an element.
+ *
+ * On return all @count elements in @elems will be populated.
+ *
+ * Return: Always 0. If it wasn't for %$#^$ alloc tags, it would return void.
+ */
+int mempool_alloc_bulk_noprof(struct mempool *pool, void **elems,
+ unsigned int count, unsigned int allocated)
+{
+ gfp_t gfp_mask = GFP_KERNEL;
+ gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
+ unsigned int i = 0;
+
+ VM_WARN_ON_ONCE(count > pool->min_nr);
+ might_alloc(gfp_mask);
+
+ /*
+ * If an error is injected, fail all elements in a bulk allocation so
+ * that we stress the multiple elements missing path.
+ */
+ if (should_fail_ex(&fail_mempool_alloc_bulk, 1, FAULT_NOWARN)) {
+ pr_info("forcing mempool usage for %pS\n",
+ (void *)_RET_IP_);
+ goto use_pool;
+ }
+
+repeat_alloc:
+ /*
+ * Try to allocate the elements using the allocation callback first as
+ * that might succeed even when the caller's bulk allocation did not.
+ */
+ for (i = 0; i < count; i++) {
+ if (elems[i])
+ continue;
+ elems[i] = pool->alloc(gfp_temp, pool->pool_data);
+ if (unlikely(!elems[i]))
+ goto use_pool;
+ allocated++;
+ }
+
+ return 0;
+
+use_pool:
+ allocated = mempool_alloc_from_pool(pool, elems, count, allocated,
+ gfp_temp);
+ gfp_temp = gfp_mask;
+ goto repeat_alloc;
+}
+EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
+
/**
* mempool_alloc - allocate an element from a memory pool
* @pool: pointer to the memory pool
@@ -478,8 +555,8 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
* sleep in mempool_alloc_from_pool. Retry the allocation
* with all flags set in that case.
*/
- element = mempool_alloc_from_pool(pool, gfp_mask);
- if (!element && gfp_temp != gfp_mask) {
+ if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
+ gfp_temp != gfp_mask) {
gfp_temp = gfp_mask;
goto repeat_alloc;
}
@@ -503,26 +580,33 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
*/
void *mempool_alloc_preallocated(mempool_t *pool)
{
- return mempool_alloc_from_pool(pool, GFP_NOWAIT);
+ void *element = NULL;
+
+ mempool_alloc_from_pool(pool, &element, 1, 0, GFP_NOWAIT);
+ return element;
}
EXPORT_SYMBOL(mempool_alloc_preallocated);
/**
- * mempool_free - return an element to a mempool
- * @element: pointer to element
+ * mempool_free_bulk - return elements to a mempool
* @pool: pointer to the memory pool
+ * @elems: elements to return
+ * @count: number of elements to return
*
- * Returns @element to @pool if it needs replenishing, else frees it using
- * the free_fn callback in @pool.
+ * Returns a number of elements from the start of @elems to @pool if @pool
+ * needs replenishing and sets their slots in @elems to %NULL. Other elements
+ * are left in @elems.
*
- * This function only sleeps if the free_fn callback sleeps.
+ * Return: number of elements transferred to @pool. Elements are always
+ * transferred from the beginning of @elems, so the return value can be used
+ * as an offset into @elems for freeing the remaining elements in the caller.
*/
-void mempool_free(void *element, mempool_t *pool)
+unsigned int mempool_free_bulk(struct mempool *pool, void **elems,
+ unsigned int count)
{
unsigned long flags;
-
- if (unlikely(element == NULL))
- return;
+ unsigned int freed = 0;
+ bool added = false;
/*
* Paired with the wmb in mempool_alloc(). The preceding read is
@@ -556,21 +640,6 @@ void mempool_free(void *element, mempool_t *pool)
* Waiters happen iff curr_nr is 0 and the above guarantee also
* ensures that there will be frees which return elements to the
* pool waking up the waiters.
- */
- if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
- spin_lock_irqsave(&pool->lock, flags);
- if (likely(pool->curr_nr < pool->min_nr)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
- }
- spin_unlock_irqrestore(&pool->lock, flags);
- }
-
- /*
- * Handle the min_nr = 0 edge case:
*
* For zero-minimum pools, curr_nr < min_nr (0 < 0) never succeeds,
* so waiters sleeping on pool->wait would never be woken by the
@@ -578,20 +647,45 @@ void mempool_free(void *element, mempool_t *pool)
* allocation of element when both min_nr and curr_nr are 0, and
* any active waiters are properly awakened.
*/
- if (unlikely(pool->min_nr == 0 &&
+ if (unlikely(READ_ONCE(pool->curr_nr) < pool->min_nr)) {
+ spin_lock_irqsave(&pool->lock, flags);
+ while (pool->curr_nr < pool->min_nr && freed < count) {
+ add_element(pool, elems[freed++]);
+ added = true;
+ }
+ spin_unlock_irqrestore(&pool->lock, flags);
+ } else if (unlikely(pool->min_nr == 0 &&
READ_ONCE(pool->curr_nr) == 0)) {
+ /* Handle the min_nr = 0 edge case: */
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr == 0)) {
- add_element(pool, element);
- spin_unlock_irqrestore(&pool->lock, flags);
- if (wq_has_sleeper(&pool->wait))
- wake_up(&pool->wait);
- return;
+ add_element(pool, elems[freed++]);
+ added = true;
}
spin_unlock_irqrestore(&pool->lock, flags);
}
- pool->free(element, pool->pool_data);
+ if (unlikely(added) && wq_has_sleeper(&pool->wait))
+ wake_up(&pool->wait);
+
+ return freed;
+}
+EXPORT_SYMBOL_GPL(mempool_free_bulk);
+
+/**
+ * mempool_free - return an element to the pool.
+ * @element: element to return
+ * @pool: pointer to the memory pool
+ *
+ * Returns @element to @pool if it needs replenishing, else frees it using
+ * the free_fn callback in @pool.
+ *
+ * This function only sleeps if the free_fn callback sleeps.
+ */
+void mempool_free(void *element, struct mempool *pool)
+{
+ if (likely(element) && !mempool_free_bulk(pool, &element, 1))
+ pool->free(element, pool->pool_data);
}
EXPORT_SYMBOL(mempool_free);
--
2.47.3
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 08/11] mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (6 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 07/11] mempool: add mempool_{alloc,free}_bulk Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 09/11] mempool: remove mempool_{init,create}_kvmalloc_pool Christoph Hellwig
` (3 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
The timeout here is an old workaround with a FIXME comment. But
thinking about it, it makes sense to keep it, so reword the comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 88b9a8476d31..ea2f4f9bcfa1 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -427,10 +427,10 @@ static unsigned int mempool_alloc_from_pool(struct mempool *pool, void **elems,
spin_unlock_irqrestore(&pool->lock, flags);
/*
- * Wait for someone else to return an element to @pool.
- *
- * FIXME: this should be io_schedule(). The timeout is there as
- * a workaround for some DM problems in 2.6.18.
+ * Wait for someone else to return an element to @pool, but wake
+ * up occasionally as memory pressure might have reduced and the
+ * normal allocation in alloc_fn could succeed even if no element
+ * was returned.
*/
io_schedule_timeout(5 * HZ);
finish_wait(&pool->wait, &wait);
--
2.47.3
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 09/11] mempool: remove mempool_{init,create}_kvmalloc_pool
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (7 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 08/11] mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 10/11] mempool: de-typedef Christoph Hellwig
` (2 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
This was added for bcachefs and is unused now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mempool.h | 13 -------------
mm/mempool.c | 13 -------------
2 files changed, 26 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index e914fec0e119..d9332485e8ca 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -103,19 +103,6 @@ void mempool_kfree(void *element, void *pool_data);
mempool_create((_min_nr), mempool_kmalloc, mempool_kfree, \
(void *)(unsigned long)(_size))
-void *mempool_kvmalloc(gfp_t gfp_mask, void *pool_data);
-void mempool_kvfree(void *element, void *pool_data);
-
-static inline int mempool_init_kvmalloc_pool(mempool_t *pool, int min_nr, size_t size)
-{
- return mempool_init(pool, min_nr, mempool_kvmalloc, mempool_kvfree, (void *) size);
-}
-
-static inline mempool_t *mempool_create_kvmalloc_pool(int min_nr, size_t size)
-{
- return mempool_create(min_nr, mempool_kvmalloc, mempool_kvfree, (void *) size);
-}
-
/*
* A mempool_alloc_t and mempool_free_t for a simple page allocator that
* allocates pages of the order specified by pool_data
diff --git a/mm/mempool.c b/mm/mempool.c
index ea2f4f9bcfa1..72c788888ed7 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -724,19 +724,6 @@ void mempool_kfree(void *element, void *pool_data)
}
EXPORT_SYMBOL(mempool_kfree);
-void *mempool_kvmalloc(gfp_t gfp_mask, void *pool_data)
-{
- size_t size = (size_t)pool_data;
- return kvmalloc(size, gfp_mask);
-}
-EXPORT_SYMBOL(mempool_kvmalloc);
-
-void mempool_kvfree(void *element, void *pool_data)
-{
- kvfree(element);
-}
-EXPORT_SYMBOL(mempool_kvfree);
-
/*
* A simple mempool-backed page allocator that allocates pages
* of the order specified by pool_data.
--
2.47.3
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 10/11] mempool: de-typedef
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (8 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 09/11] mempool: remove mempool_{init,create}_kvmalloc_pool Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 8:39 ` [PATCH 11/11] mempool: drop the file name in the top of file comment Christoph Hellwig
2025-11-13 16:12 ` mempool_alloc_bulk and various mempool improvements v3 Vlastimil Babka
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Switch all uses of the deprecated mempool_t typedef in the core mempool
code to use struct mempool instead.
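For illustration, at a hypothetical user (bio_pool is a made-up name,
not from the diff below) the conversion is a pure spelling change:

	/* before */
	static mempool_t *bio_pool;

	/* after */
	static struct mempool *bio_pool;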
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/mempool.h | 39 ++++++++++++++++----------------
mm/mempool.c | 50 +++++++++++++++++++++--------------------
2 files changed, 45 insertions(+), 44 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index d9332485e8ca..e8e440e04a06 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -27,32 +27,31 @@ typedef struct mempool {
wait_queue_head_t wait;
} mempool_t;
-static inline bool mempool_initialized(mempool_t *pool)
+static inline bool mempool_initialized(struct mempool *pool)
{
return pool->elements != NULL;
}
-static inline bool mempool_is_saturated(mempool_t *pool)
+static inline bool mempool_is_saturated(struct mempool *pool)
{
return READ_ONCE(pool->curr_nr) >= pool->min_nr;
}
-void mempool_exit(mempool_t *pool);
-int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int node_id);
-
-int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data);
+void mempool_exit(struct mempool *pool);
+int mempool_init_node(struct mempool *pool, int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data, gfp_t gfp_mask, int node_id);
+int mempool_init_noprof(struct mempool *pool, int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data);
#define mempool_init(...) \
alloc_hooks(mempool_init_noprof(__VA_ARGS__))
-extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data);
-
-extern mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int nid);
+struct mempool *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data);
+struct mempool *mempool_create_node_noprof(int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data, gfp_t gfp_mask, int nid);
#define mempool_create_node(...) \
alloc_hooks(mempool_create_node_noprof(__VA_ARGS__))
@@ -60,10 +59,10 @@ extern mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_
mempool_create_node(_min_nr, _alloc_fn, _free_fn, _pool_data, \
GFP_KERNEL, NUMA_NO_NODE)
-extern int mempool_resize(mempool_t *pool, int new_min_nr);
-extern void mempool_destroy(mempool_t *pool);
+int mempool_resize(struct mempool *pool, int new_min_nr);
+void mempool_destroy(struct mempool *pool);
-extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
+void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask) __malloc;
#define mempool_alloc(...) \
alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
@@ -71,8 +70,8 @@ int mempool_alloc_bulk_noprof(struct mempool *pool, void **elem,
#define mempool_alloc_bulk(...) \
alloc_hooks(mempool_alloc_bulk_noprof(__VA_ARGS__))
-extern void *mempool_alloc_preallocated(mempool_t *pool) __malloc;
-extern void mempool_free(void *element, mempool_t *pool);
+void *mempool_alloc_preallocated(struct mempool *pool) __malloc;
+void mempool_free(void *element, struct mempool *pool);
unsigned int mempool_free_bulk(struct mempool *pool, void **elem,
unsigned int count);
diff --git a/mm/mempool.c b/mm/mempool.c
index 72c788888ed7..448ff2e578a0 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -40,7 +40,7 @@ static int __init mempool_faul_inject_init(void)
late_initcall(mempool_faul_inject_init);
#ifdef CONFIG_SLUB_DEBUG_ON
-static void poison_error(mempool_t *pool, void *element, size_t size,
+static void poison_error(struct mempool *pool, void *element, size_t size,
size_t byte)
{
const int nr = pool->curr_nr;
@@ -57,7 +57,7 @@ static void poison_error(mempool_t *pool, void *element, size_t size,
dump_stack();
}
-static void __check_element(mempool_t *pool, void *element, size_t size)
+static void __check_element(struct mempool *pool, void *element, size_t size)
{
u8 *obj = element;
size_t i;
@@ -73,7 +73,7 @@ static void __check_element(mempool_t *pool, void *element, size_t size)
memset(obj, POISON_INUSE, size);
}
-static void check_element(mempool_t *pool, void *element)
+static void check_element(struct mempool *pool, void *element)
{
/* Skip checking: KASAN might save its metadata in the element. */
if (kasan_enabled())
@@ -102,7 +102,7 @@ static void __poison_element(void *element, size_t size)
obj[size - 1] = POISON_END;
}
-static void poison_element(mempool_t *pool, void *element)
+static void poison_element(struct mempool *pool, void *element)
{
/* Skip poisoning: KASAN might save its metadata in the element. */
if (kasan_enabled())
@@ -123,15 +123,16 @@ static void poison_element(mempool_t *pool, void *element)
}
}
#else /* CONFIG_SLUB_DEBUG_ON */
-static inline void check_element(mempool_t *pool, void *element)
+static inline void check_element(struct mempool *pool, void *element)
{
}
-static inline void poison_element(mempool_t *pool, void *element)
+static inline void poison_element(struct mempool *pool, void *element)
{
}
#endif /* CONFIG_SLUB_DEBUG_ON */
-static __always_inline bool kasan_poison_element(mempool_t *pool, void *element)
+static __always_inline bool kasan_poison_element(struct mempool *pool,
+ void *element)
{
if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc)
return kasan_mempool_poison_object(element);
@@ -141,7 +142,7 @@ static __always_inline bool kasan_poison_element(mempool_t *pool, void *element)
return true;
}
-static void kasan_unpoison_element(mempool_t *pool, void *element)
+static void kasan_unpoison_element(struct mempool *pool, void *element)
{
if (pool->alloc == mempool_kmalloc)
kasan_mempool_unpoison_object(element, (size_t)pool->pool_data);
@@ -153,7 +154,7 @@ static void kasan_unpoison_element(mempool_t *pool, void *element)
(unsigned long)pool->pool_data);
}
-static __always_inline void add_element(mempool_t *pool, void *element)
+static __always_inline void add_element(struct mempool *pool, void *element)
{
BUG_ON(pool->min_nr != 0 && pool->curr_nr >= pool->min_nr);
poison_element(pool, element);
@@ -161,7 +162,7 @@ static __always_inline void add_element(mempool_t *pool, void *element)
pool->elements[pool->curr_nr++] = element;
}
-static void *remove_element(mempool_t *pool)
+static void *remove_element(struct mempool *pool)
{
void *element = pool->elements[--pool->curr_nr];
@@ -182,7 +183,7 @@ static void *remove_element(mempool_t *pool)
* May be called on a zeroed but uninitialized mempool (i.e. allocated with
* kzalloc()).
*/
-void mempool_exit(mempool_t *pool)
+void mempool_exit(struct mempool *pool)
{
while (pool->curr_nr) {
void *element = remove_element(pool);
@@ -201,7 +202,7 @@ EXPORT_SYMBOL(mempool_exit);
* Free all reserved elements in @pool and @pool itself. This function
* only sleeps if the free_fn() function sleeps.
*/
-void mempool_destroy(mempool_t *pool)
+void mempool_destroy(struct mempool *pool)
{
if (unlikely(!pool))
return;
@@ -211,9 +212,9 @@ void mempool_destroy(mempool_t *pool)
}
EXPORT_SYMBOL(mempool_destroy);
-int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int node_id)
+int mempool_init_node(struct mempool *pool, int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data, gfp_t gfp_mask, int node_id)
{
spin_lock_init(&pool->lock);
pool->min_nr = min_nr;
@@ -263,8 +264,9 @@ EXPORT_SYMBOL(mempool_init_node);
*
* Return: %0 on success, negative error code otherwise.
*/
-int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
+int mempool_init_noprof(struct mempool *pool, int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data)
{
return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
pool_data, GFP_KERNEL, NUMA_NO_NODE);
@@ -290,11 +292,11 @@ EXPORT_SYMBOL(mempool_init_noprof);
*
* Return: pointer to the created memory pool object or %NULL on error.
*/
-mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int node_id)
+struct mempool *mempool_create_node_noprof(int min_nr,
+ mempool_alloc_t *alloc_fn, mempool_free_t *free_fn,
+ void *pool_data, gfp_t gfp_mask, int node_id)
{
- mempool_t *pool;
+ struct mempool *pool;
pool = kmalloc_node_noprof(sizeof(*pool), gfp_mask | __GFP_ZERO, node_id);
if (!pool)
@@ -328,7 +330,7 @@ EXPORT_SYMBOL(mempool_create_node_noprof);
*
* Return: %0 on success, negative error code otherwise.
*/
-int mempool_resize(mempool_t *pool, int new_min_nr)
+int mempool_resize(struct mempool *pool, int new_min_nr)
{
void *element;
void **new_elements;
@@ -530,7 +532,7 @@ EXPORT_SYMBOL_GPL(mempool_alloc_bulk_noprof);
* an element. Allocation failure can only happen when @gfp_mask does not
* include %__GFP_DIRECT_RECLAIM.
*/
-void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
+void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
{
gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
void *element;
@@ -578,7 +580,7 @@ EXPORT_SYMBOL(mempool_alloc_noprof);
* Return: pointer to the allocated element or %NULL if no elements are
* available.
*/
-void *mempool_alloc_preallocated(mempool_t *pool)
+void *mempool_alloc_preallocated(struct mempool *pool)
{
void *element = NULL;
--
2.47.3
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 11/11] mempool: drop the file name in the top of file comment
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (9 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 10/11] mempool: de-typedef Christoph Hellwig
@ 2025-11-13 8:39 ` Christoph Hellwig
2025-11-13 16:12 ` mempool_alloc_bulk and various mempool improvements v3 Vlastimil Babka
11 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2025-11-13 8:39 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
Mentioning the name of the file is redundant, so drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/mempool.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/mempool.c b/mm/mempool.c
index 448ff2e578a0..e14d1cf46c72 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -1,7 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * linux/mm/mempool.c
- *
* memory buffer pool support. Such pools are mostly used
* for guaranteed, deadlock-free memory allocations during
* extreme VM load.
--
2.47.3
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: mempool_alloc_bulk and various mempool improvements v3
2025-11-13 8:39 mempool_alloc_bulk and various mempool improvements v3 Christoph Hellwig
` (10 preceding siblings ...)
2025-11-13 8:39 ` [PATCH 11/11] mempool: drop the file name in the top of file comment Christoph Hellwig
@ 2025-11-13 16:12 ` Vlastimil Babka
11 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-11-13 16:12 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton
Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
Eric Biggers, linux-mm, linux-kernel
On 11/13/25 09:39, Christoph Hellwig wrote:
> Hi all,
>
> this series adds a bulk version of mempool_alloc that makes allocating
> multiple objects deadlock safe.
>
> The initial users is the blk-crypto-fallback code:
>
> https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/
>
> with which v1 was posted, but I also have a few other users in mind.
>
> Changes since v2:
> - improve the alloc_pages_bulk documentation a bit
> - drop the not needed race fix again
> - drop the gfp_mask argument to mempool_alloc_bulk to reduce misuse
> potential
> - add an allocated argument to mempool_alloc_bulk
> - drop the caller_ip handling that was already not required since the
> last version
> - add a few more mempool cleanups
LGTM now.
Since mempool is now part of SLAB in MAINTAINERS, added to slab/for-next.
Thanks!
> Changes since v1:
> - fix build for !CONFIG_FAULT_INJECTION
> - improve the kerneldoc comments further
> - refactor the code so that the mempool_alloc fastpath does not
> have to deal with arrays
> - don't support !__GFP_DIRECT_RECLAIM for bulk allocations
> - do pool allocations even if not all elements are available
> - s/elem/elems/
> - use a separate failure injection knob for the bulk allocator
>
> diffstat:
> include/linux/fault-inject.h | 8
> include/linux/mempool.h | 58 ++----
> mm/mempool.c | 401 ++++++++++++++++++++++++++-----------------
> mm/page_alloc.c | 15 +
> 4 files changed, 289 insertions(+), 193 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in thread