* [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
@ 2025-12-04 14:26 Shardul Bankar
2025-12-05 7:22 ` David Hildenbrand (Red Hat)
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Shardul Bankar @ 2025-12-04 14:26 UTC (permalink / raw)
To: willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak,
Shardul Bankar
xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.
If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.
xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.
Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.
This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().
Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v4:
- Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
already checks xa_alloc internally.
v3:
- Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
- Fix in library function makes API safer by default, preventing callers from needing
to remember cleanup
- Use shared cleanup label that both restore: and success: paths jump to
- Clean up unused spare node on both success and error exit paths
v2:
- Call xas_destroy() on both success and failure
- Explained retry semantics and xa_alloc / concurrency risk
- Dropped cleanup_empty_nodes from previous proposal
lib/xarray.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..f49ccfa5f57d 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
xas->xa_shift = shift;
xas->xa_sibs = sibs;
xas->xa_index = index;
- return;
+ goto cleanup;
+
success:
xas->xa_index = index;
if (xas->xa_node)
xas_set_offset(xas);
+
+cleanup:
+ /* Free any unused spare node from xas_nomem() */
+ xas_destroy(xas);
}
EXPORT_SYMBOL_GPL(xas_create_range);
--
2.34.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
@ 2025-12-05 7:22 ` David Hildenbrand (Red Hat)
2025-12-05 10:51 ` Shardul Bankar
2025-12-08 8:37 ` Dev Jain
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-05 7:22 UTC (permalink / raw)
To: Shardul Bankar, willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, shardulsb08, janak
On 12/4/25 15:26, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v4:
> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
> already checks xa_alloc internally.
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
> lib/xarray.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
> xas->xa_shift = shift;
> xas->xa_sibs = sibs;
> xas->xa_index = index;
> - return;
> + goto cleanup;
> +
> success:
> xas->xa_index = index;
> if (xas->xa_node)
> xas_set_offset(xas);
> +
> +cleanup:
> + /* Free any unused spare node from xas_nomem() */
> + xas_destroy(xas);
> }
> EXPORT_SYMBOL_GPL(xas_create_range);
>
Nothing jumped at me, except that the label situation is a bit
suboptimal.
Hoping Willy will take another look as well.
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
BTW, do we have a way to test this in a test case?
A follow-up cleanup that avoids labels could be something like (untested):
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540cf..325f264530fb2 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -714,6 +714,7 @@ void xas_create_range(struct xa_state *xas)
unsigned long index = xas->xa_index;
unsigned char shift = xas->xa_shift;
unsigned char sibs = xas->xa_sibs;
+ bool success = false;
xas->xa_index |= ((sibs + 1UL) << shift) - 1;
if (xas_is_node(xas) && xas->xa_node->shift == xas->xa_shift)
@@ -724,9 +725,11 @@ void xas_create_range(struct xa_state *xas)
for (;;) {
xas_create(xas, true);
if (xas_error(xas))
- goto restore;
- if (xas->xa_index <= (index | XA_CHUNK_MASK))
- goto success;
+ break
+ if (xas->xa_index <= (index | XA_CHUNK_MASK)) {
+ succeess = true;
+ break;
+ }
xas->xa_index -= XA_CHUNK_SIZE;
for (;;) {
@@ -740,15 +743,17 @@ void xas_create_range(struct xa_state *xas)
}
}
-restore:
- xas->xa_shift = shift;
- xas->xa_sibs = sibs;
- xas->xa_index = index;
- return;
-success:
- xas->xa_index = index;
- if (xas->xa_node)
- xas_set_offset(xas);
+ if (success) {
+ xas->xa_index = index;
+ if (xas->xa_node)
+ xas_set_offset(xas);
+ } else {
+ xas->xa_shift = shift;
+ xas->xa_sibs = sibs;
+ xas->xa_index = index;
+ }
+ /* Free any unused spare node from xas_nomem() */
+ xas_destroy(xas);
}
EXPORT_SYMBOL_GPL(xas_create_range);
--
Cheers
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-05 7:22 ` David Hildenbrand (Red Hat)
@ 2025-12-05 10:51 ` Shardul Bankar
2025-12-08 11:36 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-12-05 10:51 UTC (permalink / raw)
To: David Hildenbrand (Red Hat), willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, janak, shardulsb08
On Fri, 2025-12-05 at 08:22 +0100, David Hildenbrand (Red Hat) wrote:
> > Link:
> > https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> ...
> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
>
>
> BTW, do we have a way to test this in a test case?
Hi David,
Thanks for the review and the Reviewed-by.
Regarding a test case: I don’t have a focused selftest or fault-
injection setup yet that reliably hits this xas_nomem() +
xas_create_range() corner case.
I noticed this spare-node leak while analyzing the Syzbot report I
referenced in the Link: tag, but the reproducer I see there doesn’t
isolate this path and reports other kmemleaks.
For now I’d prefer to treat this as a small correctness fix in xarray
itself. If I manage to come up with a robust way to exercise this path
in a selftest (e.g. via targeted fault injection in lib/test_xarray.c),
I can follow up with a separate patch, but I don’t have anything solid
to propose today.
>
>
> A follow-up cleanup that avoids labels could be something like
> (untested):
>
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540cf..325f264530fb2 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -714,6 +714,7 @@ void xas_create_range(struct xa_state *xas)
> unsigned long index = xas->xa_index;
> unsigned char shift = xas->xa_shift;
> unsigned char sibs = xas->xa_sibs;
> + bool success = false;
>
> xas->xa_index |= ((sibs + 1UL) << shift) - 1;
> if (xas_is_node(xas) && xas->xa_node->shift == xas-
> >xa_shift)
> @@ -724,9 +725,11 @@ void xas_create_range(struct xa_state *xas)
> for (;;) {
> xas_create(xas, true);
> if (xas_error(xas))
> - goto restore;
> - if (xas->xa_index <= (index | XA_CHUNK_MASK))
> - goto success;
> + break
> + if (xas->xa_index <= (index | XA_CHUNK_MASK)) {
> + succeess = true;
> + break;
> + }
> xas->xa_index -= XA_CHUNK_SIZE;
>
> for (;;) {
> @@ -740,15 +743,17 @@ void xas_create_range(struct xa_state *xas)
> }
> }
>
> -restore:
> - xas->xa_shift = shift;
> - xas->xa_sibs = sibs;
> - xas->xa_index = index;
> - return;
> -success:
> - xas->xa_index = index;
> - if (xas->xa_node)
> - xas_set_offset(xas);
> + if (success) {
> + xas->xa_index = index;
> + if (xas->xa_node)
> + xas_set_offset(xas);
> + } else {
> + xas->xa_shift = shift;
> + xas->xa_sibs = sibs;
> + xas->xa_index = index;
> + }
> + /* Free any unused spare node from xas_nomem() */
> + xas_destroy(xas);
> }
> EXPORT_SYMBOL_GPL(xas_create_range);
>
>
Your bool-based version reads nicer; I’m happy to follow up with a
small cleanup patch on top that switches xas_create_range() over to
that style (with a Suggested-by tag).
Thanks and Regards,
Shardul
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
2025-12-05 7:22 ` David Hildenbrand (Red Hat)
@ 2025-12-08 8:37 ` Dev Jain
2025-12-15 2:19 ` Jinjiang Tu
2025-12-31 6:29 ` Shardul Bankar
3 siblings, 0 replies; 12+ messages in thread
From: Dev Jain @ 2025-12-08 8:37 UTC (permalink / raw)
To: Shardul Bankar, willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, david, shardulsb08, janak
On 04/12/25 7:56 pm, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
Nit: "When the lock is dropped after xas_create_range()".
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v4:
> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
> already checks xa_alloc internally.
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
> lib/xarray.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
> xas->xa_shift = shift;
> xas->xa_sibs = sibs;
> xas->xa_index = index;
> - return;
> + goto cleanup;
> +
> success:
> xas->xa_index = index;
> if (xas->xa_node)
> xas_set_offset(xas);
> +
> +cleanup:
> + /* Free any unused spare node from xas_nomem() */
> + xas_destroy(xas);
> }
> EXPORT_SYMBOL_GPL(xas_create_range);
Since there are other code paths doing this fashion of xas_create/xas_store followed by
calling xas_nomem till the former succeeds, I believe we should handle this in the caller -
for example, I believe we need to fix xa_store_range() as well? It does xas_create, then
drops the lock in the same fashion, and I believe we have no choice here but to handle it
in the caller - we should not put xas_destroy inside xas_create.
Although for this particular case, we have only one caller of xas_create_range, so this
is fine really.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-05 10:51 ` Shardul Bankar
@ 2025-12-08 11:36 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-08 11:36 UTC (permalink / raw)
To: Shardul Bankar, willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, janak, shardulsb08
>> -restore:
>> - xas->xa_shift = shift;
>> - xas->xa_sibs = sibs;
>> - xas->xa_index = index;
>> - return;
>> -success:
>> - xas->xa_index = index;
>> - if (xas->xa_node)
>> - xas_set_offset(xas);
>> + if (success) {
>> + xas->xa_index = index;
>> + if (xas->xa_node)
>> + xas_set_offset(xas);
>> + } else {
>> + xas->xa_shift = shift;
>> + xas->xa_sibs = sibs;
>> + xas->xa_index = index;
>> + }
>> + /* Free any unused spare node from xas_nomem() */
>> + xas_destroy(xas);
>> }
>> EXPORT_SYMBOL_GPL(xas_create_range);
>>
>>
> Your bool-based version reads nicer; I’m happy to follow up with a
> small cleanup patch on top that switches xas_create_range() over to
> that style (with a Suggested-by tag).
Yeah, feel free to send a cleanup out that removes some of these labels
(doesn't necessarily have to be what I proposed).
--
Cheers
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
2025-12-05 7:22 ` David Hildenbrand (Red Hat)
2025-12-08 8:37 ` Dev Jain
@ 2025-12-15 2:19 ` Jinjiang Tu
2025-12-15 3:42 ` Jinjiang Tu
2025-12-31 6:29 ` Shardul Bankar
3 siblings, 1 reply; 12+ messages in thread
From: Jinjiang Tu @ 2025-12-15 2:19 UTC (permalink / raw)
To: Shardul Bankar, willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak
[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]
在 2025/12/4 22:26, Shardul Bankar 写道:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
I encountered another memory leak issue in xas_create_range().
collapse_file() calls xas_create_range() to pre-create all slots needed.
If collapse_file() finally fails, these pre-created slots are empty nodes.
When the file is deleted, shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range()
calls xas_store(&xas, NULL) for each entries to delete nodes, but leaving those pre-created
empty nodes leaked.
I can reproduce it with following steps.
1) create file /tmp/test_madvise_collapse and ftruncate to 4MB size, and then mmap the file
2) memset for the first 2MB
3) madvise(MADV_COLLAPSE) for the second 2MB.
4) unlink the file
in 3), collapse_file() calls xas_create_range() to expand xarray depth, and fails to collapse
due to the whole 2M region is empty, leading to the new created empty nodes leaked.
To fix it, maybe we should add a new function xas_delete_range() to revert what xas_create_range()
does when xas_create_range() runs into rollback path?
>
> Link:https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link:https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar<shardul.b@mpiricsoftware.com>
> ---
> v4:
> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
> already checks xa_alloc internally.
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
> lib/xarray.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
> xas->xa_shift = shift;
> xas->xa_sibs = sibs;
> xas->xa_index = index;
> - return;
> + goto cleanup;
> +
> success:
> xas->xa_index = index;
> if (xas->xa_node)
> xas_set_offset(xas);
> +
> +cleanup:
> + /* Free any unused spare node from xas_nomem() */
> + xas_destroy(xas);
> }
> EXPORT_SYMBOL_GPL(xas_create_range);
>
[-- Attachment #2: Type: text/html, Size: 4539 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-15 2:19 ` Jinjiang Tu
@ 2025-12-15 3:42 ` Jinjiang Tu
0 siblings, 0 replies; 12+ messages in thread
From: Jinjiang Tu @ 2025-12-15 3:42 UTC (permalink / raw)
To: Shardul Bankar, willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak,
Kefeng Wang
在 2025/12/15 10:19, Jinjiang Tu 写道:
>
>
> 在 2025/12/4 22:26, Shardul Bankar 写道:
>> xas_create_range() is typically called in a retry loop that uses
>> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
>> xa_node and store it in xas->xa_alloc for use in the retry.
>>
>> If the lock is dropped after xas_nomem(), another thread can expand the
>> xarray tree in the meantime. On the next retry, xas_create_range() can
>> then succeed without consuming the spare node stored in xas->xa_alloc.
>> If the function returns without freeing this spare node, it leaks.
>>
>> xas_create_range() calls xas_create() multiple times in a loop for
>> different index ranges. A spare node that isn't needed for one range
>> iteration might be needed for the next, so we cannot free it after each
>> xas_create() call. We can only safely free it after xas_create_range()
>> completes.
>>
>> Fix this by calling xas_destroy() at the end of xas_create_range() to
>> free any unused spare node. This makes the API safer by default and
>> prevents callers from needing to remember cleanup.
>>
>> This fixes a memory leak in mm/khugepaged.c and potentially other
>> callers that use xas_nomem() with xas_create_range().
> I encountered another memory leak issue in xas_create_range().
>
> collapse_file() calls xas_create_range() to pre-create all slots needed.
> If collapse_file() finally fails, these pre-created slots are empty nodes.
> When the file is deleted, shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range()
> calls xas_store(&xas, NULL) for each entries to delete nodes, but leaving those pre-created
> empty nodes leaked.
>
> I can reproduce it with following steps.
> 1) create file /tmp/test_madvise_collapse and ftruncate to 4MB size, and then mmap the file
> 2) memset for the first 2MB
> 3) madvise(MADV_COLLAPSE) for the second 2MB.
> 4) unlink the file
>
> in 3), collapse_file() calls xas_create_range() to expand xarray depth, and fails to collapse
> due to the whole 2M region is empty, leading to the new created empty nodes leaked.
>
> To fix it, maybe we should add a new function xas_delete_range() to revert what xas_create_range()
> does when xas_create_range() runs into rollback path?
How about the following diff? I tried it, and the memory leak disappears. I'm new in xarray, so
I don't if this fix works properly.
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index be850174e802..972df5ceeb84 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1555,6 +1555,7 @@ void xas_destroy(struct xa_state *);
void xas_pause(struct xa_state *);
void xas_create_range(struct xa_state *);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end);
#ifdef CONFIG_XARRAY_MULTI
int xa_get_order(struct xarray *, unsigned long index);
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..ab15dc939962 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -752,6 +752,21 @@ void xas_create_range(struct xa_state *xas)
}
EXPORT_SYMBOL_GPL(xas_create_range);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end)
+{
+ unsigned long index;
+ void *entry;
+
+ for (index = start; index < end; ++index) {
+ xas_set(xas, index);
+ entry = xas_load(xas);
+ if (entry)
+ continue;
+ else if (xas->xa_node && !xas->xa_node->count)
+ xas_delete_node(xas);
+ }
+}
+
static void update_node(struct xa_state *xas, struct xa_node *node,
int count, int values)
{
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386..dd9d3f202c4b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2247,7 +2247,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
goto out;
rollback:
- /* Something went wrong: roll back page cache changes */
+ /* Something went wrong: roll back empty xa_node created by
+ * xas_create_range() and page cache changes
+ */
+ xas_destroy_range(&xas, start, end);
+
if (nr_none) {
xas_lock_irq(&xas);
mapping->nrpages -= nr_none;
>> Link:https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
>> Link:https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
>> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
>> Signed-off-by: Shardul Bankar<shardul.b@mpiricsoftware.com>
>> ---
>> v4:
>> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>> already checks xa_alloc internally.
>> v3:
>> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>> - Fix in library function makes API safer by default, preventing callers from needing
>> to remember cleanup
>> - Use shared cleanup label that both restore: and success: paths jump to
>> - Clean up unused spare node on both success and error exit paths
>> v2:
>> - Call xas_destroy() on both success and failure
>> - Explained retry semantics and xa_alloc / concurrency risk
>> - Dropped cleanup_empty_nodes from previous proposal
>>
>> lib/xarray.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/xarray.c b/lib/xarray.c
>> index 9a8b4916540c..f49ccfa5f57d 100644
>> --- a/lib/xarray.c
>> +++ b/lib/xarray.c
>> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>> xas->xa_shift = shift;
>> xas->xa_sibs = sibs;
>> xas->xa_index = index;
>> - return;
>> + goto cleanup;
>> +
>> success:
>> xas->xa_index = index;
>> if (xas->xa_node)
>> xas_set_offset(xas);
>> +
>> +cleanup:
>> + /* Free any unused spare node from xas_nomem() */
>> + xas_destroy(xas);
>> }
>> EXPORT_SYMBOL_GPL(xas_create_range);
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
` (2 preceding siblings ...)
2025-12-15 2:19 ` Jinjiang Tu
@ 2025-12-31 6:29 ` Shardul Bankar
2026-01-26 5:46 ` Shardul B
2026-01-27 3:48 ` Matthew Wilcox
3 siblings, 2 replies; 12+ messages in thread
From: Shardul Bankar @ 2025-12-31 6:29 UTC (permalink / raw)
To: willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, david, janak, shardulsb08,
tujinjiang
Hi Matthew, Andrew,
Just a gentle ping on this one.
v4 has a Reviewed-by from David, and Dev and Jinjiang both followed up
with additional observations and ideas for related cleanups. As far as
I
can see, there are no outstanding objections to the current xas_nomem()
/
xas_create_range() spare-node fix.
If this looks good to you, could it be queued for inclusion via
whichever
tree you think is appropriate?
The separate question that Jinjiang raised about empty xa_nodes
installed
by xas_create_range() but never populated is being discussed in its own
bug-report thread here:
https://lore.kernel.org/all/86834731-02ba-43ea-9def-8b8ca156ec4a@huawei.com/
Once this patch is accepted/taken, I plan to follow up with a small
cleanup
patch that simplifies the label usage in xas_create_range() along the
lines
David suggested.
Thanks,
Shardul
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-31 6:29 ` Shardul Bankar
@ 2026-01-26 5:46 ` Shardul B
2026-01-26 20:04 ` Andrew Morton
2026-01-27 3:48 ` Matthew Wilcox
1 sibling, 1 reply; 12+ messages in thread
From: Shardul B @ 2026-01-26 5:46 UTC (permalink / raw)
To: willy, akpm, linux-mm
Cc: linux-fsdevel, linux-kernel, dev.jain, david, janak, tujinjiang,
shardulsb08
[-- Attachment #1: Type: text/plain, Size: 437 bytes --]
Hi Matthew, Andrew,
Gently pinging this patch. I’ve checked the latest linux-next (next-20260123) and the akpm-unstable branches, and it doesn't appear to have been picked up yet.
The patch has a Reviewed-by from David Hildenbrand and fixes a memory leak in the XArray library identified by syzbot.
If there are no further concerns, could you please let me know if this is queued for the next cycle?
Thanks, Shardul
[-- Attachment #2: Type: text/html, Size: 1606 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2026-01-26 5:46 ` Shardul B
@ 2026-01-26 20:04 ` Andrew Morton
2026-01-27 1:12 ` Jinjiang Tu
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2026-01-26 20:04 UTC (permalink / raw)
To: Shardul B
Cc: willy, linux-mm, linux-fsdevel, linux-kernel, dev.jain, david,
janak, tujinjiang, shardulsb08
On Mon, 26 Jan 2026 11:16:08 +0530 Shardul B <shardul.b@mpiricsoftware.com> wrote:
> Hi Matthew, Andrew,
>
>
> Gently pinging this patch. I’ve checked the latest linux-next (next-20260123) and the akpm-unstable branches, and it doesn't appear to have been picked up yet.
>
>
> The patch has a Reviewed-by from David Hildenbrand and fixes a memory leak in the XArray library identified by syzbot.
>
>
> If there are no further concerns, could you please let me know if this is queued for the next cycle?
There are comments from Jinjiang Tu and Dev Jain which remain
unaddressed, please.
The leak is a rare and this is a minor problem, I believe?
I'll queue the patch for some exposure and testing but it appears that
some additional consideration is needed before it should be progressed
further.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2026-01-26 20:04 ` Andrew Morton
@ 2026-01-27 1:12 ` Jinjiang Tu
0 siblings, 0 replies; 12+ messages in thread
From: Jinjiang Tu @ 2026-01-27 1:12 UTC (permalink / raw)
To: Andrew Morton, Shardul B
Cc: willy, linux-mm, linux-fsdevel, linux-kernel, dev.jain, david,
janak, shardulsb08
在 2026/1/27 4:04, Andrew Morton 写道:
> On Mon, 26 Jan 2026 11:16:08 +0530 Shardul B <shardul.b@mpiricsoftware.com> wrote:
>
>> Hi Matthew, Andrew,
>>
>>
>> Gently pinging this patch. I’ve checked the latest linux-next (next-20260123) and the akpm-unstable branches, and it doesn't appear to have been picked up yet.
>>
>>
>> The patch has a Reviewed-by from David Hildenbrand and fixes a memory leak in the XArray library identified by syzbot.
>>
>>
>> If there are no further concerns, could you please let me know if this is queued for the next cycle?
> There are comments from Jinjiang Tu and Dev Jain which remain
> unaddressed, please.
Hi, Andrew
The issue is another problem, independent ofShardul's patch.
I have posted a patch to solve it.
https://lore.kernel.org/linux-mm/20260121062243.1893129-1-tujinjiang@huawei.com
> The leak is a rare and this is a minor problem, I believe?
>
> I'll queue the patch for some exposure and testing but it appears that
> some additional consideration is needed before it should be progressed
> further.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
2025-12-31 6:29 ` Shardul Bankar
2026-01-26 5:46 ` Shardul B
@ 2026-01-27 3:48 ` Matthew Wilcox
1 sibling, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2026-01-27 3:48 UTC (permalink / raw)
To: Shardul Bankar
Cc: akpm, linux-mm, linux-fsdevel, linux-kernel, dev.jain, david,
janak, shardulsb08, tujinjiang
On Wed, Dec 31, 2025 at 11:59:42AM +0530, Shardul Bankar wrote:
> Hi Matthew, Andrew,
pinging on december 31st is an entirely ineffective thing to do.
for one, oracle is in shutdown.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-01-27 3:48 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
2025-12-05 7:22 ` David Hildenbrand (Red Hat)
2025-12-05 10:51 ` Shardul Bankar
2025-12-08 11:36 ` David Hildenbrand (Red Hat)
2025-12-08 8:37 ` Dev Jain
2025-12-15 2:19 ` Jinjiang Tu
2025-12-15 3:42 ` Jinjiang Tu
2025-12-31 6:29 ` Shardul Bankar
2026-01-26 5:46 ` Shardul B
2026-01-26 20:04 ` Andrew Morton
2026-01-27 1:12 ` Jinjiang Tu
2026-01-27 3:48 ` Matthew Wilcox
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox