linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
@ 2025-12-04 14:26 Shardul Bankar
  2025-12-05  7:22 ` David Hildenbrand (Red Hat)
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Shardul Bankar @ 2025-12-04 14:26 UTC (permalink / raw)
  To: willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak,
	Shardul Bankar

xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.

If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.

xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.

Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.

This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 v4:
 - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
   already checks xa_alloc internally.
 v3:
 - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
 - Fix in library function makes API safer by default, preventing callers from needing
   to remember cleanup
 - Use shared cleanup label that both restore: and success: paths jump to
 - Clean up unused spare node on both success and error exit paths
 v2:
 - Call xas_destroy() on both success and failure
 - Explained retry semantics and xa_alloc / concurrency risk
 - Dropped cleanup_empty_nodes from previous proposal

 lib/xarray.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..f49ccfa5f57d 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
 	xas->xa_shift = shift;
 	xas->xa_sibs = sibs;
 	xas->xa_index = index;
-	return;
+	goto cleanup;
+
 success:
 	xas->xa_index = index;
 	if (xas->xa_node)
 		xas_set_offset(xas);
+
+cleanup:
+	/* Free any unused spare node from xas_nomem() */
+	xas_destroy(xas);
 }
 EXPORT_SYMBOL_GPL(xas_create_range);
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
@ 2025-12-05  7:22 ` David Hildenbrand (Red Hat)
  2025-12-05 10:51   ` Shardul Bankar
  2025-12-08  8:37 ` Dev Jain
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-05  7:22 UTC (permalink / raw)
  To: Shardul Bankar, willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, shardulsb08, janak

On 12/4/25 15:26, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
> 
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
> 
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
> 
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
> 
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
> 
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>   v4:
>   - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>     already checks xa_alloc internally.
>   v3:
>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>   - Fix in library function makes API safer by default, preventing callers from needing
>     to remember cleanup
>   - Use shared cleanup label that both restore: and success: paths jump to
>   - Clean up unused spare node on both success and error exit paths
>   v2:
>   - Call xas_destroy() on both success and failure
>   - Explained retry semantics and xa_alloc / concurrency risk
>   - Dropped cleanup_empty_nodes from previous proposal
> 
>   lib/xarray.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>   	xas->xa_shift = shift;
>   	xas->xa_sibs = sibs;
>   	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>   success:
>   	xas->xa_index = index;
>   	if (xas->xa_node)
>   		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	xas_destroy(xas);
>   }
>   EXPORT_SYMBOL_GPL(xas_create_range);
>   

Nothing jumped at me, except that the label situation is a bit
suboptimal.

Hoping Willy will take another look as well.

Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>


BTW, do we have a way to test this in a test case?


A follow-up cleanup that avoids labels could be something like (untested):


diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540cf..325f264530fb2 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -714,6 +714,7 @@ void xas_create_range(struct xa_state *xas)
         unsigned long index = xas->xa_index;
         unsigned char shift = xas->xa_shift;
         unsigned char sibs = xas->xa_sibs;
+       bool success = false;
  
         xas->xa_index |= ((sibs + 1UL) << shift) - 1;
         if (xas_is_node(xas) && xas->xa_node->shift == xas->xa_shift)
@@ -724,9 +725,11 @@ void xas_create_range(struct xa_state *xas)
         for (;;) {
                 xas_create(xas, true);
                 if (xas_error(xas))
-                       goto restore;
-               if (xas->xa_index <= (index | XA_CHUNK_MASK))
-                       goto success;
+                       break
+               if (xas->xa_index <= (index | XA_CHUNK_MASK)) {
+                       succeess = true;
+                       break;
+               }
                 xas->xa_index -= XA_CHUNK_SIZE;
  
                 for (;;) {
@@ -740,15 +743,17 @@ void xas_create_range(struct xa_state *xas)
                 }
         }
  
-restore:
-       xas->xa_shift = shift;
-       xas->xa_sibs = sibs;
-       xas->xa_index = index;
-       return;
-success:
-       xas->xa_index = index;
-       if (xas->xa_node)
-               xas_set_offset(xas);
+       if (success) {
+               xas->xa_index = index;
+               if (xas->xa_node)
+                       xas_set_offset(xas);
+       } else {
+               xas->xa_shift = shift;
+               xas->xa_sibs = sibs;
+               xas->xa_index = index;
+       }
+       /* Free any unused spare node from xas_nomem() */
+       xas_destroy(xas);
  }
  EXPORT_SYMBOL_GPL(xas_create_range);


-- 
Cheers

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-05  7:22 ` David Hildenbrand (Red Hat)
@ 2025-12-05 10:51   ` Shardul Bankar
  2025-12-08 11:36     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 8+ messages in thread
From: Shardul Bankar @ 2025-12-05 10:51 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, janak, shardulsb08

On Fri, 2025-12-05 at 08:22 +0100, David Hildenbrand (Red Hat) wrote:
> > Link:
> > https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> ...
> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
> 
> 
> BTW, do we have a way to test this in a test case?
Hi David,

Thanks for the review and the Reviewed-by.

Regarding a test case: I don’t have a focused selftest or fault-
injection setup yet that reliably hits this xas_nomem() +
xas_create_range() corner case. 
I noticed this spare-node leak while analyzing the Syzbot report I
referenced in the Link: tag, but the reproducer I see there doesn’t
isolate this path and reports other kmemleaks.

For now I’d prefer to treat this as a small correctness fix in xarray
itself. If I manage to come up with a robust way to exercise this path
in a selftest (e.g. via targeted fault injection in lib/test_xarray.c),
I can follow up with a separate patch, but I don’t have anything solid
to propose today.

> 
> 
> A follow-up cleanup that avoids labels could be something like
> (untested):
> 
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540cf..325f264530fb2 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -714,6 +714,7 @@ void xas_create_range(struct xa_state *xas)
>          unsigned long index = xas->xa_index;
>          unsigned char shift = xas->xa_shift;
>          unsigned char sibs = xas->xa_sibs;
> +       bool success = false;
>   
>          xas->xa_index |= ((sibs + 1UL) << shift) - 1;
>          if (xas_is_node(xas) && xas->xa_node->shift == xas-
> >xa_shift)
> @@ -724,9 +725,11 @@ void xas_create_range(struct xa_state *xas)
>          for (;;) {
>                  xas_create(xas, true);
>                  if (xas_error(xas))
> -                       goto restore;
> -               if (xas->xa_index <= (index | XA_CHUNK_MASK))
> -                       goto success;
> +                       break
> +               if (xas->xa_index <= (index | XA_CHUNK_MASK)) {
> +                       succeess = true;
> +                       break;
> +               }
>                  xas->xa_index -= XA_CHUNK_SIZE;
>   
>                  for (;;) {
> @@ -740,15 +743,17 @@ void xas_create_range(struct xa_state *xas)
>                  }
>          }
>   
> -restore:
> -       xas->xa_shift = shift;
> -       xas->xa_sibs = sibs;
> -       xas->xa_index = index;
> -       return;
> -success:
> -       xas->xa_index = index;
> -       if (xas->xa_node)
> -               xas_set_offset(xas);
> +       if (success) {
> +               xas->xa_index = index;
> +               if (xas->xa_node)
> +                       xas_set_offset(xas);
> +       } else {
> +               xas->xa_shift = shift;
> +               xas->xa_sibs = sibs;
> +               xas->xa_index = index;
> +       }
> +       /* Free any unused spare node from xas_nomem() */
> +       xas_destroy(xas);
>   }
>   EXPORT_SYMBOL_GPL(xas_create_range);
> 
> 
Your bool-based version reads nicer; I’m happy to follow up with a
small cleanup patch on top that switches xas_create_range() over to
that style (with a Suggested-by tag).

Thanks and Regards,
Shardul


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
  2025-12-05  7:22 ` David Hildenbrand (Red Hat)
@ 2025-12-08  8:37 ` Dev Jain
  2025-12-15  2:19 ` Jinjiang Tu
  2025-12-31  6:29 ` Shardul Bankar
  3 siblings, 0 replies; 8+ messages in thread
From: Dev Jain @ 2025-12-08  8:37 UTC (permalink / raw)
  To: Shardul Bankar, willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, david, shardulsb08, janak


On 04/12/25 7:56 pm, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the

Nit: "When the lock is dropped after xas_create_range()".

> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>   v4:
>   - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>     already checks xa_alloc internally.
>   v3:
>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>   - Fix in library function makes API safer by default, preventing callers from needing
>     to remember cleanup
>   - Use shared cleanup label that both restore: and success: paths jump to
>   - Clean up unused spare node on both success and error exit paths
>   v2:
>   - Call xas_destroy() on both success and failure
>   - Explained retry semantics and xa_alloc / concurrency risk
>   - Dropped cleanup_empty_nodes from previous proposal
>
>   lib/xarray.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>   	xas->xa_shift = shift;
>   	xas->xa_sibs = sibs;
>   	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>   success:
>   	xas->xa_index = index;
>   	if (xas->xa_node)
>   		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	xas_destroy(xas);
>   }
>   EXPORT_SYMBOL_GPL(xas_create_range);

Since there are other code paths doing this fashion of xas_create/xas_store followed by
calling xas_nomem till the former succeeds, I believe we should handle this in the caller -
for example, I believe we need to fix xa_store_range() as well? It does xas_create, then
drops the lock in the same fashion, and I believe we have no choice here but to handle it
in the caller - we should not put xas_destroy inside xas_create.

Although for this particular case, we have only one caller of xas_create_range, so this
is fine really.


>   


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-05 10:51   ` Shardul Bankar
@ 2025-12-08 11:36     ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-08 11:36 UTC (permalink / raw)
  To: Shardul Bankar, willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, janak, shardulsb08

>> -restore:
>> -       xas->xa_shift = shift;
>> -       xas->xa_sibs = sibs;
>> -       xas->xa_index = index;
>> -       return;
>> -success:
>> -       xas->xa_index = index;
>> -       if (xas->xa_node)
>> -               xas_set_offset(xas);
>> +       if (success) {
>> +               xas->xa_index = index;
>> +               if (xas->xa_node)
>> +                       xas_set_offset(xas);
>> +       } else {
>> +               xas->xa_shift = shift;
>> +               xas->xa_sibs = sibs;
>> +               xas->xa_index = index;
>> +       }
>> +       /* Free any unused spare node from xas_nomem() */
>> +       xas_destroy(xas);
>>    }
>>    EXPORT_SYMBOL_GPL(xas_create_range);
>>
>>
> Your bool-based version reads nicer; I’m happy to follow up with a
> small cleanup patch on top that switches xas_create_range() over to
> that style (with a Suggested-by tag).

Yeah, feel free to send a cleanup out that removes some of these labels
(doesn't necessarily have to be what I proposed).

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
  2025-12-05  7:22 ` David Hildenbrand (Red Hat)
  2025-12-08  8:37 ` Dev Jain
@ 2025-12-15  2:19 ` Jinjiang Tu
  2025-12-15  3:42   ` Jinjiang Tu
  2025-12-31  6:29 ` Shardul Bankar
  3 siblings, 1 reply; 8+ messages in thread
From: Jinjiang Tu @ 2025-12-15  2:19 UTC (permalink / raw)
  To: Shardul Bankar, willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]


在 2025/12/4 22:26, Shardul Bankar 写道:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().

I encountered another memory leak issue in xas_create_range().

collapse_file() calls xas_create_range() to pre-create all slots needed.
If collapse_file() finally fails, these pre-created slots are empty nodes.
When the file is deleted, shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range()
calls xas_store(&xas, NULL) for each entries to delete nodes, but leaving those pre-created
empty nodes leaked.

I can reproduce it with following steps.
1) create file /tmp/test_madvise_collapse and ftruncate to 4MB size, and then mmap the file
2) memset for the first 2MB
3) madvise(MADV_COLLAPSE) for the second 2MB.
4) unlink the file

in 3), collapse_file() calls xas_create_range() to expand xarray depth, and fails to collapse
due to the whole 2M region is empty, leading to the new created empty nodes leaked.

To fix it, maybe we should add a new function xas_delete_range() to revert what xas_create_range()
does when xas_create_range() runs into rollback path?

>
> Link:https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link:https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar<shardul.b@mpiricsoftware.com>
> ---
>   v4:
>   - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>     already checks xa_alloc internally.
>   v3:
>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>   - Fix in library function makes API safer by default, preventing callers from needing
>     to remember cleanup
>   - Use shared cleanup label that both restore: and success: paths jump to
>   - Clean up unused spare node on both success and error exit paths
>   v2:
>   - Call xas_destroy() on both success and failure
>   - Explained retry semantics and xa_alloc / concurrency risk
>   - Dropped cleanup_empty_nodes from previous proposal
>
>   lib/xarray.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>   	xas->xa_shift = shift;
>   	xas->xa_sibs = sibs;
>   	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>   success:
>   	xas->xa_index = index;
>   	if (xas->xa_node)
>   		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	xas_destroy(xas);
>   }
>   EXPORT_SYMBOL_GPL(xas_create_range);
>   

[-- Attachment #2: Type: text/html, Size: 4539 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-15  2:19 ` Jinjiang Tu
@ 2025-12-15  3:42   ` Jinjiang Tu
  0 siblings, 0 replies; 8+ messages in thread
From: Jinjiang Tu @ 2025-12-15  3:42 UTC (permalink / raw)
  To: Shardul Bankar, willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, david, shardulsb08, janak,
	Kefeng Wang


在 2025/12/15 10:19, Jinjiang Tu 写道:
>
>
> 在 2025/12/4 22:26, Shardul Bankar 写道:
>> xas_create_range() is typically called in a retry loop that uses
>> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
>> xa_node and store it in xas->xa_alloc for use in the retry.
>>
>> If the lock is dropped after xas_nomem(), another thread can expand the
>> xarray tree in the meantime. On the next retry, xas_create_range() can
>> then succeed without consuming the spare node stored in xas->xa_alloc.
>> If the function returns without freeing this spare node, it leaks.
>>
>> xas_create_range() calls xas_create() multiple times in a loop for
>> different index ranges. A spare node that isn't needed for one range
>> iteration might be needed for the next, so we cannot free it after each
>> xas_create() call. We can only safely free it after xas_create_range()
>> completes.
>>
>> Fix this by calling xas_destroy() at the end of xas_create_range() to
>> free any unused spare node. This makes the API safer by default and
>> prevents callers from needing to remember cleanup.
>>
>> This fixes a memory leak in mm/khugepaged.c and potentially other
>> callers that use xas_nomem() with xas_create_range().
> I encountered another memory leak issue in xas_create_range().
>
> collapse_file() calls xas_create_range() to pre-create all slots needed.
> If collapse_file() finally fails, these pre-created slots are empty nodes.
> When the file is deleted, shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range()
> calls xas_store(&xas, NULL) for each entries to delete nodes, but leaving those pre-created
> empty nodes leaked.
>
> I can reproduce it with following steps.
> 1) create file /tmp/test_madvise_collapse and ftruncate to 4MB size, and then mmap the file
> 2) memset for the first 2MB
> 3) madvise(MADV_COLLAPSE) for the second 2MB.
> 4) unlink the file
>
> in 3), collapse_file() calls xas_create_range() to expand xarray depth, and fails to collapse
> due to the whole 2M region is empty, leading to the new created empty nodes leaked.
>
> To fix it, maybe we should add a new function xas_delete_range() to revert what xas_create_range()
> does when xas_create_range() runs into rollback path?

How about the following diff? I tried it, and the memory leak disappears. I'm new in xarray, so
I don't if this fix works properly.

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index be850174e802..972df5ceeb84 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1555,6 +1555,7 @@ void xas_destroy(struct xa_state *);
  void xas_pause(struct xa_state *);
  
  void xas_create_range(struct xa_state *);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end);
  
  #ifdef CONFIG_XARRAY_MULTI
  int xa_get_order(struct xarray *, unsigned long index);
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..ab15dc939962 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -752,6 +752,21 @@ void xas_create_range(struct xa_state *xas)
  }
  EXPORT_SYMBOL_GPL(xas_create_range);
  
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end)
+{
+    unsigned long index;
+    void *entry;
+
+    for (index = start; index < end; ++index) {
+        xas_set(xas, index);
+        entry = xas_load(xas);
+        if (entry)
+            continue;
+        else if (xas->xa_node && !xas->xa_node->count)
+            xas_delete_node(xas);
+    }
+}
+
  static void update_node(struct xa_state *xas, struct xa_node *node,
          int count, int values)
  {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386..dd9d3f202c4b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2247,7 +2247,10 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
      goto out;
  
  rollback:
-    /* Something went wrong: roll back page cache changes */
+    /* Something went wrong: roll back empty xa_node created by
+     * xas_create_range() and page cache changes
+     */
+    xas_destroy_range(&xas, start, end);
+
      if (nr_none) {
          xas_lock_irq(&xas);
          mapping->nrpages -= nr_none;

>> Link:https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
>> Link:https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
>> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
>> Signed-off-by: Shardul Bankar<shardul.b@mpiricsoftware.com>
>> ---
>>   v4:
>>   - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>>     already checks xa_alloc internally.
>>   v3:
>>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>>   - Fix in library function makes API safer by default, preventing callers from needing
>>     to remember cleanup
>>   - Use shared cleanup label that both restore: and success: paths jump to
>>   - Clean up unused spare node on both success and error exit paths
>>   v2:
>>   - Call xas_destroy() on both success and failure
>>   - Explained retry semantics and xa_alloc / concurrency risk
>>   - Dropped cleanup_empty_nodes from previous proposal
>>
>>   lib/xarray.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/xarray.c b/lib/xarray.c
>> index 9a8b4916540c..f49ccfa5f57d 100644
>> --- a/lib/xarray.c
>> +++ b/lib/xarray.c
>> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>>   	xas->xa_shift = shift;
>>   	xas->xa_sibs = sibs;
>>   	xas->xa_index = index;
>> -	return;
>> +	goto cleanup;
>> +
>>   success:
>>   	xas->xa_index = index;
>>   	if (xas->xa_node)
>>   		xas_set_offset(xas);
>> +
>> +cleanup:
>> +	/* Free any unused spare node from xas_nomem() */
>> +	xas_destroy(xas);
>>   }
>>   EXPORT_SYMBOL_GPL(xas_create_range);
>>   


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
  2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
                   ` (2 preceding siblings ...)
  2025-12-15  2:19 ` Jinjiang Tu
@ 2025-12-31  6:29 ` Shardul Bankar
  3 siblings, 0 replies; 8+ messages in thread
From: Shardul Bankar @ 2025-12-31  6:29 UTC (permalink / raw)
  To: willy, akpm, linux-mm
  Cc: linux-fsdevel, linux-kernel, dev.jain, david, janak, shardulsb08,
	tujinjiang

Hi Matthew, Andrew,

Just a gentle ping on this one.

v4 has a Reviewed-by from David, and Dev and Jinjiang both followed up
with additional observations and ideas for related cleanups. As far as
I
can see, there are no outstanding objections to the current xas_nomem()
/
xas_create_range() spare-node fix.

If this looks good to you, could it be queued for inclusion via
whichever
tree you think is appropriate?

The separate question that Jinjiang raised about empty xa_nodes
installed
by xas_create_range() but never populated is being discussed in its own
bug-report thread here:
 
https://lore.kernel.org/all/86834731-02ba-43ea-9def-8b8ca156ec4a@huawei.com/

Once this patch is accepted/taken, I plan to follow up with a small
cleanup
patch that simplifies the label usage in xas_create_range() along the
lines
David suggested.

Thanks,
Shardul




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-12-31  6:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-04 14:26 [PATCH v4] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
2025-12-05  7:22 ` David Hildenbrand (Red Hat)
2025-12-05 10:51   ` Shardul Bankar
2025-12-08 11:36     ` David Hildenbrand (Red Hat)
2025-12-08  8:37 ` Dev Jain
2025-12-15  2:19 ` Jinjiang Tu
2025-12-15  3:42   ` Jinjiang Tu
2025-12-31  6:29 ` Shardul Bankar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox