linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
@ 2025-11-23 13:27 Shardul Bankar
  2025-11-23 14:49 ` Dev Jain
  2025-11-24 10:02 ` David Hildenbrand (Red Hat)
  0 siblings, 2 replies; 12+ messages in thread
From: Shardul Bankar @ 2025-11-23 13:27 UTC (permalink / raw)
  To: shardulsb08
  Cc: linux-kernel, linux-mm, syzbot+a785d07959bc94837d51, akpm, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, janak, shardul.b

When collapse_file() fails after xas_create_range() succeeds, the
rollback path does not clean up pre-allocated XArray nodes stored in
xas->xa_alloc. These nodes are allocated by xas_nomem() when
xas_create() fails with GFP_NOWAIT and need to be freed.

The leak occurs because:
1. xas_create_range() may call xas_nomem() which allocates a node
   and stores it in xas->xa_alloc
2. If the collapse operation fails later, the rollback path jumps
   to the 'rollback:' label
3. The rollback path cleans up folios but does not call xas_destroy()
   to free the pre-allocated nodes in xas->xa_alloc

Fix this by calling xas_destroy(&xas) at the beginning of the rollback
path to free any pre-allocated nodes. This is safe because xas_destroy()
only frees nodes in xas->xa_alloc that were never inserted into the
XArray tree.

Note: This fixes the leak of pre-allocated nodes. A separate fix will
be needed to clean up empty nodes that were inserted into the tree by
xas_create_range() but never populated.

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 mm/khugepaged.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..f2fe7924afa0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2230,6 +2230,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	goto out;
 
 rollback:
+	xas_destroy(&xas);
 	/* Something went wrong: roll back page cache changes */
 	if (nr_none) {
 		xas_lock_irq(&xas);
-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
  2025-11-23 13:27 [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path Shardul Bankar
@ 2025-11-23 14:49 ` Dev Jain
  2025-11-24 10:02 ` David Hildenbrand (Red Hat)
  1 sibling, 0 replies; 12+ messages in thread
From: Dev Jain @ 2025-11-23 14:49 UTC (permalink / raw)
  To: Shardul Bankar, shardulsb08
  Cc: linux-kernel, linux-mm, syzbot+a785d07959bc94837d51, akpm, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, baohua, lance.yang, janak


On 23/11/25 6:57 pm, Shardul Bankar wrote:
> When collapse_file() fails after xas_create_range() succeeds, the
> rollback path does not clean up pre-allocated XArray nodes stored in
> xas->xa_alloc. These nodes are allocated by xas_nomem() when
> xas_create() fails with GFP_NOWAIT and need to be freed.
>
> The leak occurs because:
> 1. xas_create_range() may call xas_nomem() which allocates a node
>     and stores it in xas->xa_alloc
> 2. If the collapse operation fails later, the rollback path jumps
>     to the 'rollback:' label
> 3. The rollback path cleans up folios but does not call xas_destroy()
>     to free the pre-allocated nodes in xas->xa_alloc
>
> Fix this by calling xas_destroy(&xas) at the beginning of the rollback
> path to free any pre-allocated nodes. This is safe because xas_destroy()
> only frees nodes in xas->xa_alloc that were never inserted into the
> XArray tree.
>
> Note: This fixes the leak of pre-allocated nodes. A separate fix will
> be needed to clean up empty nodes that were inserted into the tree by
> xas_create_range() but never populated.

No "fix" is needed in this case, the empty nodes are there in the tree
and there is no leak.

>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>   mm/khugepaged.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..f2fe7924afa0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2230,6 +2230,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>   	goto out;
>   
>   rollback:
> +	xas_destroy(&xas);
>   	/* Something went wrong: roll back page cache changes */
>   	if (nr_none) {
>   		xas_lock_irq(&xas);

I believe that the correct explanation of this patch is - after xas_nomem, if someone else
expands the tree and the retry for xas_create_range() trivially succeeds without using the
node allocated from xas_nomem, then we need to free that node. Although now I am worried
about other users of xas_nomem() which do not call xas_destroy() on failure path...

I think this should work,

Reviewed-by: Dev Jain <dev.jain@arm.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
  2025-11-23 13:27 [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path Shardul Bankar
  2025-11-23 14:49 ` Dev Jain
@ 2025-11-24 10:02 ` David Hildenbrand (Red Hat)
  2025-11-24 11:46   ` Dev Jain
  1 sibling, 1 reply; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-24 10:02 UTC (permalink / raw)
  To: Shardul Bankar, shardulsb08
  Cc: linux-kernel, linux-mm, syzbot+a785d07959bc94837d51, akpm,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, janak

On 11/23/25 14:27, Shardul Bankar wrote:
> When collapse_file() fails after xas_create_range() succeeds, the
> rollback path does not clean up pre-allocated XArray nodes stored in
> xas->xa_alloc. These nodes are allocated by xas_nomem() when
> xas_create() fails with GFP_NOWAIT and need to be freed.
> 
> The leak occurs because:
> 1. xas_create_range() may call xas_nomem() which allocates a node
>     and stores it in xas->xa_alloc

Do you mean that, if xas_create_range() failed, collapse_file() will 
call xas_nomem() to preallocate memory?

I don't immediately see how xas_create_range() would call xas_nomem().

> 2. If the collapse operation fails later, the rollback path jumps
>     to the 'rollback:' label
> 3. The rollback path cleans up folios but does not call xas_destroy()
>     to free the pre-allocated nodes in xas->xa_alloc

Note that after we call xas_nomem(), we retry xas_create_range() -- that 
previously failed to to -ENOMEM.

So the assumption is that the xas_create_range() call would consume that 
memory.

I'm sure there is some corner case where it is not the case (some 
concurrent action? not sure)

> 
> Fix this by calling xas_destroy(&xas) at the beginning of the rollback
> path to free any pre-allocated nodes. This is safe because xas_destroy()
> only frees nodes in xas->xa_alloc that were never inserted into the
> XArray tree.

Shouldn't we just call xas_destroy() in any case, also when everything 
succeeded?

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
  2025-11-24 10:02 ` David Hildenbrand (Red Hat)
@ 2025-11-24 11:46   ` Dev Jain
  2025-11-24 15:23     ` Shardul Bankar
  0 siblings, 1 reply; 12+ messages in thread
From: Dev Jain @ 2025-11-24 11:46 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), Shardul Bankar, shardulsb08
  Cc: linux-kernel, linux-mm, syzbot+a785d07959bc94837d51, akpm,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, baohua, lance.yang, janak

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]


On 24/11/25 3:32 pm, David Hildenbrand (Red Hat) wrote:
> On 11/23/25 14:27, Shardul Bankar wrote:
>> When collapse_file() fails after xas_create_range() succeeds, the
>> rollback path does not clean up pre-allocated XArray nodes stored in
>> xas->xa_alloc. These nodes are allocated by xas_nomem() when
>> xas_create() fails with GFP_NOWAIT and need to be freed.
>>
>> The leak occurs because:
>> 1. xas_create_range() may call xas_nomem() which allocates a node
>>     and stores it in xas->xa_alloc
>
> Do you mean that, if xas_create_range() failed, collapse_file() will 
> call xas_nomem() to preallocate memory?
>
> I don't immediately see how xas_create_range() would call xas_nomem().
>
>> 2. If the collapse operation fails later, the rollback path jumps
>>     to the 'rollback:' label
>> 3. The rollback path cleans up folios but does not call xas_destroy()
>>     to free the pre-allocated nodes in xas->xa_alloc
>
> Note that after we call xas_nomem(), we retry xas_create_range() -- 
> that previously failed to to -ENOMEM.
>
> So the assumption is that the xas_create_range() call would consume 
> that memory.
>
> I'm sure there is some corner case where it is not the case (some 
> concurrent action? not sure)

Someone else can put slots in the xarray since we dropped the lock. I 
cannot prove this, but surely

disproving this is harder : )


>
>>
>> Fix this by calling xas_destroy(&xas) at the beginning of the rollback
>> path to free any pre-allocated nodes. This is safe because xas_destroy()
>> only frees nodes in xas->xa_alloc that were never inserted into the
>> XArray tree.
>
> Shouldn't we just call xas_destroy() in any case, also when everything 
> succeeded?

Yeah you are right. We should probably do

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0794a99c807f100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1872,11+1872,14@@ staticintcollapse_file(structmm_struct *mm, 
unsignedlongaddr,
do{
xas_lock_irq(&xas);
xas_create_range(&xas);
- if(!xas_error(&xas))
+ if(!xas_error(&xas)) {
+ xas_destroy(&xas);
break;
+ }
xas_unlock_irq(&xas);
if(!xas_nomem(&xas, GFP_KERNEL)) {
result = SCAN_FAIL;
+ xas_destroy(&xas);
gotorollback;
}
} while(1);


[-- Attachment #2: Type: text/html, Size: 7279 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
  2025-11-24 11:46   ` Dev Jain
@ 2025-11-24 15:23     ` Shardul Bankar
  2025-11-24 16:11       ` [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop Shardul Bankar
  0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-11-24 15:23 UTC (permalink / raw)
  To: Dev Jain, David Hildenbrand (Red Hat)
  Cc: linux-kernel, linux-mm, syzbot+a785d07959bc94837d51, akpm,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, baohua, lance.yang, janak, shardulsb08

Hi David, Dev,

Thanks for the clarification, that really helped straighten things out.

On Mon, 2025-11-24 at 17:16 +0530, Dev Jain wrote:
> 
> On 24/11/25 3:32 pm, David Hildenbrand (Red Hat) wrote:
> > 
> >  Do you mean that, if xas_create_range() failed, collapse_file()
> > will call xas_nomem() to preallocate memory? 

Yes that's correct.

> >  I don't immediately see how xas_create_range() would call
> > xas_nomem(). 

xas_create_range() indeed doesn't call xas_nomem() internally. The
control flow is:

  xas_create_range(&xas)
    -> xas_create()
        -> may set XA_ERROR(-ENOMEM)
  collapse_file() detects xas_error()
    -> calls xas_nomem()
        -> allocates a spare node and stores it in xas->xa_alloc
> > 

> >  Note that after we call xas_nomem(), we retry xas_create_range() -
> > - that previously failed to to -ENOMEM. 
> > 
> >  So the assumption is that the xas_create_range() call would
> > consume that memory. 
> > 
> >  I'm sure there is some corner case where it is not the case (some
> > concurrent action? not sure) 
>  
> Someone else can put slots in the xarray since we dropped the lock. I
> cannot prove this, but surely
> disproving this is harder : )

As Dev pointed out, after xas_nomem(), another thread may expand the
xarray while the lock is dropped. In that case, the next
xas_create_range() retry could succeed without consuming xa_alloc, so
xas_destroy() should clear the unused spare node. Calling xas_destroy()
on all exit paths therefore seems correct and safe.

> >  Shouldn't we just call xas_destroy() in any case, also when
> > everything succeeded? 
>  
> Yeah you are right. We should probably do
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..0794a99c807f100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1872,11 +1872,14 @@ staticintcollapse_file(struct mm_struct *mm,
> unsignedlongaddr,
> do {
> xas_lock_irq(&xas);
> xas_create_range(&xas);
> - if (!xas_error(&xas))
> + if (!xas_error(&xas)) {
> + xas_destroy(&xas);
> break;
> + }
> xas_unlock_irq(&xas);
> if (!xas_nomem(&xas, GFP_KERNEL)) {
>  result = SCAN_FAIL;
> + xas_destroy(&xas);
> goto rollback;
>  }
>  } while (1);


I’ll prepare v2 implementing Dev’s suggestion and update the commit
message accordingly.

Thanks and Regards,
Shardul


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
  2025-11-24 15:23     ` Shardul Bankar
@ 2025-11-24 16:11       ` Shardul Bankar
  2025-11-24 16:21         ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-11-24 16:11 UTC (permalink / raw)
  To: linux-mm, dev.jain, david
  Cc: linux-kernel, syzbot+a785d07959bc94837d51, akpm, lorenzo.stoakes,
	ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	lance.yang, janak, shardul.b, shardulsb08

collapse_file() uses xas_create_range() in a retry loop that calls
xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
spare xa_node and store it in xas->xa_alloc.

If the lock is dropped after xas_nomem(), another thread can expand
the xarray tree in the meantime. On the next retry, xas_create_range()
can then succeed trivially without consuming the node stored in
xas->xa_alloc. If we then either succeed or give up and go to the
rollback path without calling xas_destroy(), that spare node leaks.

Fix this by calling xas_destroy(&xas) in both the success case
(!xas_error(&xas)) and the failure case where xas_nomem() returns
false and we abort. xas_destroy() will free any unused spare node in
xas->xa_alloc and is a no-op if there is nothing left to free.

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 v2:
 - Call xas_destroy() on both success and failure
 - Explained retry semantics and xa_alloc / concurrency risk
 - Dropped cleanup_empty_nodes from previous proposal

 mm/khugepaged.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0794a99c807f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	do {
 		xas_lock_irq(&xas);
 		xas_create_range(&xas);
-		if (!xas_error(&xas))
+		if (!xas_error(&xas)) {
+			xas_destroy(&xas);
 			break;
+		}
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
 			result = SCAN_FAIL;
+			xas_destroy(&xas);
 			goto rollback;
 		}
 	} while (1);
-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
  2025-11-24 16:11       ` [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop Shardul Bankar
@ 2025-11-24 16:21         ` Matthew Wilcox
  2025-11-24 17:37           ` Shardul Bankar
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2025-11-24 16:21 UTC (permalink / raw)
  To: Shardul Bankar
  Cc: linux-mm, dev.jain, david, linux-kernel,
	syzbot+a785d07959bc94837d51, akpm, lorenzo.stoakes, ziy,
	baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	lance.yang, janak, shardulsb08

On Mon, Nov 24, 2025 at 09:41:49PM +0530, Shardul Bankar wrote:
> collapse_file() uses xas_create_range() in a retry loop that calls
> xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
> spare xa_node and store it in xas->xa_alloc.
> 
> If the lock is dropped after xas_nomem(), another thread can expand
> the xarray tree in the meantime. On the next retry, xas_create_range()
> can then succeed trivially without consuming the node stored in
> xas->xa_alloc. If we then either succeed or give up and go to the
> rollback path without calling xas_destroy(), that spare node leaks.

Then wouldn't freeing the excess node in xas_create_range() be the
correct fix, instead of requiring the caller to think about this?

> Fix this by calling xas_destroy(&xas) in both the success case
> (!xas_error(&xas)) and the failure case where xas_nomem() returns
> false and we abort. xas_destroy() will free any unused spare node in
> xas->xa_alloc and is a no-op if there is nothing left to free.
> 
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>  v2:
>  - Call xas_destroy() on both success and failure
>  - Explained retry semantics and xa_alloc / concurrency risk
>  - Dropped cleanup_empty_nodes from previous proposal
> 
>  mm/khugepaged.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..0794a99c807f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  	do {
>  		xas_lock_irq(&xas);
>  		xas_create_range(&xas);
> -		if (!xas_error(&xas))
> +		if (!xas_error(&xas)) {
> +			xas_destroy(&xas);
>  			break;
> +		}
>  		xas_unlock_irq(&xas);
>  		if (!xas_nomem(&xas, GFP_KERNEL)) {
>  			result = SCAN_FAIL;
> +			xas_destroy(&xas);
>  			goto rollback;
>  		}
>  	} while (1);
> -- 
> 2.34.1
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
  2025-11-24 16:21         ` Matthew Wilcox
@ 2025-11-24 17:37           ` Shardul Bankar
  2025-12-01  7:45             ` [PATCH v3] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
  0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-11-24 17:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, dev.jain, david, linux-kernel,
	syzbot+a785d07959bc94837d51, akpm, lorenzo.stoakes, ziy,
	baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	lance.yang, janak, shardulsb08

On Mon, 2025-11-24 at 16:21 +0000, Matthew Wilcox wrote:
> 
> Then wouldn't freeing the excess node in xas_create_range() be the
> correct fix, instead of requiring the caller to think about this?
> 
> 
Hi Matthew,

Thanks for the feedback. Agreed, this is better fixed inside xarray
instead of in collapse_file(), so callers don’t need to think about
xas_destroy() at all.

Looking at the internals, xas_nomem() only allocates a spare node into
xas->xa_alloc and xas_alloc() consumes it only if it is required. The
only point where we know that the retry loop is truly finished is after
xas_create_range() (or xas_create()) succeeds — at that point, any
remaining xa_alloc must be unused.

So to align API expectations, I’m trying to understand where you would
prefer to enforce the invariant:

  - In xas_create_range() after success, ensuring no spare remains?
  - Or in xas_create(), so that non-range callers benefit as well?

Once that API boundary is clear, I can prepare a v3 that moves the fix
into lib/xarray.c.

Thanks,
Shardul



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] lib: xarray: free unused spare node in xas_create_range()
  2025-11-24 17:37           ` Shardul Bankar
@ 2025-12-01  7:45             ` Shardul Bankar
  2025-12-01  8:39               ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-12-01  7:45 UTC (permalink / raw)
  To: willy, linux-mm, akpm
  Cc: dev.jain, david, linux-fsdevel, linux-kernel, shardulsb08, janak,
	Shardul Bankar

xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.

If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.

xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.

Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.

This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 v3:
 - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
 - Fix in library function makes API safer by default, preventing callers from needing
   to remember cleanup
 - Use shared cleanup label that both restore: and success: paths jump to
 - Clean up unused spare node on both success and error exit paths
 v2:
 - Call xas_destroy() on both success and failure
 - Explained retry semantics and xa_alloc / concurrency risk
 - Dropped cleanup_empty_nodes from previous proposal
 lib/xarray.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..a924421c0c4c 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
 	xas->xa_shift = shift;
 	xas->xa_sibs = sibs;
 	xas->xa_index = index;
-	return;
+	goto cleanup;
+
 success:
 	xas->xa_index = index;
 	if (xas->xa_node)
 		xas_set_offset(xas);
+
+cleanup:
+	/* Free any unused spare node from xas_nomem() */
+	if (xas->xa_alloc)
+		xas_destroy(xas);
 }
 EXPORT_SYMBOL_GPL(xas_create_range);
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] lib: xarray: free unused spare node in xas_create_range()
  2025-12-01  7:45             ` [PATCH v3] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
@ 2025-12-01  8:39               ` David Hildenbrand (Red Hat)
  2025-12-04 14:15                 ` Shardul Bankar
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-01  8:39 UTC (permalink / raw)
  To: Shardul Bankar, willy, linux-mm, akpm
  Cc: dev.jain, linux-fsdevel, linux-kernel, shardulsb08, janak

On 12/1/25 08:45, Shardul Bankar wrote:

Please don't post new versions as reply to old versions.

> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
> 
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
> 
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
> 
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
> 
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
> 
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>   v3:
>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>   - Fix in library function makes API safer by default, preventing callers from needing
>     to remember cleanup
>   - Use shared cleanup label that both restore: and success: paths jump to
>   - Clean up unused spare node on both success and error exit paths
>   v2:
>   - Call xas_destroy() on both success and failure
>   - Explained retry semantics and xa_alloc / concurrency risk
>   - Dropped cleanup_empty_nodes from previous proposal
>   lib/xarray.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..a924421c0c4c 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
>   	xas->xa_shift = shift;
>   	xas->xa_sibs = sibs;
>   	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>   success:
>   	xas->xa_index = index;
>   	if (xas->xa_node)
>   		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	if (xas->xa_alloc)
> +		xas_destroy(xas);

The first thing xas_destroy() does is check whether xa_alloc is set.

I'd assume that the compiler is smart enough to inline xas_destroy() 
completely here, so likely the xa_alloc check here can just be dropped.


Staring at xas_destroy() callers, we only have a single one outside of 
lib: mm/huge_memory.c:__folio_split()

Is that one still required?

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] lib: xarray: free unused spare node in xas_create_range()
  2025-12-01  8:39               ` David Hildenbrand (Red Hat)
@ 2025-12-04 14:15                 ` Shardul Bankar
  2025-12-04 21:15                   ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 12+ messages in thread
From: Shardul Bankar @ 2025-12-04 14:15 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), willy, linux-mm, akpm
  Cc: dev.jain, linux-fsdevel, linux-kernel, janak, shardulsb08

On Mon, 2025-12-01 at 09:39 +0100, David Hildenbrand (Red Hat) wrote:
> Please don't post new versions as reply to old versions.
> ...
> 
> ...
> The first thing xas_destroy() does is check whether xa_alloc is set.
> 
> I'd assume that the compiler is smart enough to inline xas_destroy() 
> completely here, so likely the xa_alloc check here can just be
> dropped.

Got it, will share a v4 of the patch on a new chain with redundant
xas_destroy() removed.

> Staring at xas_destroy() callers, we only have a single one outside
> of 
> lib: mm/huge_memory.c:__folio_split()
> 
> Is that one still required?

I checked the callers of xas_destroy(). Apart from the internal uses in
lib/xarray.c and the unit tests in lib/test_xarray.c, the only external
user is indeed mm/huge_memory.c:__folio_split().

That path is slightly different from the xas_nomem() retry loop I fixed
in xas_create_range():

	__folio_split() goes through xas_split_alloc() and then
xas_split() / xas_try_split(), which allocate and consume nodes via
xas->xa_alloc.

	The final xas_destroy(&xas) in __folio_split() is there to
drop any leftover split-allocation nodes, not the xas_nomem() spare
node I handled in xas_create_range().

So with the current code I don’t think I can safely declare that
xas_destroy() in __folio_split() is redundant- it still acts as the
last cleanup for the split helpers.

For v4 I’d therefore like to keep the scope focused on the syzkaller
leak and just drop the redundant "if (xa_alloc)" around xas_destroy()
in xas_create_range() as you suggested.

Separately, I agree it would be cleaner if the split helpers guaranteed
that xa_alloc is always cleared on return, so callers never have to
think about xas_destroy(). I can take a closer look at xas_split() /
xas_try_split() and, if it looks sound, propose a small follow-up
series that makes their cleanup behaviour explicit and then removes the
xas_destroy() from __folio_split().

Thanks,
Shardul


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] lib: xarray: free unused spare node in xas_create_range()
  2025-12-04 14:15                 ` Shardul Bankar
@ 2025-12-04 21:15                   ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-04 21:15 UTC (permalink / raw)
  To: Shardul Bankar, willy, linux-mm, akpm
  Cc: dev.jain, linux-fsdevel, linux-kernel, janak, shardulsb08

On 12/4/25 15:15, Shardul Bankar wrote:
> On Mon, 2025-12-01 at 09:39 +0100, David Hildenbrand (Red Hat) wrote:
>> Please don't post new versions as reply to old versions.
>> ...
>>
>> ...
>> The first thing xas_destroy() does is check whether xa_alloc is set.
>>
>> I'd assume that the compiler is smart enough to inline xas_destroy()
>> completely here, so likely the xa_alloc check here can just be
>> dropped.
> 
> Got it, will share a v4 of the patch on a new chain with redundant
> xas_destroy() removed.
> 
>> Staring at xas_destroy() callers, we only have a single one outside
>> of
>> lib: mm/huge_memory.c:__folio_split()
>>
>> Is that one still required?
> 
> I checked the callers of xas_destroy(). Apart from the internal uses in
> lib/xarray.c and the unit tests in lib/test_xarray.c, the only external
> user is indeed mm/huge_memory.c:__folio_split().
> 
> That path is slightly different from the xas_nomem() retry loop I fixed
> in xas_create_range():
> 
> 	__folio_split() goes through xas_split_alloc() and then
> xas_split() / xas_try_split(), which allocate and consume nodes via
> xas->xa_alloc.
> 
> 	The final xas_destroy(&xas) in __folio_split() is there to
> drop any leftover split-allocation nodes, not the xas_nomem() spare
> node I handled in xas_create_range().
> 
> So with the current code I don’t think I can safely declare that
> xas_destroy() in __folio_split() is redundant- it still acts as the
> last cleanup for the split helpers.
> 
> For v4 I’d therefore like to keep the scope focused on the syzkaller
> leak and just drop the redundant "if (xa_alloc)" around xas_destroy()
> in xas_create_range() as you suggested.
> 
> Separately, I agree it would be cleaner if the split helpers guaranteed
> that xa_alloc is always cleared on return, so callers never have to
> think about xas_destroy(). I can take a closer look at xas_split() /
> xas_try_split() and, if it looks sound, propose a small follow-up
> series that makes their cleanup behaviour explicit and then removes the
> xas_destroy() from __folio_split().

That makes sense, thanks. Handling it internally is certainly harder to 
mess up by callers ...

-- 
Cheers

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-12-04 21:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-23 13:27 [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path Shardul Bankar
2025-11-23 14:49 ` Dev Jain
2025-11-24 10:02 ` David Hildenbrand (Red Hat)
2025-11-24 11:46   ` Dev Jain
2025-11-24 15:23     ` Shardul Bankar
2025-11-24 16:11       ` [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop Shardul Bankar
2025-11-24 16:21         ` Matthew Wilcox
2025-11-24 17:37           ` Shardul Bankar
2025-12-01  7:45             ` [PATCH v3] lib: xarray: free unused spare node in xas_create_range() Shardul Bankar
2025-12-01  8:39               ` David Hildenbrand (Red Hat)
2025-12-04 14:15                 ` Shardul Bankar
2025-12-04 21:15                   ` David Hildenbrand (Red Hat)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox