On 24/11/25 3:32 pm, David Hildenbrand (Red Hat) wrote:
On 11/23/25 14:27, Shardul Bankar wrote:
When collapse_file() fails after xas_create_range() succeeds, the
rollback path does not clean up pre-allocated XArray nodes stored in
xas->xa_alloc. These nodes are allocated by xas_nomem() when
xas_create() fails with GFP_NOWAIT and need to be freed.

The leak occurs because:
1. xas_create_range() may call xas_nomem() which allocates a node
    and stores it in xas->xa_alloc

Do you mean that, if xas_create_range() failed, collapse_file() will call xas_nomem() to preallocate memory?

I don't immediately see how xas_create_range() would call xas_nomem().

2. If the collapse operation fails later, the rollback path jumps
    to the 'rollback:' label
3. The rollback path cleans up folios but does not call xas_destroy()
    to free the pre-allocated nodes in xas->xa_alloc

Note that after we call xas_nomem(), we retry xas_create_range() -- that previously failed to to -ENOMEM.

So the assumption is that the xas_create_range() call would consume that memory.

I'm sure there is some corner case where it is not the case (some concurrent action? not sure) 

Someone else can put slots in the xarray since we dropped the lock. I cannot prove this, but surely

disproving this is harder : )




Fix this by calling xas_destroy(&xas) at the beginning of the rollback
path to free any pre-allocated nodes. This is safe because xas_destroy()
only frees nodes in xas->xa_alloc that were never inserted into the
XArray tree.

Shouldn't we just call xas_destroy() in any case, also when everything succeeded? 

Yeah you are right. We should probably do

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0794a99c807f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
do {
xas_lock_irq(&xas);
xas_create_range(&xas);
- if (!xas_error(&xas))
+ if (!xas_error(&xas)) {
+ xas_destroy(&xas);
break;
+ }
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
result = SCAN_FAIL;
+ xas_destroy(&xas);
goto rollback;
}
} while (1);