On 2025/12/4 22:26, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().

I encountered another memory leak issue in xas_create_range().

collapse_file() calls xas_create_range() to pre-create all the slots it
needs. If collapse_file() ultimately fails, these pre-created slots are
left behind as empty nodes. When the file is deleted,
shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range() calls
xas_store(&xas, NULL) for each entry to delete nodes, but it never visits
the pre-created empty nodes, so they are leaked.

I can reproduce it with the following steps:
1) Create the file /tmp/test_madvise_collapse, ftruncate it to 4MB, and
   then mmap the file.
2) memset the first 2MB.
3) madvise(MADV_COLLAPSE) the second 2MB.
4) unlink the file.

In 3), collapse_file() calls xas_create_range() to expand the xarray
depth, and then fails to collapse because the whole 2M region is empty,
so the newly created empty nodes are leaked.
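
The four steps above can be sketched as a small userspace helper. This is
only an illustration and not the exact program I ran: the helper name
reproduce() and its collapse_errno out-parameter are made up for this
sketch, the MADV_COLLAPSE fallback value is taken from the Linux uapi
headers, and it assumes /tmp is backed by tmpfs. It does nothing to
guarantee 2MB alignment of the mapping, so whether madvise() really
reaches collapse_file() depends on the kernel and its configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from the Linux uapi headers (v6.1+) */
#endif

#define SZ_2M	(2UL << 20)

/*
 * Hypothetical helper that walks through steps 1) to 4).
 *
 * Returns 0 when file setup and teardown succeeded, -1 otherwise.
 * The madvise() outcome is stored in *collapse_errno (0 on success);
 * a failure there is the expected result for this scenario.
 */
static int reproduce(const char *path, int *collapse_errno)
{
	int fd, ret = -1;
	char *map;

	assert(path && collapse_errno);
	*collapse_errno = 0;

	/* 1) create the file, size it to 4MB and map it */
	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 2 * SZ_2M) < 0)
		goto out_close;
	map = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* 2) populate only the first 2MB */
	memset(map, 0xaa, SZ_2M);

	/* 3) try to collapse the second, still empty, 2MB */
	if (madvise(map + SZ_2M, SZ_2M, MADV_COLLAPSE) < 0)
		*collapse_errno = errno;

	munmap(map, 2 * SZ_2M);
	ret = 0;
out_close:
	close(fd);
	/* 4) unlink the file so that the inode gets evicted */
	if (unlink(path) < 0)
		ret = -1;
	return ret;
}
```

Calling reproduce("/tmp/test_madvise_collapse", &err) from main() drives
the sequence once. The helper only triggers the code path; it does not
detect the leak itself.
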

To fix it, maybe we should add a new function, xas_delete_range(), to
undo what xas_create_range() did when we run into the rollback path?

>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar
> ---
> v4:
> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
>   already checks xa_alloc internally.
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
>   to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
>  lib/xarray.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>  	xas->xa_shift = shift;
>  	xas->xa_sibs = sibs;
>  	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>  success:
>  	xas->xa_index = index;
>  	if (xas->xa_node)
>  		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	xas_destroy(xas);
>  }
>  EXPORT_SYMBOL_GPL(xas_create_range);
>