在 2025/12/18 20:49, David Hildenbrand (Red Hat) 写道:
On 12/18/25 13:18, Jinjiang Tu wrote:

在 2025/12/18 19:51, David Hildenbrand (Red Hat) 写道:
On 12/18/25 12:45, Jinjiang Tu wrote:
I encountered a memory leak issue caused by xas_create_range().

collapse_file() calls xas_create_range() to pre-create all slots needed.
If collapse_file() finally fails, these pre-created slots are empty nodes
and aren't destroyed.

I can reproduce it with following steps.
1) create file /tmp/test_madvise_collapse and ftruncate to 4MB size, and then mmap the file
2) memset for the first 2MB
3) madvise(MADV_COLLAPSE) for the second 2MB
4) unlink the file

in 3), collapse_file() calls xas_create_range() to expand xarray depth, and fails to collapse
due to the whole 2M region is empty, the code is as following:

collapse_file()
    for (index = start; index < end;) {
        xas_set(&xas, index);
        folio = xas_load(&xas);

        VM_BUG_ON(index != xas.xa_index);
        if (is_shmem) {
            if (!folio) {
                /*
                 * Stop if extent has been truncated or
                 * hole-punched, and is now completely
                 * empty.
                 */
                if (index == start) {
                    if (!xas_next_entry(&xas, end - 1)) {
                        result = SCAN_TRUNCATED;
                        goto xa_locked;
                    }
                }
                ...
            }


collapse_file() rollback path doesn't destroy the pre-created empty nodes.

When the file is deleted, shmem_evict_inode()->shmem_truncate_range() traverses
all entries and calls xas_store(xas, NULL) to delete, if the leaf xa_node that
stores deleted entry becomes emtry, xas_store() will automatically delete the empty
node and delete it's  parent is empty too, until parent node isn't empty. shmem_evict_inode()
won't traverse the empty nodes created by xas_create_range() due to these nodes doesn't store
any entries. As a result, these empty nodes are leaked.

At first, I tried to destory the empty nodes when collapse_file() goes to rollback path. However,
collapse_file() only holds xarray lock and may release the lock, so we couldn't prevent concurrent
call of collapse_file(), so the deleted empty nodes may be needed by other collapse_file() calls.

IIUC, xas_create_range() is used to guarantee the xas_store(&xas, new_folio); succeeds. Could we
remove xas_create_range() call and just rollback when we fail to xas_store?

Hi,

thanks for the report.

Is that what [1] is fixing?

[1] https://lore.kernel.org/linux-mm/20251204142625.1763372-1- shardul.b@mpiricsoftware.com/

No, this patch fixes memory leak caused by xas->xa_alloc allocated by xas_nomem() and the xa_node
isn't installed into xarray.

In my case, the leaked xa_nodes have been installed into xarray by xas_create_range().

Thanks for checking. I thought that was also discussed as part of the other fix.

See [2] where we have

"Note: This fixes the leak of pre-allocated nodes. A separate fix will
be needed to clean up empty nodes that were inserted into the tree by
xas_create_range() but never populated."

Is that the issue you are describing? (sounds like it, but I only skimmed over the details).

CCing Shardul. 
Yes, the same issue. As I descirbed in the first email:
"
At first, I tried to destory the empty nodes when collapse_file() goes to rollback path. However,
collapse_file() only holds xarray lock and may release the lock, so we couldn't prevent concurrent
call of collapse_file(), so the deleted empty nodes may be needed by other collapse_file() calls. 
"
We couldn't bindly cleanup empty nodes in the rollback path.

I'm trying to move the xas_create_range() before xas_store() and always under xarray lock to make
rollback easier, the diff likes (applied on 6.6, haven't tested yet)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index c3f54d2eaf36..5ef393011a61 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1548,6 +1548,7 @@ void xas_destroy(struct xa_state *);
 void xas_pause(struct xa_state *);
 
 void xas_create_range(struct xa_state *);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end);
 
 #ifdef CONFIG_XARRAY_MULTI
 int xa_get_order(struct xarray *, unsigned long index);
diff --git a/lib/xarray.c b/lib/xarray.c
index 32d4bac8c94c..724a7f35a26f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -745,6 +745,25 @@ void xas_create_range(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_create_range);
 
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end)
+{
+	unsigned long index;
+	void *entry;
+ 
+	for (index = start; index < end; ++index) {
+		xas_set(xas, index);
+		entry = xas_load(xas);
+		if (entry)
+			continue;
+ 
+		if (!xas->xa_node || xas_invalid(xas))
+			continue;
+ 
+		if (!xas->xa_node->count)
+			xas_delete_node(xas);
+	}
+}
+
 static void update_node(struct xa_state *xas, struct xa_node *node,
 		int count, int values)
 {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e8cb826c1994..36f0600ef7b1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1832,7 +1832,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	struct folio *folio, *tmp, *new_folio;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
-	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
+	XA_STATE(xas, &mapping->i_pages, 0);
 	int nr_none = 0, result = SCAN_SUCCEED;
 	bool is_shmem = shmem_file(file);
 
@@ -1851,22 +1851,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	new_folio->index = start;
 	new_folio->mapping = mapping;
 
-	/*
-	 * Ensure we have slots for all the pages in the range.  This is
-	 * almost certainly a no-op because most of the pages must be present
-	 */
-	do {
-		xas_lock_irq(&xas);
-		xas_create_range(&xas);
-		if (!xas_error(&xas))
-			break;
-		xas_unlock_irq(&xas);
-		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			result = SCAN_FAIL;
-			goto rollback;
-		}
-	} while (1);
-
 	for (index = start; index < end;) {
 		xas_set(&xas, index);
 		folio = xas_load(&xas);
@@ -2163,6 +2147,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_lock_irq(&xas);
 	}
 
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_create_range(&xas);
+	if (xas_error(&xas)) {
+		if (nr_none) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+		}
+		xas_destroy_range(&xas, start, end);
+		xas_unlock_irq(&xas);
+		result = SCAN_FAIL;
+
+		goto rollback;
+	}
+
 	if (is_shmem)
 		__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
 	else
-- 
2.43.0


[2] https://lore.kernel.org/linux-mm/20251123132727.3262731-1-shardul.b@mpiricsoftware.com/