On 2025/12/18 20:49, David Hildenbrand (Red Hat) wrote:
> On 12/18/25 13:18, Jinjiang Tu wrote:
>>
>> On 2025/12/18 19:51, David Hildenbrand (Red Hat) wrote:
>>> On 12/18/25 12:45, Jinjiang Tu wrote:
>>>> I encountered a memory leak issue caused by xas_create_range().
>>>>
>>>> collapse_file() calls xas_create_range() to pre-create all slots needed.
>>>> If collapse_file() ultimately fails, these pre-created slots are empty
>>>> nodes and aren't destroyed.
>>>>
>>>> I can reproduce it with the following steps.
>>>> 1) create file /tmp/test_madvise_collapse, ftruncate it to 4MB, and then mmap the file
>>>> 2) memset the first 2MB
>>>> 3) madvise(MADV_COLLAPSE) the second 2MB
>>>> 4) unlink the file
>>>>
>>>> In 3), collapse_file() calls xas_create_range() to expand the xarray
>>>> depth, and fails to collapse because the whole 2M region is empty; the
>>>> code is as follows:
>>>>
>>>> collapse_file()
>>>>     for (index = start; index < end;) {
>>>>         xas_set(&xas, index);
>>>>         folio = xas_load(&xas);
>>>>
>>>>         VM_BUG_ON(index != xas.xa_index);
>>>>         if (is_shmem) {
>>>>             if (!folio) {
>>>>                 /*
>>>>                  * Stop if extent has been truncated or
>>>>                  * hole-punched, and is now completely
>>>>                  * empty.
>>>>                  */
>>>>                 if (index == start) {
>>>>                     if (!xas_next_entry(&xas, end - 1)) {
>>>>                         result = SCAN_TRUNCATED;
>>>>                         goto xa_locked;
>>>>                     }
>>>>                 }
>>>>                 ...
>>>>             }
>>>>
>>>> collapse_file()'s rollback path doesn't destroy the pre-created empty
>>>> nodes.
>>>>
>>>> When the file is deleted, shmem_evict_inode()->shmem_truncate_range()
>>>> traverses all entries and calls xas_store(xas, NULL) to delete them. If
>>>> the leaf xa_node that stored a deleted entry becomes empty, xas_store()
>>>> automatically deletes the empty node, and also its parent if that is now
>>>> empty too, walking up until a non-empty parent is reached.
>>>> shmem_evict_inode() never visits the empty nodes created by
>>>> xas_create_range() because these nodes don't store any entries. As a
>>>> result, these empty nodes are leaked.
>>>>
>>>> At first, I tried to destroy the empty nodes when collapse_file() goes
>>>> down the rollback path. However, collapse_file() only holds the xarray
>>>> lock and may release it, so we can't prevent concurrent calls of
>>>> collapse_file(), and the deleted empty nodes may still be needed by
>>>> other collapse_file() calls.
>>>>
>>>> IIUC, xas_create_range() is used to guarantee that
>>>> xas_store(&xas, new_folio) succeeds. Could we remove the
>>>> xas_create_range() call and just roll back when xas_store() fails?
>>>
>>> Hi,
>>>
>>> thanks for the report.
>>>
>>> Is that what [1] is fixing?
>>>
>>> [1] https://lore.kernel.org/linux-mm/20251204142625.1763372-1-shardul.b@mpiricsoftware.com/
>>>
>> No, this patch fixes a memory leak caused by xas->xa_alloc being allocated
>> by xas_nomem() while the xa_node isn't installed into the xarray.
>>
>> In my case, the leaked xa_nodes have been installed into the xarray by
>> xas_create_range().
>
> Thanks for checking. I thought that was also discussed as part of the
> other fix.
>
> See [2] where we have
>
> "Note: This fixes the leak of pre-allocated nodes. A separate fix will
> be needed to clean up empty nodes that were inserted into the tree by
> xas_create_range() but never populated."
>
> Is that the issue you are describing? (sounds like it, but I only
> skimmed over the details).
>
> CCing Shardul.

Yes, the same issue. As I described in the first email:

"
At first, I tried to destroy the empty nodes when collapse_file() goes
down the rollback path. However, collapse_file() only holds the xarray
lock and may release it, so we can't prevent concurrent calls of
collapse_file(), and the deleted empty nodes may still be needed by
other collapse_file() calls.
"

We can't blindly clean up the empty nodes in the rollback path.

I'm trying to move the xas_create_range() call right before xas_store(),
always under the xarray lock, to make rollback easier. The diff looks
like this (applied on 6.6, not tested yet):

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index c3f54d2eaf36..5ef393011a61 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1548,6 +1548,7 @@ void xas_destroy(struct xa_state *);
 void xas_pause(struct xa_state *);
 
 void xas_create_range(struct xa_state *);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end);
 
 #ifdef CONFIG_XARRAY_MULTI
 int xa_get_order(struct xarray *, unsigned long index);
diff --git a/lib/xarray.c b/lib/xarray.c
index 32d4bac8c94c..724a7f35a26f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -745,6 +745,25 @@ void xas_create_range(struct xa_state *xas)
 }
 EXPORT_SYMBOL_GPL(xas_create_range);
 
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end)
+{
+	unsigned long index;
+	void *entry;
+
+	for (index = start; index < end; ++index) {
+		xas_set(xas, index);
+		entry = xas_load(xas);
+		if (entry)
+			continue;
+
+		if (!xas->xa_node || xas_invalid(xas))
+			continue;
+
+		if (!xas->xa_node->count)
+			xas_delete_node(xas);
+	}
+}
+
 static void update_node(struct xa_state *xas, struct xa_node *node,
 		int count, int values)
 {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e8cb826c1994..36f0600ef7b1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1832,7 +1832,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	struct folio *folio, *tmp, *new_folio;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
-	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
+	XA_STATE(xas, &mapping->i_pages, 0);
 	int nr_none = 0, result = SCAN_SUCCEED;
 	bool is_shmem = shmem_file(file);
 
@@ -1851,22 +1851,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	new_folio->index = start;
 	new_folio->mapping = mapping;
 
-	/*
-	 * Ensure we have slots for all the pages in the range.  This is
-	 * almost certainly a no-op because most of the pages must be present
-	 */
-	do {
-		xas_lock_irq(&xas);
-		xas_create_range(&xas);
-		if (!xas_error(&xas))
-			break;
-		xas_unlock_irq(&xas);
-		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			result = SCAN_FAIL;
-			goto rollback;
-		}
-	} while (1);
-
 	for (index = start; index < end;) {
 		xas_set(&xas, index);
 		folio = xas_load(&xas);
@@ -2163,6 +2147,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_lock_irq(&xas);
 	}
 
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_create_range(&xas);
+	if (xas_error(&xas)) {
+		if (nr_none) {
+			xas_set(&xas, start);
+			for (index = start; index < end; index++) {
+				if (xas_next(&xas) == XA_RETRY_ENTRY)
+					xas_store(&xas, NULL);
+			}
+		}
+		xas_destroy_range(&xas, start, end);
+		xas_unlock_irq(&xas);
+		result = SCAN_FAIL;
+
+		goto rollback;
+	}
+
 	if (is_shmem)
 		__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
 	else
-- 
2.43.0

>
>
> [2] https://lore.kernel.org/linux-mm/20251123132727.3262731-1-shardul.b@mpiricsoftware.com/
>
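
P.S. For reference, a minimal userspace sketch of the reproducer steps 1)-4)
described at the top of this thread. It is untested and only illustrative:
it assumes /tmp is tmpfs (so the file is shmem-backed) and a kernel that
supports MADV_COLLAPSE.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

#define MB (1024UL * 1024)

int main(void)
{
	char *p;
	int fd;

	/* 1) create the file, ftruncate it to 4MB and mmap it */
	fd = open("/tmp/test_madvise_collapse", O_CREAT | O_RDWR, 0600);
	if (fd < 0 || ftruncate(fd, 4 * MB)) {
		perror("open/ftruncate");
		return 1;
	}
	p = mmap(NULL, 4 * MB, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* 2) populate only the first 2MB */
	memset(p, 1, 2 * MB);

	/* 3) try to collapse the second, still empty, 2MB; expected to fail */
	if (madvise(p + 2 * MB, 2 * MB, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");

	/* 4) unlink; the empty nodes left behind by xas_create_range() leak */
	munmap(p, 4 * MB);
	close(fd);
	unlink("/tmp/test_madvise_collapse");
	return 0;
}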