From: Jinjiang Tu <tujinjiang@huawei.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>, <ziy@nvidia.com>,
<lorenzo.stoakes@oracle.com>, <baolin.wang@linux.alibaba.com>,
<Liam.Howlett@oracle.com>, <npache@redhat.com>,
<ryan.roberts@arm.com>, <dev.jain@arm.com>, <baohua@kernel.org>,
<lance.yang@linux.dev>, <linux-mm@kvack.org>,
<linux-fsdevel@vger.kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>,
Shardul Bankar <shardul.b@mpiricsoftware.com>
Subject: Re: [bug report] memory leak of xa_node in collapse_file() when rollbacks
Date: Thu, 18 Dec 2025 21:11:36 +0800 [thread overview]
Message-ID: <a629d3bb-c7e2-41e0-87e0-7a7a6367c1b6@huawei.com> (raw)
In-Reply-To: <05bbe26e-e71a-4a49-95d2-47373b828145@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 7593 bytes --]
在 2025/12/18 20:49, David Hildenbrand (Red Hat) 写道:
> On 12/18/25 13:18, Jinjiang Tu wrote:
>>
>> 在 2025/12/18 19:51, David Hildenbrand (Red Hat) 写道:
>>> On 12/18/25 12:45, Jinjiang Tu wrote:
>>>> I encountered a memory leak issue caused by xas_create_range().
>>>>
>>>> collapse_file() calls xas_create_range() to pre-create all slots
>>>> needed.
>>>> If collapse_file() finally fails, these pre-created slots are empty
>>>> nodes
>>>> and aren't destroyed.
>>>>
>>>> I can reproduce it with following steps.
>>>> 1) create file /tmp/test_madvise_collapse and ftruncate to 4MB
>>>> size, and then mmap the file
>>>> 2) memset for the first 2MB
>>>> 3) madvise(MADV_COLLAPSE) for the second 2MB
>>>> 4) unlink the file
>>>>
>>>> in 3), collapse_file() calls xas_create_range() to expand xarray
>>>> depth, and fails to collapse
>>>> due to the whole 2M region is empty, the code is as following:
>>>>
>>>> collapse_file()
>>>> for (index = start; index < end;) {
>>>> xas_set(&xas, index);
>>>> folio = xas_load(&xas);
>>>>
>>>> VM_BUG_ON(index != xas.xa_index);
>>>> if (is_shmem) {
>>>> if (!folio) {
>>>> /*
>>>> * Stop if extent has been truncated or
>>>> * hole-punched, and is now completely
>>>> * empty.
>>>> */
>>>> if (index == start) {
>>>> if (!xas_next_entry(&xas, end - 1)) {
>>>> result = SCAN_TRUNCATED;
>>>> goto xa_locked;
>>>> }
>>>> }
>>>> ...
>>>> }
>>>>
>>>>
>>>> collapse_file() rollback path doesn't destroy the pre-created empty
>>>> nodes.
>>>>
>>>> When the file is deleted,
>>>> shmem_evict_inode()->shmem_truncate_range() traverses
>>>> all entries and calls xas_store(xas, NULL) to delete, if the leaf
>>>> xa_node that
>>>> stores deleted entry becomes emtry, xas_store() will automatically
>>>> delete the empty
>>>> node and delete it's parent is empty too, until parent node isn't
>>>> empty. shmem_evict_inode()
>>>> won't traverse the empty nodes created by xas_create_range() due to
>>>> these nodes doesn't store
>>>> any entries. As a result, these empty nodes are leaked.
>>>>
>>>> At first, I tried to destory the empty nodes when collapse_file()
>>>> goes to rollback path. However,
>>>> collapse_file() only holds xarray lock and may release the lock, so
>>>> we couldn't prevent concurrent
>>>> call of collapse_file(), so the deleted empty nodes may be needed
>>>> by other collapse_file() calls.
>>>>
>>>> IIUC, xas_create_range() is used to guarantee the xas_store(&xas,
>>>> new_folio); succeeds. Could we
>>>> remove xas_create_range() call and just rollback when we fail to
>>>> xas_store?
>>>
>>> Hi,
>>>
>>> thanks for the report.
>>>
>>> Is that what [1] is fixing?
>>>
>>> [1] https://lore.kernel.org/linux-mm/20251204142625.1763372-1-
>>> shardul.b@mpiricsoftware.com/
>>>
>> No, this patch fixes memory leak caused by xas->xa_alloc allocated by
>> xas_nomem() and the xa_node
>> isn't installed into xarray.
>>
>> In my case, the leaked xa_nodes have been installed into xarray by
>> xas_create_range().
>
> Thanks for checking. I thought that was also discussed as part of the
> other fix.
>
> See [2] where we have
>
> "Note: This fixes the leak of pre-allocated nodes. A separate fix will
> be needed to clean up empty nodes that were inserted into the tree by
> xas_create_range() but never populated."
>
> Is that the issue you are describing? (sounds like it, but I only
> skimmed over the details).
>
> CCing Shardul.
Yes, the same issue. As I descirbed in the first email:
"
At first, I tried to destory the empty nodes when collapse_file() goes to rollback path. However,
collapse_file() only holds xarray lock and may release the lock, so we couldn't prevent concurrent
call of collapse_file(), so the deleted empty nodes may be needed by other collapse_file() calls.
"
We couldn't bindly cleanup empty nodes in the rollback path.
I'm trying to move the xas_create_range() before xas_store() and always under xarray lock to make
rollback easier, the diff likes (applied on 6.6, haven't tested yet)
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index c3f54d2eaf36..5ef393011a61 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1548,6 +1548,7 @@ void xas_destroy(struct xa_state *);
void xas_pause(struct xa_state *);
void xas_create_range(struct xa_state *);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end);
#ifdef CONFIG_XARRAY_MULTI
int xa_get_order(struct xarray *, unsigned long index);
diff --git a/lib/xarray.c b/lib/xarray.c
index 32d4bac8c94c..724a7f35a26f 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -745,6 +745,25 @@ void xas_create_range(struct xa_state *xas)
}
EXPORT_SYMBOL_GPL(xas_create_range);
+void xas_destroy_range(struct xa_state *xas, unsigned long start, unsigned long end)
+{
+ unsigned long index;
+ void *entry;
+
+ for (index = start; index < end; ++index) {
+ xas_set(xas, index);
+ entry = xas_load(xas);
+ if (entry)
+ continue;
+
+ if (!xas->xa_node || xas_invalid(xas))
+ continue;
+
+ if (!xas->xa_node->count)
+ xas_delete_node(xas);
+ }
+}
+
static void update_node(struct xa_state *xas, struct xa_node *node,
int count, int values)
{
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e8cb826c1994..36f0600ef7b1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1832,7 +1832,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
struct folio *folio, *tmp, *new_folio;
pgoff_t index = 0, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
- XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
+ XA_STATE(xas, &mapping->i_pages, 0);
int nr_none = 0, result = SCAN_SUCCEED;
bool is_shmem = shmem_file(file);
@@ -1851,22 +1851,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
new_folio->index = start;
new_folio->mapping = mapping;
- /*
- * Ensure we have slots for all the pages in the range. This is
- * almost certainly a no-op because most of the pages must be present
- */
- do {
- xas_lock_irq(&xas);
- xas_create_range(&xas);
- if (!xas_error(&xas))
- break;
- xas_unlock_irq(&xas);
- if (!xas_nomem(&xas, GFP_KERNEL)) {
- result = SCAN_FAIL;
- goto rollback;
- }
- } while (1);
-
for (index = start; index < end;) {
xas_set(&xas, index);
folio = xas_load(&xas);
@@ -2163,6 +2147,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
xas_lock_irq(&xas);
}
+ xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+ xas_create_range(&xas);
+ if (xas_error(&xas)) {
+ if (nr_none) {
+ xas_set(&xas, start);
+ for (index = start; index < end; index++) {
+ if (xas_next(&xas) == XA_RETRY_ENTRY)
+ xas_store(&xas, NULL);
+ }
+ }
+ xas_destroy_range(&xas, start, end);
+ xas_unlock_irq(&xas);
+ result = SCAN_FAIL;
+
+ goto rollback;
+ }
+
if (is_shmem)
__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
else
--
2.43.0
>
>
> [2]
> https://lore.kernel.org/linux-mm/20251123132727.3262731-1-shardul.b@mpiricsoftware.com/
>
[-- Attachment #2: Type: text/html, Size: 10351 bytes --]
next prev parent reply other threads:[~2025-12-18 13:11 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-18 11:45 Jinjiang Tu
2025-12-18 11:51 ` David Hildenbrand (Red Hat)
2025-12-18 12:18 ` Jinjiang Tu
2025-12-18 12:49 ` David Hildenbrand (Red Hat)
2025-12-18 13:11 ` Jinjiang Tu [this message]
2025-12-25 4:15 ` Shardul Bankar
2025-12-27 1:24 ` Jinjiang Tu
2025-12-30 21:03 ` David Hildenbrand (Red Hat)
2025-12-31 6:29 ` Jinjiang Tu
2025-12-18 12:35 ` Jinjiang Tu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a629d3bb-c7e2-41e0-87e0-7a7a6367c1b6@huawei.com \
--to=tujinjiang@huawei.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=lance.yang@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=npache@redhat.com \
--cc=ryan.roberts@arm.com \
--cc=shardul.b@mpiricsoftware.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox