From: Jinjiang Tu <tujinjiang@huawei.com>
To: Shardul Bankar <shardul.b@mpiricsoftware.com>,
<willy@infradead.org>, <akpm@linux-foundation.org>,
<linux-mm@kvack.org>
Cc: <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<dev.jain@arm.com>, <david@kernel.org>, <shardulsb08@gmail.com>,
<janak@mpiricsoftware.com>
Subject: Re: [PATCH v4] lib: xarray: free unused spare node in xas_create_range()
Date: Mon, 15 Dec 2025 10:19:35 +0800
Message-ID: <89b96a9f-1d03-440a-93cd-2b9876be3122@huawei.com>
In-Reply-To: <20251204142625.1763372-1-shardul.b@mpiricsoftware.com>
On 2025/12/4 22:26, Shardul Bankar wrote:
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
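The retry pattern described in the quoted commit message can be sketched in userspace. This is only a toy model, not the kernel code: `toy_state`, `toy_nomem()`, `toy_destroy()`, `toy_create_range()` and `toy_retry_loop()` are hypothetical names that stand in for `struct xa_state`, `xas_nomem()`, `xas_destroy()`, `xas_create_range()` and a caller's loop, and "another thread expanded the tree" is modelled by flipping a flag between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct xa_state; only the spare slot and error are modelled. */
struct toy_state {
	void *spare;	/* plays the role of xas->xa_alloc */
	int error;	/* 0 on success, -1 stands in for -ENOMEM */
};

static int live_spares;	/* spare nodes allocated and not yet released */

/* Models xas_destroy(): release the spare node, if any. */
static void toy_destroy(struct toy_state *st)
{
	if (st->spare) {
		free(st->spare);
		st->spare = NULL;
		live_spares--;
	}
}

/* Models xas_nomem(): after an allocation failure, stash a spare and retry. */
static bool toy_nomem(struct toy_state *st)
{
	if (!st->error)
		return false;
	st->spare = malloc(64);
	if (!st->spare)
		return false;
	live_spares++;
	st->error = 0;
	return true;
}

/*
 * Models xas_create_range(). If the tree still needs a node and there is no
 * spare, fail. If the tree has already grown, succeed without touching the
 * spare. With free_spare set (the patched behaviour), release any unused
 * spare before returning.
 */
static void toy_create_range(struct toy_state *st, bool tree_grown, bool free_spare)
{
	if (!tree_grown) {
		if (!st->spare) {
			st->error = -1;
			return;
		}
		/* The kernel would link the node into the tree; the model drops it. */
		toy_destroy(st);
	}
	if (free_spare)
		toy_destroy(st);
}

/* Returns the number of spare nodes still allocated after the retry loop. */
int toy_retry_loop(bool free_spare)
{
	struct toy_state st = { 0 };
	bool tree_grown = false;

	live_spares = 0;
	for (;;) {
		toy_create_range(&st, tree_grown, free_spare);
		if (!st.error)
			break;
		if (!toy_nomem(&st))
			return -1;
		/* Lock dropped here: pretend another thread expanded the tree. */
		tree_grown = true;
	}
	return live_spares;
}
```

In this model `toy_retry_loop(false)` ends with one spare still allocated, while `toy_retry_loop(true)` ends with none, which mirrors the effect the patch aims for.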
I encountered another memory leak involving xas_create_range().
collapse_file() calls xas_create_range() to pre-create all the slots it needs.
If collapse_file() ultimately fails, the nodes created for those slots remain empty.
When the file is deleted, shmem_evict_inode()->shmem_truncate_range()->shmem_undo_range()
calls xas_store(&xas, NULL) for each entry to delete nodes, but this leaves the
pre-created empty nodes in place, so they are leaked.
I can reproduce it with the following steps:
1) Create the file /tmp/test_madvise_collapse, ftruncate() it to 4MB, and then mmap() it.
2) memset() the first 2MB.
3) Call madvise(MADV_COLLAPSE) on the second 2MB.
4) unlink() the file.
In step 3), collapse_file() calls xas_create_range() to expand the xarray depth and then fails to
collapse because the whole 2MB region is empty, so the newly created empty nodes are leaked.
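The four steps can be written as a small C helper. This is a sketch under assumptions, not the reproducer used for the report: the function name `run_collapse_repro()` is made up, the mapping is assumed to be `MAP_SHARED`, /tmp is assumed to be shmem-backed (tmpfs) since the leak path above goes through shmem, and the extra 6MB reservation is only there to make the file mapping 2MB-aligned. The fallback value for `MADV_COLLAPSE` is the one from the Linux uapi headers. Running it does not show the leak by itself; a kernel leak detector such as kmemleak is one way to check afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from the Linux uapi headers */
#endif

#define SZ_2M (2UL << 20)

/*
 * Runs steps 1) to 4). Returns 0 if the setup worked and -1 otherwise.
 * The outcome of madvise() (0 or -errno) is stored in *collapse_ret.
 */
int run_collapse_repro(const char *path, int *collapse_ret)
{
	char *reserve, *aligned, *map;
	int ret = -1;
	int fd;

	/* Step 1: create the file, size it to 4MB, and map it. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, 2 * SZ_2M) < 0)
		goto out_close;

	/* Reserve 6MB so that a 2MB-aligned 4MB window always fits inside. */
	reserve = mmap(NULL, 3 * SZ_2M, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (reserve == MAP_FAILED)
		goto out_close;
	aligned = (char *)(((uintptr_t)reserve + SZ_2M - 1) & ~(SZ_2M - 1));
	map = mmap(aligned, 2 * SZ_2M, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_FIXED, fd, 0);
	if (map == MAP_FAILED)
		goto out_unmap;

	/* Step 2: populate only the first 2MB. */
	memset(map, 0xab, SZ_2M);

	/* Step 3: ask for a collapse of the second, still empty, 2MB. */
	*collapse_ret = madvise(map + SZ_2M, SZ_2M, MADV_COLLAPSE) ? -errno : 0;
	ret = 0;

out_unmap:
	munmap(reserve, 3 * SZ_2M);
out_close:
	close(fd);
	/* Step 4: delete the file. */
	unlink(path);
	return ret;
}
```

A caller would invoke `run_collapse_repro("/tmp/test_madvise_collapse", &rc)` and inspect `rc`; on kernels without MADV_COLLAPSE support the madvise() call simply fails and nothing is pre-created.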
To fix it, maybe we should add a new function, xas_delete_range(), to revert what xas_create_range()
did when the caller (e.g. collapse_file()) runs into its rollback path?
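To make the idea concrete, here is a userspace toy model of the leak and of a possible rollback. Nothing in it is xarray code: the one-level "tree" of four leaves, `toy_create_range()`, `toy_store()`, `toy_truncate()`, `toy_delete_range()` and `toy_scenario()` are all hypothetical, and the real xas_delete_range() (if it is added) may look quite different. Indices 0..3 stand for the populated first 2MB and indices 4..7 for the empty second 2MB.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 4

struct toy_leaf {
	int count;			/* number of non-NULL slots */
	void *slot[FANOUT];
};

struct toy_tree {
	struct toy_leaf *leaf[FANOUT];	/* covers indices 0..15 */
	int live_nodes;			/* leaves currently allocated */
};

/* Pre-create the leaves covering [first, last], like xas_create_range(). */
int toy_create_range(struct toy_tree *t, unsigned int first, unsigned int last)
{
	for (unsigned int i = first / FANOUT; i <= last / FANOUT; i++) {
		if (t->leaf[i])
			continue;
		t->leaf[i] = calloc(1, sizeof(*t->leaf[i]));
		if (!t->leaf[i])
			return -1;
		t->live_nodes++;
	}
	return 0;
}

/* Store a non-NULL entry, creating the leaf on demand. */
int toy_store(struct toy_tree *t, unsigned int index, void *entry)
{
	struct toy_leaf *leaf;

	assert(entry);
	if (toy_create_range(t, index, index))
		return -1;
	leaf = t->leaf[index / FANOUT];
	if (!leaf->slot[index % FANOUT])
		leaf->count++;
	leaf->slot[index % FANOUT] = entry;
	return 0;
}

/*
 * Erase every present entry, as truncation does. A leaf is freed when its
 * last entry goes away; a leaf that never held an entry is never touched.
 */
void toy_truncate(struct toy_tree *t)
{
	for (unsigned int i = 0; i < FANOUT * FANOUT; i++) {
		struct toy_leaf *leaf = t->leaf[i / FANOUT];

		if (!leaf || !leaf->slot[i % FANOUT])
			continue;
		leaf->slot[i % FANOUT] = NULL;
		if (--leaf->count == 0) {
			free(leaf);
			t->leaf[i / FANOUT] = NULL;
			t->live_nodes--;
		}
	}
}

/* Hypothetical rollback: free leaves in [first, last] that hold no entries. */
void toy_delete_range(struct toy_tree *t, unsigned int first, unsigned int last)
{
	for (unsigned int i = first / FANOUT; i <= last / FANOUT; i++) {
		if (t->leaf[i] && t->leaf[i]->count == 0) {
			free(t->leaf[i]);
			t->leaf[i] = NULL;
			t->live_nodes--;
		}
	}
}

/* Returns the number of leaves left after truncation (leaked on purpose). */
int toy_scenario(int rollback)
{
	static int page;	/* dummy non-NULL entry */
	struct toy_tree t = { 0 };

	for (unsigned int i = 0; i < 4; i++)
		if (toy_store(&t, i, &page))
			return -1;
	if (toy_create_range(&t, 4, 7))	/* the collapse attempt pre-creates slots */
		return -1;
	if (rollback)			/* the collapse failed: undo the pre-creation */
		toy_delete_range(&t, 4, 7);
	toy_truncate(&t);		/* file deletion */
	return t.live_nodes;
}
```

In the model, `toy_scenario(0)` leaves one empty leaf behind after truncation, and `toy_scenario(1)` leaves none. Freeing only leaves whose count is zero keeps the rollback from disturbing slots that another path filled in the meantime, which a real implementation would also have to consider.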
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Link: https://lore.kernel.org/all/20251201074540.3576327-1-shardul.b@mpiricsoftware.com/ ("v3")
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v4:
> - Drop redundant `if (xa_alloc)` around xas_destroy(), as xas_destroy()
> already checks xa_alloc internally.
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
> lib/xarray.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..f49ccfa5f57d 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,16 @@ void xas_create_range(struct xa_state *xas)
>  	xas->xa_shift = shift;
>  	xas->xa_sibs = sibs;
>  	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>  success:
>  	xas->xa_index = index;
>  	if (xas->xa_node)
>  		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	xas_destroy(xas);
>  }
>  EXPORT_SYMBOL_GPL(xas_create_range);
>