From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Matthew Wilcox <willy@infradead.org>,
Bas van Dijk <bas@dfinity.org>,
Eero Kelly <eero.kelly@dfinity.org>,
Andrew Battat <andrew.battat@dfinity.org>,
Adam Bratschi-Kaye <adam.bratschikaye@dfinity.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
Date: Tue, 3 Mar 2026 09:59:51 +0000
Message-ID: <6d329e4d-ab33-480b-b1d8-646cf6aa1fba@lucifer.local>
In-Reply-To: <34AA9329-A6F3-48C4-A580-8BE3E4F9A3A0@nvidia.com>
On Mon, Mar 02, 2026 at 11:30:39AM -0500, Zi Yan wrote:
> On 2 Mar 2026, at 8:30, Lorenzo Stoakes wrote:
>
> > On Fri, Feb 27, 2026 at 08:06:14PM -0500, Zi Yan wrote:
> >> During a pagecache folio split, the values in the related xarray should not
> >> be changed from the original folio at xarray split time until all
> >> after-split folios are well formed and stored in the xarray. Current use
> >> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> >> show up at wrong indices in the xarray. When these misplaced after-split
> >> folios are unfrozen, before correct folios are stored via __xa_store(), and
> >> grabbed by folio_try_get(), they are returned to userspace at wrong file
> >> indices, causing data corruption.
> >>
> >> Fix it by using the original folio in xas_try_split() calls, so that
> >> folio_try_get() can get the right after-split folios after the original
> >> folio is unfrozen.
> >>
> >> Uniform split, split_huge_page*(), is not affected, since it uses
> >> xas_split_alloc() and xas_split() only once and stores the original folio
> >> in the xarray.
> >>
> >> The Fixes tag below points to the commit that introduced the code, but
> >> folio_split() is used in a later commit 7460b470a131f ("mm/truncate: use
> >> folio_split() in truncate operation").
> >>
> >> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> >> Reported-by: Bas van Dijk <bas@dfinity.org>
> >> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >> mm/huge_memory.c | 9 ++++++++-
> >> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 56db54fa48181..e4ed0404e8b55 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> >> const bool is_anon = folio_test_anon(folio);
> >> int old_order = folio_order(folio);
> >> int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
> >> + struct folio *origin_folio = folio;
> >
> > NIT: 'origin' folio is a bit ambiguous, maybe old_folio, since it is of order old_order?
>
> OK, will rename it.
Thanks
>
> >
> >> int split_order;
> >>
> >> /*
> >> @@ -3672,7 +3673,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> >> xas_split(xas, folio, old_order);
> >
> > Aside, but this 'if (foo) bar(); else { ... }' pattern is horrible, think it's
> > justifiable to put both in {}... :)
>
> I can fix it along with this. It should not cause much trouble during backport.
Thanks!
>
> >
> >> else {
> >> xas_set_order(xas, folio->index, split_order);
> >> - xas_try_split(xas, folio, old_order);
> >> + /*
> >> + * use the original folio, so that a parallel
> >> + * folio_try_get() waits on it until xarray is
> >> + * updated with after-split folios and
> >> + * the original one is unfrozen.
> >> + */
> >> + xas_try_split(xas, origin_folio, old_order);
> >
> > Hmm, but won't we have already split the original folio by now? So is
> > origin_folio/old_folio a pointer to what was the original folio but now is
> > that but with weird tail page setup? :) like:
> >
> > |------------------------|
> > | f |
> > |------------------------|
> > ^old_folio ^ split_at
> >
> > |-----------|------------|
> > | f | f2 |
> > |-----------|------------|
> > ^old_folio
> >
> > |-----------|-----|------|
> > | f | f3 | f4 |
> > |-----------|-----|------|
> > ^old_folio
>
> This should be:
>
> |-----------|-----|------|
> | f | f2 | f3 |
> |-----------|-----|------|
> ^old_folio
>
> after split, the head page of f2 does not change,
> so f2 becomes f2,f3, where f3 is the tail page
> in the middle.
Right, I mean from the perspective of looking at f we'd only see f + some weird
stuff in tail pages, until order is updated?
>
> >
> > etc.
> >
> > So the xarray would contain:
> >
> > |-----------|-----|------|
> > | f | f | f |
> > |-----------|-----|------|
>
> This is the expected xarray state.
>
> >
> > Wouldn't it after this?
> >
> > Oh I guess before it'd contain:
> >
> > |-----------|-----|------|
> > | f | f4 | f4 |
> > |-----------|-----|------|
> >
> > Right?
>
> You got the gist of it. The reality (see the fix above) is
>
> |-----------|-----|------|
> | f | f2 | f3 |
> |-----------|-----|------|
>
> But another split comes at f3, the xarray becomes
>
> |-----------|-----|---|---|
> | f | f2 |f3 | f3|
> |-----------|-----|---|---|
>
> due to how xas_try_split() works. Yeah, feel free to
> blame me, since when I wrote xas_try_split(), I did
> not get into all the details. I am planning to
> change xas_try_split() so that the xarray will become
>
> |-----------|-----|---|---|
> | f | f2 |f3 | f4|
> |-----------|-----|---|---|
Ah ok I see :)
>
>
> >
> >
> > You're saying you'll later put the correct xas entries in post-split. Where does
> > that happen?
>
> After __split_unmapped_folio(), when __xa_store() is performed.
Thanks!
>
> >
> > And why was it a problem when these new folios were unfrozen?
> >
> > (Since the folio is a pointer to an offset in the vmemmap)
> >
> > I guess if you update that later in the xas, it's ok, and everything waits on
> > the right thing so this is probably fine, and the f4 f4 above is probably not
> > fine...
> >
> > I'm guessing the original folio is kept frozen during the operation?
>
> Right. f is kept frozen until the entire xarray is updated. But if the xarray
> is like (before the fix)
>
> |-----------|-----|---|---|
> | f | f2 |f3 | f3|
> |-----------|-----|---|---|
>
> the code after __split_unmapped_folio()
> 1. unfreezes f2, __xa_store(f2)
> 2. unfreezes f3, __xa_store(f3)
> 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 with f4,
>
> and a parallel folio_try_get() that looks at the second f3 at step 2
> sees f3 is unfrozen, then gives f3 to user but should have given
> f4. It only happens when the split is at the second half of the old
> folio.
Nasty...!
Great thanks for having the patience to explain it to me :)
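
For anyone following along, the three steps quoted above can be modelled with a
toy userspace sketch. This is only an illustration under loud assumptions: every
toy_* name is hypothetical and exists nowhere in the kernel, a plain array stands
in for the xarray, a bool stands in for the frozen refcount, and the old folio is
assumed to span 8 pages split as f (0-3), f2 (4-5), f3 (6), f4 (7). It does not
model real concurrency, only the state a parallel lookup would see between
step 2 and step 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8

/* Toy stand-in for struct folio: only what the illustration needs. */
struct toy_folio {
	int first;	/* first page index covered */
	int nr;		/* number of pages covered */
	bool frozen;	/* models a frozen refcount: a try-get fails */
};

/* Models a lookup followed by a try-get: NULL means "frozen, retry". */
static struct toy_folio *toy_lookup(struct toy_folio **slots, int idx)
{
	assert(idx >= 0 && idx < NR_SLOTS);
	return slots[idx]->frozen ? NULL : slots[idx];
}

/* Unfreeze one after-split folio, then store it over its own range. */
static void toy_unfreeze_and_store(struct toy_folio **slots,
				   struct toy_folio *f)
{
	f->frozen = false;
	for (int i = f->first; i < f->first + f->nr; i++)
		slots[i] = f;
}

/*
 * Run steps 1 and 2, then look up the last index (which belongs to f4)
 * before step 3 has happened. Returns 1 if the lookup handed out a folio
 * that does not cover the index, 0 if it saw a frozen entry and would retry.
 */
static int toy_race_window(bool with_fix)
{
	struct toy_folio f  = { 0, 4, true };
	struct toy_folio f2 = { 4, 2, true };
	struct toy_folio f3 = { 6, 1, true };
	struct toy_folio f4 = { 7, 1, true };	/* never stored before step 3 */
	struct toy_folio *slots[NR_SLOTS];
	struct toy_folio *got;
	const int idx = 7;

	(void)f4;

	/* With the fix, every slot still holds the frozen original folio. */
	for (int i = 0; i < NR_SLOTS; i++)
		slots[i] = &f;

	/* Without it, after-split folios already sit in the slots: f f2 f3 f3. */
	if (!with_fix) {
		slots[4] = slots[5] = &f2;
		slots[6] = slots[7] = &f3;
	}

	toy_unfreeze_and_store(slots, &f2);	/* step 1 */
	toy_unfreeze_and_store(slots, &f3);	/* step 2 */

	got = toy_lookup(slots, idx);
	if (!got)
		return 0;
	return (idx < got->first || idx >= got->first + got->nr) ? 1 : 0;
}
```

Calling toy_race_window(false) models the pre-fix layout, where index 7 resolves
to the already-unfrozen f3; toy_race_window(true) models the patched layout,
where index 7 still resolves to the frozen original folio and the lookup retries.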
>
> >
> > Anyway, please help with my confusion, I'm not so familiar with this code :)
> >
>
> Let me know if you have any more questions.
Perfect, appreciated :) I think we're good.
>
> >
> >> if (xas_error(xas))
> >> return xas_error(xas);
> >> }
> >> --
> >> 2.51.0
> >>
> >
> > Thanks, Lorenzo
>
>
> Best Regards,
> Yan, Zi
Cheers, Lorenzo