linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Matthew Wilcox <willy@infradead.org>,
	Bas van Dijk <bas@dfinity.org>,
	Eero Kelly <eero.kelly@dfinity.org>,
	Andrew Battat <andrew.battat@dfinity.org>,
	Adam Bratschi-Kaye <adam.bratschikaye@dfinity.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
Date: Tue, 3 Mar 2026 09:59:51 +0000	[thread overview]
Message-ID: <6d329e4d-ab33-480b-b1d8-646cf6aa1fba@lucifer.local> (raw)
In-Reply-To: <34AA9329-A6F3-48C4-A580-8BE3E4F9A3A0@nvidia.com>

On Mon, Mar 02, 2026 at 11:30:39AM -0500, Zi Yan wrote:
> On 2 Mar 2026, at 8:30, Lorenzo Stoakes wrote:
>
> > On Fri, Feb 27, 2026 at 08:06:14PM -0500, Zi Yan wrote:
> >> During a pagecache folio split, the values in the related xarray should not
> >> be changed from the original folio at xarray split time until all
> >> after-split folios are well formed and stored in the xarray. Current use
> >> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> >> show up at wrong indices in the xarray. When these misplaced after-split
> >> folios are unfrozen, before correct folios are stored via __xa_store(), and
> >> grabbed by folio_try_get(), they are returned to userspace at wrong file
> >> indices, causing data corruption.
> >>
> >> Fix it by using the original folio in xas_try_split() calls, so that
> >> folio_try_get() can get the right after-split folios after the original
> >> folio is unfrozen.
> >>
> >> Uniform split, split_huge_page*(), is not affected, since it uses
> >> xas_split_alloc() and xas_split() only once and stores the original folio
> >> in the xarray.
> >>
> >> Fixes below points to the commit introduces the code, but folio_split() is
> >> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> >> truncate operation").
> >>
> >> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> >> Reported-by: Bas van Dijk <bas@dfinity.org>
> >> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>  mm/huge_memory.c | 9 ++++++++-
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 56db54fa48181..e4ed0404e8b55 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> >>  	const bool is_anon = folio_test_anon(folio);
> >>  	int old_order = folio_order(folio);
> >>  	int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
> >> +	struct folio *origin_folio = folio;
> >
> > NIT: 'origin' folio is a bit ambigious, maybe old_folio, since it is of order old_order?
>
> OK, will rename it.

Thanks

>
> >
> >>  	int split_order;
> >>
> >>  	/*
> >> @@ -3672,7 +3673,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> >>  				xas_split(xas, folio, old_order);
> >
> > Aside, but this 'if (foo) bar(); else { ... }' pattern is horrible, think it's
> > justifiable to put both in {}... :)
>
> I can fix it along with this. It should not cause much trouble during backport.

Thanks!

>
> >
> >>  			else {
> >>  				xas_set_order(xas, folio->index, split_order);
> >> -				xas_try_split(xas, folio, old_order);
> >> +				/*
> >> +				 * use the original folio, so that a parallel
> >> +				 * folio_try_get() waits on it until xarray is
> >> +				 * updated with after-split folios and
> >> +				 * the original one is unfrozen.
> >> +				 */
> >> +				xas_try_split(xas, origin_folio, old_order);
> >
> > Hmm, but won't we have already split the original folio by now? So is
> > origin_folio/old_folio a pointer to what was the original folio but now is
> > that but with weird tail page setup? :) like:
> >
> > |------------------------|
> > |           f            |
> > |------------------------|
> > ^old_folio  ^ split_at
> >
> > |-----------|------------|
> > |     f     |     f2     |
> > |-----------|------------|
> > ^old_folio
> >
> > |-----------|-----|------|
> > |     f     |  f3 |  f4  |
> > |-----------|-----|------|
> > ^old_folio
>
> This should be:
>
> |-----------|-----|------|
> |     f     |  f2 |  f3  |
> |-----------|-----|------|
> ^old_folio
>
> after split, the head page of f2 does not change,
> so f2 becomes f2,f3, where f3 is the tail page
> in the middle.

Right, I mean from the perspective of looking at f we'd only see f + some weird
stuff in tail pages, until order is updated?

>
> >
> > etc.
> >
> > So the xarray would contain:
> >
> > |-----------|-----|------|
> > |    f      |  f  |   f  |
> > |-----------|-----|------|
>
> This is the expected xarray state.
>
> >
> > Wouldn't it after this?
> >
> > Oh I guess before it'd contain:
> >
> > |-----------|-----|------|
> > |     f     |  f4 |  f4  |
> > |-----------|-----|------|
> >
> > Right?
>
> You got the gist of it. The reality (see the fix above) is
>
> |-----------|-----|------|
> |     f     |  f2 |  f3  |
> |-----------|-----|------|
>
> But another split comes at f3, the xarray becomes
>
> |-----------|-----|---|---|
> |     f     |  f2 |f3 | f3|
> |-----------|-----|---|---|
>
> due to how xas_try_split() works. Yeah, feel free to
> blame me, since when I wrote xas_try_split(), I did
> not get into all the details. I am planning to
> change xas_try_split() so that the xarray will become
>
> |-----------|-----|---|---|
> |     f     |  f2 |f3 | f4|
> |-----------|-----|---|---|

Ah ok I see :)

>
>
> >
> >
> > You saying you'll later put the correct xas entries in post-split. Where does
> > that happen?
>
> After __split_unmmaped_folio(), when __xa_store() is performed.

Thanks!

>
> >
> > And why was it a problem when these new folios were unfrozen?
> >
> > (Since the folio is a pointer to an offset in the vmemmap)
> >
> > I guess if you update that later in the xas, it's ok, and everything waits on
> > the right thing so this is probably fine, and the f4 f4 above is probably not
> > fine...
> >
> > I'm guessing the original folio is kept frozen during the operation?
>
> Right. f is kept frozen until the entire xarray is updated. But if the xarray
> is like (before the fix)
>
> |-----------|-----|---|---|
> |     f     |  f2 |f3 | f3|
> |-----------|-----|---|---|
>
> the code after __split_unmmaped_folio()
> 1. unfreezes f2, __xa_store(f2)
> 2. unfreezes f3, __xa_store(f3)
> 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4,
>
> and a parallel folio_try_get() that looks at the second f3 at step 2
> sees f3 is unfrozen, then gives f3 to user but should have given
> f4. It only happens when the split is at the second half of the old
> folio.

Nasty...!

Great thanks for having the patience to explain it to me :)

>
> >
> > Anyway please help my confusion not so familiar with this code :)
> >
>
> Let me know if you have any more questions.

Perfect, appreciated :) I think we're good.

>
> >
> >>  				if (xas_error(xas))
> >>  					return xas_error(xas);
> >>  			}
> >> --
> >> 2.51.0
> >>
> >
> > Thanks, Lorenzo
>
>
> Best Regards,
> Yan, Zi

Cheers, Lorenzo


  reply	other threads:[~2026-03-03 10:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-28  1:06 Zi Yan
2026-02-28  3:10 ` Lance Yang
2026-03-02 14:28   ` David Hildenbrand (Arm)
2026-03-02 15:11     ` Lance Yang
2026-03-02 16:36       ` Zi Yan
2026-03-02 20:54         ` [External Sender] " Bas van Dijk
2026-03-02 13:30 ` Lorenzo Stoakes
2026-03-02 16:30   ` Zi Yan
2026-03-03  9:59     ` Lorenzo Stoakes [this message]
2026-03-02 14:40 ` David Hildenbrand (Arm)
2026-03-02 16:34   ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d329e4d-ab33-480b-b1d8-646cf6aa1fba@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=adam.bratschikaye@dfinity.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.battat@dfinity.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bas@dfinity.org \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=eero.kelly@dfinity.org \
    --cc=hughd@google.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npache@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox