* [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
@ 2026-02-26 20:34 Bas van Dijk
2026-02-26 20:49 ` Zi Yan
0 siblings, 1 reply; 7+ messages in thread
From: Bas van Dijk @ 2026-02-26 20:34 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, Matthew Wilcox (Oracle)
Cc: regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
#regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
Hey folks,
We discovered madvise(MADV_REMOVE) on a 4KiB range within a
huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
Using the reproducible test in
https://github.com/dfinity/thp-madv-remove-test this was bisected to the
first bad commit:
commit 7460b470a131f985a70302a322617121efdd7caa
Author: Zi Yan <ziy@nvidia.com>
Date: Fri Mar 7 12:40:00 2025 -0500
mm/truncate: use folio_split() in truncate operation
v7.0-rc1 still has the regression.
The repo mentioned above explains how to reproduce the regression and
contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
well as a successful run on its parent 4b94c18d1519.
Best regards,
Bas van Dijk
DFINITY Foundation
* Re: [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-26 20:34 [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131) Bas van Dijk
@ 2026-02-26 20:49 ` Zi Yan
2026-02-26 21:06 ` Zi Yan
0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2026-02-26 20:49 UTC (permalink / raw)
To: Bas van Dijk
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
>
> Hey folks,
>
> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
>
> Using the reproducible test in
> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
> first bad commit:
>
> commit 7460b470a131f985a70302a322617121efdd7caa
> Author: Zi Yan <ziy@nvidia.com>
> Date: Fri Mar 7 12:40:00 2025 -0500
>
> mm/truncate: use folio_split() in truncate operation
>
> v7.0-rc1 still has the regression.
>
> The repo mentioned above explains how to reproduce the regression and
> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
> well as a successful run on its parent 4b94c18d1519.
Thanks for the report. I will look into it.
Best Regards,
Yan, Zi
* Re: [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-26 20:49 ` Zi Yan
@ 2026-02-26 21:06 ` Zi Yan
2026-02-26 21:16 ` [External Sender] " Bas van Dijk
0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2026-02-26 21:06 UTC (permalink / raw)
To: Bas van Dijk
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On 26 Feb 2026, at 15:49, Zi Yan wrote:
> On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
>
>> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
>>
>> Hey folks,
>>
>> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
>> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
>>
>> Using the reproducible test in
>> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
>> first bad commit:
>>
>> commit 7460b470a131f985a70302a322617121efdd7caa
>> Author: Zi Yan <ziy@nvidia.com>
>> Date: Fri Mar 7 12:40:00 2025 -0500
>>
>> mm/truncate: use folio_split() in truncate operation
>>
>> v7.0-rc1 still has the regression.
>>
>> The repo mentioned above explains how to reproduce the regression and
>> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
>> well as a successful run on its parent 4b94c18d1519.
>
> Thanks for the report. I will look into it.
Can you also share your kernel config file? I just ran the reproducer and
could not trigger the corruption.
Thanks.
Best Regards,
Yan, Zi
* Re: [External Sender] Re: [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-26 21:06 ` Zi Yan
@ 2026-02-26 21:16 ` Bas van Dijk
2026-02-27 19:29 ` [External Sender] " Zi Yan
0 siblings, 1 reply; 7+ messages in thread
From: Bas van Dijk @ 2026-02-26 21:16 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On Thu, Feb 26, 2026 at 10:06 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 26 Feb 2026, at 15:49, Zi Yan wrote:
>
> > On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
> >
> >> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
> >>
> >> Hey folks,
> >>
> >> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
> >> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
> >>
> >> Using the reproducible test in
> >> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
> >> first bad commit:
> >>
> >> commit 7460b470a131f985a70302a322617121efdd7caa
> >> Author: Zi Yan <ziy@nvidia.com>
> >> Date: Fri Mar 7 12:40:00 2025 -0500
> >>
> >> mm/truncate: use folio_split() in truncate operation
> >>
> >> v7.0-rc1 still has the regression.
> >>
> >> The repo mentioned above explains how to reproduce the regression and
> >> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
> >> well as a successful run on its parent 4b94c18d1519.
> >
> > Thanks for the report. I will look into it.
>
> Can you also share your kernel config file? I just ran the reproducer and
> could not trigger the corruption.
Sure, I just ran `nix build
.#linux_6_14_first_bad_7460b470a131.configfile -o kernel.config` which
produced:
https://github.com/dfinity/thp-madv-remove-test/blob/master/kernel.config
* Re: [External Sender] [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-26 21:16 ` [External Sender] " Bas van Dijk
@ 2026-02-27 19:29 ` Zi Yan
2026-02-27 23:32 ` Bas van Dijk
0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2026-02-27 19:29 UTC (permalink / raw)
To: Bas van Dijk
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On 26 Feb 2026, at 16:16, Bas van Dijk wrote:
> On Thu, Feb 26, 2026 at 10:06 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 26 Feb 2026, at 15:49, Zi Yan wrote:
>>
>>> On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
>>>
>>>> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
>>>>
>>>> Hey folks,
>>>>
>>>> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
>>>> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
>>>>
>>>> Using the reproducible test in
>>>> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
>>>> first bad commit:
>>>>
>>>> commit 7460b470a131f985a70302a322617121efdd7caa
>>>> Author: Zi Yan <ziy@nvidia.com>
>>>> Date: Fri Mar 7 12:40:00 2025 -0500
>>>>
>>>> mm/truncate: use folio_split() in truncate operation
>>>>
>>>> v7.0-rc1 still has the regression.
>>>>
>>>> The repo mentioned above explains how to reproduce the regression and
>>>> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
>>>> well as a successful run on its parent 4b94c18d1519.
>>>
>>> Thanks for the report. I will look into it.
>>
>> Can you also share your kernel config file? I just ran the reproducer and
>> could not trigger the corruption.
>
> Sure, I just ran `nix build
> .#linux_6_14_first_bad_7460b470a131.configfile -o kernel.config` which
> produced:
>
> https://github.com/dfinity/thp-madv-remove-test/blob/master/kernel.config
Hi Bas,
Can you try the patch below? It fixes the issue locally. I was able to
use your app to reproduce the issue after changing my shmem THP config
from never to always.
Thanks.
From 03b75f017ffe6cf556fefbd44f44655bf4a9af48 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Fri, 27 Feb 2026 14:11:36 -0500
Subject: [PATCH] mm/huge_memory: fix folio_split() race condition with
folio_try_get()
During a pagecache folio split, the values in the related xarray should not
be changed from the original folio at xarray split time until all
after-split folios are ready and stored in the xarray. Otherwise, a
parallel folio_try_get() can see stale values in the xarray and a stale
value can be an unfrozen after-split folio. This leads to the wrong
folio being returned to userspace.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/huge_memory.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4ca8cfd7f9d..3d5bf3bb8a3e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3628,6 +3628,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
const bool is_anon = folio_test_anon(folio);
int old_order = folio_order(folio);
int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
+ struct folio *origin_folio = folio;
int split_order;
/*
@@ -3653,7 +3654,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
xas_split(xas, folio, old_order);
else {
xas_set_order(xas, folio->index, split_order);
- xas_try_split(xas, folio, old_order);
+ /*
+ * use the original folio, so that a parallel
+ * folio_try_get() waits on it until xarray is
+ * updated with after-split folios and
+ * the original one is unfrozen
+ */
+ xas_try_split(xas, origin_folio, old_order);
if (xas_error(xas))
return xas_error(xas);
}
--
2.51.0
Best Regards,
Yan, Zi
* Re: [External Sender] [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-27 19:29 ` [External Sender] " Zi Yan
@ 2026-02-27 23:32 ` Bas van Dijk
2026-02-27 23:44 ` Zi Yan
0 siblings, 1 reply; 7+ messages in thread
From: Bas van Dijk @ 2026-02-27 23:32 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On Fri, Feb 27, 2026 at 8:29 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 26 Feb 2026, at 16:16, Bas van Dijk wrote:
>
> > On Thu, Feb 26, 2026 at 10:06 PM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 26 Feb 2026, at 15:49, Zi Yan wrote:
> >>
> >>> On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
> >>>
> >>>> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
> >>>>
> >>>> Hey folks,
> >>>>
> >>>> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
> >>>> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
> >>>>
> >>>> Using the reproducible test in
> >>>> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
> >>>> first bad commit:
> >>>>
> >>>> commit 7460b470a131f985a70302a322617121efdd7caa
> >>>> Author: Zi Yan <ziy@nvidia.com>
> >>>> Date: Fri Mar 7 12:40:00 2025 -0500
> >>>>
> >>>> mm/truncate: use folio_split() in truncate operation
> >>>>
> >>>> v7.0-rc1 still has the regression.
> >>>>
> >>>> The repo mentioned above explains how to reproduce the regression and
> >>>> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
> >>>> well as a successful run on its parent 4b94c18d1519.
> >>>
> >>> Thanks for the report. I will look into it.
> >>
> >> Can you also share your kernel config file? I just ran the reproducer and
> >> could not trigger the corruption.
> >
> > Sure, I just ran `nix build
> > .#linux_6_14_first_bad_7460b470a131.configfile -o kernel.config` which
> > produced:
> >
> > https://github.com/dfinity/thp-madv-remove-test/blob/master/kernel.config
>
> Hi Bas,
>
> Can you try the patch below?
The test passes twice with the patch manually applied to the latest
master (4d349ee5c778). Thank you!
I had trouble applying the patch using `git am` to 7460b470a131 or
v7.0-rc1, but this is the first time I've used `git am`, so I might have
done something wrong.
> I was able to use your app to reproduce the issue after changing my shmem THP config from never to always.
Yes, I had to write "advise" to
/sys/kernel/mm/transparent_hugepage/shmem_enabled since it's set to
"never" by default in NixOS. See:
https://github.com/dfinity/thp-madv-remove-test/blob/d859609820113c69023848452bdba8b619d78a8a/flake.nix#L93
It would be great if the patch could be backported to the 6.17 kernel
used in Ubuntu 24.04 LTS, since that's what we use for the Internet
Computer and where our tests first started crashing.
Cheers,
Bas
* Re: [External Sender] [REGRESSION] madvise(MADV_REMOVE) corrupts pages in THP-backed MAP_SHARED memfd (bisected to 7460b470a131)
2026-02-27 23:32 ` Bas van Dijk
@ 2026-02-27 23:44 ` Zi Yan
0 siblings, 0 replies; 7+ messages in thread
From: Zi Yan @ 2026-02-27 23:44 UTC (permalink / raw)
To: Bas van Dijk
Cc: Andrew Morton, Matthew Wilcox (Oracle),
regressions, linux-mm, linux-fsdevel, Eero Kelly, Andrew Battat,
Adam Bratschi-Kaye
On 27 Feb 2026, at 18:32, Bas van Dijk wrote:
> On Fri, Feb 27, 2026 at 8:29 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 26 Feb 2026, at 16:16, Bas van Dijk wrote:
>>
>>> On Thu, Feb 26, 2026 at 10:06 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> On 26 Feb 2026, at 15:49, Zi Yan wrote:
>>>>
>>>>> On 26 Feb 2026, at 15:34, Bas van Dijk wrote:
>>>>>
>>>>>> #regzbot introduced: 7460b470a131f985a70302a322617121efdd7caa
>>>>>>
>>>>>> Hey folks,
>>>>>>
>>>>>> We discovered madvise(MADV_REMOVE) on a 4KiB range within a
>>>>>> huge-page-backed MAP_SHARED memfd region corrupts nearby pages.
>>>>>>
>>>>>> Using the reproducible test in
>>>>>> https://github.com/dfinity/thp-madv-remove-test this was bisected to the
>>>>>> first bad commit:
>>>>>>
>>>>>> commit 7460b470a131f985a70302a322617121efdd7caa
>>>>>> Author: Zi Yan <ziy@nvidia.com>
>>>>>> Date: Fri Mar 7 12:40:00 2025 -0500
>>>>>>
>>>>>> mm/truncate: use folio_split() in truncate operation
>>>>>>
>>>>>> v7.0-rc1 still has the regression.
>>>>>>
>>>>>> The repo mentioned above explains how to reproduce the regression and
>>>>>> contains the necessary logs of failed runs on 7460b470a131 and v7.0-rc1, as
>>>>>> well as a successful run on its parent 4b94c18d1519.
>>>>>
>>>>> Thanks for the report. I will look into it.
>>>>
>>>> Can you also share your kernel config file? I just ran the reproducer and
>>>> could not trigger the corruption.
>>>
>>> Sure, I just ran `nix build
>>> .#linux_6_14_first_bad_7460b470a131.configfile -o kernel.config` which
>>> produced:
>>>
>>> https://github.com/dfinity/thp-madv-remove-test/blob/master/kernel.config
>>
>> Hi Bas,
>>
>> Can you try the patch below?
>
> The test passes twice with the patch manually applied to the latest
> master (4d349ee5c778). Thank you!
Great. I will send a proper patch and cc stable to get it backported to all
stable kernels. Thank you for the report and testing.
>
> I had trouble applying the patch using `git am` to 7460b470a131 or
> 7.0-rc1 but this is the first time I've used `git am`, so I might have
> done something wrong.
My fix is based on the linux-mm tree, so there could be some differences.
>
>> I was able to use your app to reproduce the issue after changing my shmem THP config from never to always.
>
> Yes I had to write "advise" to
> /sys/kernel/mm/transparent_hugepage/shmem_enabled since it's set to
> "never" by default in NixOS. See:
> https://github.com/dfinity/thp-madv-remove-test/blob/d859609820113c69023848452bdba8b619d78a8a/flake.nix#L93
>
> It would be great if the patch could be backported to 6.17 used in
> Ubuntu 24.04 LTS since that's what we use for the Internet Computer
> and where our tests first started crashing.
The one below applies directly to 6.17.13, in case you want to use it
locally.
From 03b75f017ffe6cf556fefbd44f44655bf4a9af48 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Fri, 27 Feb 2026 14:11:36 -0500
Subject: [PATCH] mm/huge_memory: fix folio_split() race condition with
folio_try_get()
During a pagecache folio split, the values in the related xarray should not
be changed from the original folio at xarray split time until all
after-split folios are ready and stored in the xarray. Otherwise, a
parallel folio_try_get() can see stale values in the xarray and a stale
value can be an unfrozen after-split folio. This leads to the wrong
folio being returned to userspace.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/huge_memory.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4ca8cfd7f9d..3d5bf3bb8a3e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3628,6 +3628,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
const bool is_anon = folio_test_anon(folio);
int old_order = folio_order(folio);
int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
+ struct folio *origin_folio = folio;
int split_order;
/*
@@ -3653,7 +3654,13 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
xas_split(xas, folio, old_order);
else {
xas_set_order(xas, folio->index, split_order);
- xas_try_split(xas, folio, old_order);
+ /*
+ * use the original folio, so that a parallel
+ * folio_try_get() waits on it until xarray is
+ * updated with after-split folios and
+ * the original one is unfrozen
+ */
+ xas_try_split(xas, origin_folio, old_order);
if (xas_error(xas))
return xas_error(xas);
}
--
2.51.0
Best Regards,
Yan, Zi