* [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage
@ 2025-09-28 4:48 Lance Yang
2025-09-29 4:44 ` Dev Jain
2025-09-29 7:25 ` David Hildenbrand
0 siblings, 2 replies; 10+ messages in thread
From: Lance Yang @ 2025-09-28 4:48 UTC (permalink / raw)
To: akpm, david, lorenzo.stoakes
Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel,
Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost,
joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable,
Lance Yang
From: Lance Yang <lance.yang@linux.dev>
When splitting an mTHP and replacing a zero-filled subpage with the shared
zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty
bit.
For userspace tools like CRIU, which rely on the soft-dirty mechanism for
incremental snapshots, losing this bit means modified pages are missed,
leading to inconsistent memory state after restore.
Preserve the soft-dirty bit from the old PTE when creating the zeropage
mapping to ensure modified pages are correctly tracked.
Cc: <stable@vger.kernel.org>
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
mm/migrate.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index ce83c2c3c287..bf364ba07a3f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
pvmw->vma->vm_page_prot));
+
+ if (pte_swp_soft_dirty(ptep_get(pvmw->pte)))
+ newpte = pte_mksoft_dirty(newpte);
+
set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
--
2.49.0
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-28 4:48 [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage Lance Yang @ 2025-09-29 4:44 ` Dev Jain 2025-09-29 10:15 ` Lance Yang 2025-09-29 7:25 ` David Hildenbrand 1 sibling, 1 reply; 10+ messages in thread From: Dev Jain @ 2025-09-29 4:44 UTC (permalink / raw) To: Lance Yang, akpm, david, lorenzo.stoakes Cc: ziy, baolin.wang, baohua, ryan.roberts, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable On 28/09/25 10:18 am, Lance Yang wrote: > From: Lance Yang <lance.yang@linux.dev> > > When splitting an mTHP and replacing a zero-filled subpage with the shared > zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty > bit. > > For userspace tools like CRIU, which rely on the soft-dirty mechanism for > incremental snapshots, losing this bit means modified pages are missed, > leading to inconsistent memory state after restore. > > Preserve the soft-dirty bit from the old PTE when creating the zeropage > mapping to ensure modified pages are correctly tracked. > > Cc: <stable@vger.kernel.org> > Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") > Signed-off-by: Lance Yang <lance.yang@linux.dev> > --- > mm/migrate.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/mm/migrate.c b/mm/migrate.c > index ce83c2c3c287..bf364ba07a3f 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw, > > newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), > pvmw->vma->vm_page_prot)); > + > + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) > + newpte = pte_mksoft_dirty(newpte); > + > set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); > > dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); I think this should work. You can pass old_pte = ptep_get(pvmw->pte) to this function to avoid calling ptep_get() multiple times. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 4:44 ` Dev Jain @ 2025-09-29 10:15 ` Lance Yang 0 siblings, 0 replies; 10+ messages in thread From: Lance Yang @ 2025-09-29 10:15 UTC (permalink / raw) To: Dev Jain Cc: ziy, baolin.wang, baohua, ryan.roberts, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes, david On 2025/9/29 12:44, Dev Jain wrote: > > On 28/09/25 10:18 am, Lance Yang wrote: >> From: Lance Yang <lance.yang@linux.dev> >> >> When splitting an mTHP and replacing a zero-filled subpage with the >> shared >> zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty >> bit. >> >> For userspace tools like CRIU, which rely on the soft-dirty mechanism for >> incremental snapshots, losing this bit means modified pages are missed, >> leading to inconsistent memory state after restore. >> >> Preserve the soft-dirty bit from the old PTE when creating the zeropage >> mapping to ensure modified pages are correctly tracked. >> >> Cc: <stable@vger.kernel.org> >> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >> when splitting isolated thp") >> Signed-off-by: Lance Yang <lance.yang@linux.dev> >> --- >> mm/migrate.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index ce83c2c3c287..bf364ba07a3f 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >> page_vma_mapped_walk *pvmw, >> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >> pvmw->vma->vm_page_prot)); >> + >> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >> + newpte = pte_mksoft_dirty(newpte); >> + >> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); > > I think this should work. > > You can pass old_pte = ptep_get(pvmw->pte) to this function to avoid > calling ptep_get() > multiple times. Good catch! Will do in v2, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-28 4:48 [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage Lance Yang 2025-09-29 4:44 ` Dev Jain @ 2025-09-29 7:25 ` David Hildenbrand 2025-09-29 10:29 ` Lance Yang 1 sibling, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2025-09-29 7:25 UTC (permalink / raw) To: Lance Yang, akpm, lorenzo.stoakes Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable On 28.09.25 06:48, Lance Yang wrote: > From: Lance Yang <lance.yang@linux.dev> > > When splitting an mTHP and replacing a zero-filled subpage with the shared > zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty > bit. > > For userspace tools like CRIU, which rely on the soft-dirty mechanism for > incremental snapshots, losing this bit means modified pages are missed, > leading to inconsistent memory state after restore. > > Preserve the soft-dirty bit from the old PTE when creating the zeropage > mapping to ensure modified pages are correctly tracked. > > Cc: <stable@vger.kernel.org> > Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp") > Signed-off-by: Lance Yang <lance.yang@linux.dev> > --- > mm/migrate.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/mm/migrate.c b/mm/migrate.c > index ce83c2c3c287..bf364ba07a3f 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw, > > newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), > pvmw->vma->vm_page_prot)); > + > + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) > + newpte = pte_mksoft_dirty(newpte); > + > set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); > > dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); It's interesting that there isn't a single occurrence of the stof-dirty flag in khugepaged code. I guess it all works because we do the _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); and the pmd_mkdirty() will imply marking it soft-dirty. Now to the problem at hand: I don't think this is particularly problematic in the common case: if the page is zero, it likely was never written to (that's what the unerused shrinker is targeted at), so the soft-dirty setting on the PMD is actually just an over-indication for this page. For example, when we just install the shared zeropage directly in do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. Now, one could argue that if the content was changed from non-zero to zero, it ould actually be soft-dirty. Long-story short: I don't think this matters much in practice, but it's an easy fix. As said by dev, please avoid double ptep_get() if possible. Acked-by: David Hildenbrand <david@redhat.com> @Lance, can you double-check that the uffd-wp bit is handled correctly? I strongly assume we lose that as well here. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 7:25 ` David Hildenbrand @ 2025-09-29 10:29 ` Lance Yang 2025-09-29 11:29 ` Lance Yang 0 siblings, 1 reply; 10+ messages in thread From: Lance Yang @ 2025-09-29 10:29 UTC (permalink / raw) To: David Hildenbrand Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 2025/9/29 15:25, David Hildenbrand wrote: > On 28.09.25 06:48, Lance Yang wrote: >> From: Lance Yang <lance.yang@linux.dev> >> >> When splitting an mTHP and replacing a zero-filled subpage with the >> shared >> zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty >> bit. >> >> For userspace tools like CRIU, which rely on the soft-dirty mechanism for >> incremental snapshots, losing this bit means modified pages are missed, >> leading to inconsistent memory state after restore. >> >> Preserve the soft-dirty bit from the old PTE when creating the zeropage >> mapping to ensure modified pages are correctly tracked. >> >> Cc: <stable@vger.kernel.org> >> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >> when splitting isolated thp") >> Signed-off-by: Lance Yang <lance.yang@linux.dev> >> --- >> mm/migrate.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index ce83c2c3c287..bf364ba07a3f 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >> page_vma_mapped_walk *pvmw, >> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >> pvmw->vma->vm_page_prot)); >> + >> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >> + newpte = pte_mksoft_dirty(newpte); >> + >> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); > > It's interesting that there isn't a single occurrence of the stof-dirty > flag in khugepaged code. I guess it all works because we do the > > _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); > > and the pmd_mkdirty() will imply marking it soft-dirty. > > Now to the problem at hand: I don't think this is particularly > problematic in the common case: if the page is zero, it likely was never > written to (that's what the unerused shrinker is targeted at), so the > soft-dirty setting on the PMD is actually just an over-indication for > this page. Cool. Thanks for the insight! Good to know that ;) > > For example, when we just install the shared zeropage directly in > do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. > > Now, one could argue that if the content was changed from non-zero to > zero, it ould actually be soft-dirty. Exactly. A false negative could be a problem for the userspace tools, IMO. > > Long-story short: I don't think this matters much in practice, but it's > an easy fix. > > As said by dev, please avoid double ptep_get() if possible. Sure, will do. I'll refactor it in the next version. > > Acked-by: David Hildenbrand <david@redhat.com> Thanks! > > > @Lance, can you double-check that the uffd-wp bit is handled correctly? > I strongly assume we lose that as well here. Certainly, I'll check the uffd-wp bit as well and get back to you soon. Cheers, Lance ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 10:29 ` Lance Yang @ 2025-09-29 11:29 ` Lance Yang 2025-09-29 12:08 ` David Hildenbrand 0 siblings, 1 reply; 10+ messages in thread From: Lance Yang @ 2025-09-29 11:29 UTC (permalink / raw) To: David Hildenbrand Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 2025/9/29 18:29, Lance Yang wrote: > > > On 2025/9/29 15:25, David Hildenbrand wrote: >> On 28.09.25 06:48, Lance Yang wrote: >>> From: Lance Yang <lance.yang@linux.dev> >>> >>> When splitting an mTHP and replacing a zero-filled subpage with the >>> shared >>> zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty >>> bit. >>> >>> For userspace tools like CRIU, which rely on the soft-dirty mechanism >>> for >>> incremental snapshots, losing this bit means modified pages are missed, >>> leading to inconsistent memory state after restore. >>> >>> Preserve the soft-dirty bit from the old PTE when creating the zeropage >>> mapping to ensure modified pages are correctly tracked. >>> >>> Cc: <stable@vger.kernel.org> >>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >>> when splitting isolated thp") >>> Signed-off-by: Lance Yang <lance.yang@linux.dev> >>> --- >>> mm/migrate.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/mm/migrate.c b/mm/migrate.c >>> index ce83c2c3c287..bf364ba07a3f 100644 >>> --- a/mm/migrate.c >>> +++ b/mm/migrate.c >>> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >>> page_vma_mapped_walk *pvmw, >>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >>> pvmw->vma->vm_page_prot)); >>> + >>> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >>> + newpte = pte_mksoft_dirty(newpte); >>> + >>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >>> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); >> >> It's interesting that there isn't a single occurrence of the stof- >> dirty flag in khugepaged code. I guess it all works because we do the >> >> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); >> >> and the pmd_mkdirty() will imply marking it soft-dirty. >> >> Now to the problem at hand: I don't think this is particularly >> problematic in the common case: if the page is zero, it likely was >> never written to (that's what the unerused shrinker is targeted at), >> so the soft-dirty setting on the PMD is actually just an over- >> indication for this page. > > Cool. Thanks for the insight! Good to know that ;) > >> >> For example, when we just install the shared zeropage directly in >> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. >> >> Now, one could argue that if the content was changed from non-zero to >> zero, it ould actually be soft-dirty. > > Exactly. A false negative could be a problem for the userspace tools, IMO. > >> >> Long-story short: I don't think this matters much in practice, but >> it's an easy fix. >> >> As said by dev, please avoid double ptep_get() if possible. > > Sure, will do. I'll refactor it in the next version. > >> >> Acked-by: David Hildenbrand <david@redhat.com> > > Thanks! > >> >> >> @Lance, can you double-check that the uffd-wp bit is handled >> correctly? I strongly assume we lose that as well here. Yes, the uffd-wp bit was indeed being dropped, but ... The shared zeropage is read-only, which triggers a fault. IIUC, The kernel then falls back to checking the VM_UFFD_WP flag on the VMA and correctly generates a uffd-wp event, masking the fact that the uffd-wp bit on the PTE was lost. IMHO, explicitly preserving the uffd-wp bit on the PTE is still necessary, since we're not sure if losing that bit is safe in all cases :) > > Certainly, I'll check the uffd-wp bit as well and get back to you soon. > > Cheers, > Lance ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 11:29 ` Lance Yang @ 2025-09-29 12:08 ` David Hildenbrand 2025-09-29 13:22 ` Lance Yang 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2025-09-29 12:08 UTC (permalink / raw) To: Lance Yang Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 29.09.25 13:29, Lance Yang wrote: > > > On 2025/9/29 18:29, Lance Yang wrote: >> >> >> On 2025/9/29 15:25, David Hildenbrand wrote: >>> On 28.09.25 06:48, Lance Yang wrote: >>>> From: Lance Yang <lance.yang@linux.dev> >>>> >>>> When splitting an mTHP and replacing a zero-filled subpage with the >>>> shared >>>> zeropage, try_to_map_unused_to_zeropage() currently drops the soft-dirty >>>> bit. >>>> >>>> For userspace tools like CRIU, which rely on the soft-dirty mechanism >>>> for >>>> incremental snapshots, losing this bit means modified pages are missed, >>>> leading to inconsistent memory state after restore. >>>> >>>> Preserve the soft-dirty bit from the old PTE when creating the zeropage >>>> mapping to ensure modified pages are correctly tracked. >>>> >>>> Cc: <stable@vger.kernel.org> >>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >>>> when splitting isolated thp") >>>> Signed-off-by: Lance Yang <lance.yang@linux.dev> >>>> --- >>>> mm/migrate.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>> index ce83c2c3c287..bf364ba07a3f 100644 >>>> --- a/mm/migrate.c >>>> +++ b/mm/migrate.c >>>> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >>>> page_vma_mapped_walk *pvmw, >>>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >>>> pvmw->vma->vm_page_prot)); >>>> + >>>> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >>>> + newpte = pte_mksoft_dirty(newpte); >>>> + >>>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >>>> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); >>> >>> It's interesting that there isn't a single occurrence of the stof- >>> dirty flag in khugepaged code. I guess it all works because we do the >>> >>> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); >>> >>> and the pmd_mkdirty() will imply marking it soft-dirty. >>> >>> Now to the problem at hand: I don't think this is particularly >>> problematic in the common case: if the page is zero, it likely was >>> never written to (that's what the unerused shrinker is targeted at), >>> so the soft-dirty setting on the PMD is actually just an over- >>> indication for this page. >> >> Cool. Thanks for the insight! Good to know that ;) >> >>> >>> For example, when we just install the shared zeropage directly in >>> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. >>> >>> Now, one could argue that if the content was changed from non-zero to >>> zero, it ould actually be soft-dirty. >> >> Exactly. A false negative could be a problem for the userspace tools, IMO. >> >>> >>> Long-story short: I don't think this matters much in practice, but >>> it's an easy fix. >>> >>> As said by dev, please avoid double ptep_get() if possible. >> >> Sure, will do. I'll refactor it in the next version. >> >>> >>> Acked-by: David Hildenbrand <david@redhat.com> >> >> Thanks! >> >>> >>> >>> @Lance, can you double-check that the uffd-wp bit is handled >>> correctly? I strongly assume we lose that as well here. > > Yes, the uffd-wp bit was indeed being dropped, but ... > > The shared zeropage is read-only, which triggers a fault. IIUC, > The kernel then falls back to checking the VM_UFFD_WP flag on > the VMA and correctly generates a uffd-wp event, masking the > fact that the uffd-wp bit on the PTE was lost. That's not how VM_UFFD_WP works :) -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 12:08 ` David Hildenbrand @ 2025-09-29 13:22 ` Lance Yang 2025-09-29 16:11 ` David Hildenbrand 0 siblings, 1 reply; 10+ messages in thread From: Lance Yang @ 2025-09-29 13:22 UTC (permalink / raw) To: David Hildenbrand Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 2025/9/29 20:08, David Hildenbrand wrote: > On 29.09.25 13:29, Lance Yang wrote: >> >> >> On 2025/9/29 18:29, Lance Yang wrote: >>> >>> >>> On 2025/9/29 15:25, David Hildenbrand wrote: >>>> On 28.09.25 06:48, Lance Yang wrote: >>>>> From: Lance Yang <lance.yang@linux.dev> >>>>> >>>>> When splitting an mTHP and replacing a zero-filled subpage with the >>>>> shared >>>>> zeropage, try_to_map_unused_to_zeropage() currently drops the soft- >>>>> dirty >>>>> bit. >>>>> >>>>> For userspace tools like CRIU, which rely on the soft-dirty mechanism >>>>> for >>>>> incremental snapshots, losing this bit means modified pages are >>>>> missed, >>>>> leading to inconsistent memory state after restore. >>>>> >>>>> Preserve the soft-dirty bit from the old PTE when creating the >>>>> zeropage >>>>> mapping to ensure modified pages are correctly tracked. >>>>> >>>>> Cc: <stable@vger.kernel.org> >>>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >>>>> when splitting isolated thp") >>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev> >>>>> --- >>>>> mm/migrate.c | 4 ++++ >>>>> 1 file changed, 4 insertions(+) >>>>> >>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>> index ce83c2c3c287..bf364ba07a3f 100644 >>>>> --- a/mm/migrate.c >>>>> +++ b/mm/migrate.c >>>>> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >>>>> page_vma_mapped_walk *pvmw, >>>>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >>>>> pvmw->vma->vm_page_prot)); >>>>> + >>>>> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >>>>> + newpte = pte_mksoft_dirty(newpte); >>>>> + >>>>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >>>>> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); >>>> >>>> It's interesting that there isn't a single occurrence of the stof- >>>> dirty flag in khugepaged code. I guess it all works because we do the >>>> >>>> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); >>>> >>>> and the pmd_mkdirty() will imply marking it soft-dirty. >>>> >>>> Now to the problem at hand: I don't think this is particularly >>>> problematic in the common case: if the page is zero, it likely was >>>> never written to (that's what the unerused shrinker is targeted at), >>>> so the soft-dirty setting on the PMD is actually just an over- >>>> indication for this page. >>> >>> Cool. Thanks for the insight! Good to know that ;) >>> >>>> >>>> For example, when we just install the shared zeropage directly in >>>> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. >>>> >>>> Now, one could argue that if the content was changed from non-zero to >>>> zero, it ould actually be soft-dirty. >>> >>> Exactly. A false negative could be a problem for the userspace tools, >>> IMO. >>> >>>> >>>> Long-story short: I don't think this matters much in practice, but >>>> it's an easy fix. >>>> >>>> As said by dev, please avoid double ptep_get() if possible. >>> >>> Sure, will do. I'll refactor it in the next version. >>> >>>> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>> >>> Thanks! >>> >>>> >>>> >>>> @Lance, can you double-check that the uffd-wp bit is handled >>>> correctly? I strongly assume we lose that as well here. >> >> Yes, the uffd-wp bit was indeed being dropped, but ... >> >> The shared zeropage is read-only, which triggers a fault. IIUC, >> The kernel then falls back to checking the VM_UFFD_WP flag on >> the VMA and correctly generates a uffd-wp event, masking the >> fact that the uffd-wp bit on the PTE was lost. > > That's not how VM_UFFD_WP works :) My bad! Please accept my apologies for the earlier confusion :( I messed up my test environment (forgot to enable mTHP), which led me to a completely wrong conclusion... You're spot on. With mTHP enabled, the WP fault was not caught on the shared zeropage after it replaced a zero-filled subpage during an mTHP split. This is because do_wp_page() requires userfaultfd_pte_wp() to be true, which in turn needs both userfaultfd_wp(vma) and pte_uffd_wp(pte). static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { return userfaultfd_wp(vma) && pte_uffd_wp(pte); } userfaultfd_pte_wp() fails as we lose the uffd-wp bit on the PTE ... Please correct me if I missed something important! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 13:22 ` Lance Yang @ 2025-09-29 16:11 ` David Hildenbrand 2025-09-30 1:53 ` Lance Yang 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2025-09-29 16:11 UTC (permalink / raw) To: Lance Yang Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 29.09.25 15:22, Lance Yang wrote: > > > On 2025/9/29 20:08, David Hildenbrand wrote: >> On 29.09.25 13:29, Lance Yang wrote: >>> >>> >>> On 2025/9/29 18:29, Lance Yang wrote: >>>> >>>> >>>> On 2025/9/29 15:25, David Hildenbrand wrote: >>>>> On 28.09.25 06:48, Lance Yang wrote: >>>>>> From: Lance Yang <lance.yang@linux.dev> >>>>>> >>>>>> When splitting an mTHP and replacing a zero-filled subpage with the >>>>>> shared >>>>>> zeropage, try_to_map_unused_to_zeropage() currently drops the soft- >>>>>> dirty >>>>>> bit. >>>>>> >>>>>> For userspace tools like CRIU, which rely on the soft-dirty mechanism >>>>>> for >>>>>> incremental snapshots, losing this bit means modified pages are >>>>>> missed, >>>>>> leading to inconsistent memory state after restore. >>>>>> >>>>>> Preserve the soft-dirty bit from the old PTE when creating the >>>>>> zeropage >>>>>> mapping to ensure modified pages are correctly tracked. >>>>>> >>>>>> Cc: <stable@vger.kernel.org> >>>>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >>>>>> when splitting isolated thp") >>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev> >>>>>> --- >>>>>> mm/migrate.c | 4 ++++ >>>>>> 1 file changed, 4 insertions(+) >>>>>> >>>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>>> index ce83c2c3c287..bf364ba07a3f 100644 >>>>>> --- a/mm/migrate.c >>>>>> +++ b/mm/migrate.c >>>>>> @@ -322,6 +322,10 @@ static bool try_to_map_unused_to_zeropage(struct >>>>>> page_vma_mapped_walk *pvmw, >>>>>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >>>>>> pvmw->vma->vm_page_prot)); >>>>>> + >>>>>> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >>>>>> + newpte = pte_mksoft_dirty(newpte); >>>>>> + >>>>>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); >>>>>> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); >>>>> >>>>> It's interesting that there isn't a single occurrence of the stof- >>>>> dirty flag in khugepaged code. I guess it all works because we do the >>>>> >>>>> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); >>>>> >>>>> and the pmd_mkdirty() will imply marking it soft-dirty. >>>>> >>>>> Now to the problem at hand: I don't think this is particularly >>>>> problematic in the common case: if the page is zero, it likely was >>>>> never written to (that's what the unerused shrinker is targeted at), >>>>> so the soft-dirty setting on the PMD is actually just an over- >>>>> indication for this page. >>>> >>>> Cool. Thanks for the insight! Good to know that ;) >>>> >>>>> >>>>> For example, when we just install the shared zeropage directly in >>>>> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. >>>>> >>>>> Now, one could argue that if the content was changed from non-zero to >>>>> zero, it ould actually be soft-dirty. >>>> >>>> Exactly. A false negative could be a problem for the userspace tools, >>>> IMO. >>>> >>>>> >>>>> Long-story short: I don't think this matters much in practice, but >>>>> it's an easy fix. >>>>> >>>>> As said by dev, please avoid double ptep_get() if possible. >>>> >>>> Sure, will do. I'll refactor it in the next version. >>>> >>>>> >>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>> >>>> Thanks! >>>> >>>>> >>>>> >>>>> @Lance, can you double-check that the uffd-wp bit is handled >>>>> correctly? I strongly assume we lose that as well here. >>> >>> Yes, the uffd-wp bit was indeed being dropped, but ... >>> >>> The shared zeropage is read-only, which triggers a fault. IIUC, >>> The kernel then falls back to checking the VM_UFFD_WP flag on >>> the VMA and correctly generates a uffd-wp event, masking the >>> fact that the uffd-wp bit on the PTE was lost. >> >> That's not how VM_UFFD_WP works :) > > My bad! Please accept my apologies for the earlier confusion :( > > I messed up my test environment (forgot to enable mTHP), which > led me to a completely wrong conclusion... > > You're spot on. With mTHP enabled, the WP fault was not caught > on the shared zeropage after it replaced a zero-filled subpage > during an mTHP split. > > This is because do_wp_page() requires userfaultfd_pte_wp() to > be true, which in turn needs both userfaultfd_wp(vma) and > pte_uffd_wp(pte). > > static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, > pte_t pte) > { > return userfaultfd_wp(vma) && pte_uffd_wp(pte); > } > > userfaultfd_pte_wp() fails as we lose the uffd-wp bit on the PTE ... That's my understanding. And FWIW, that's a much more important fix. (in contrast to soft-dirty, uffd-wp actually is precise) Can you test+send a fix ... please? :) -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage 2025-09-29 16:11 ` David Hildenbrand @ 2025-09-30 1:53 ` Lance Yang 0 siblings, 0 replies; 10+ messages in thread From: Lance Yang @ 2025-09-30 1:53 UTC (permalink / raw) To: David Hildenbrand Cc: ziy, baolin.wang, baohua, ryan.roberts, dev.jain, npache, riel, Liam.Howlett, vbabka, harry.yoo, jannh, matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple, usamaarif642, yuzhao, linux-kernel, linux-mm, ioworker0, stable, akpm, lorenzo.stoakes On 2025/9/30 00:11, David Hildenbrand wrote: > On 29.09.25 15:22, Lance Yang wrote: >> >> >> On 2025/9/29 20:08, David Hildenbrand wrote: >>> On 29.09.25 13:29, Lance Yang wrote: >>>> >>>> >>>> On 2025/9/29 18:29, Lance Yang wrote: >>>>> >>>>> >>>>> On 2025/9/29 15:25, David Hildenbrand wrote: >>>>>> On 28.09.25 06:48, Lance Yang wrote: >>>>>>> From: Lance Yang <lance.yang@linux.dev> >>>>>>> >>>>>>> When splitting an mTHP and replacing a zero-filled subpage with the >>>>>>> shared >>>>>>> zeropage, try_to_map_unused_to_zeropage() currently drops the soft- >>>>>>> dirty >>>>>>> bit. >>>>>>> >>>>>>> For userspace tools like CRIU, which rely on the soft-dirty >>>>>>> mechanism >>>>>>> for >>>>>>> incremental snapshots, losing this bit means modified pages are >>>>>>> missed, >>>>>>> leading to inconsistent memory state after restore. >>>>>>> >>>>>>> Preserve the soft-dirty bit from the old PTE when creating the >>>>>>> zeropage >>>>>>> mapping to ensure modified pages are correctly tracked. >>>>>>> >>>>>>> Cc: <stable@vger.kernel.org> >>>>>>> Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage >>>>>>> when splitting isolated thp") >>>>>>> Signed-off-by: Lance Yang <lance.yang@linux.dev> >>>>>>> --- >>>>>>> mm/migrate.c | 4 ++++ >>>>>>> 1 file changed, 4 insertions(+) >>>>>>> >>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c >>>>>>> index ce83c2c3c287..bf364ba07a3f 100644 >>>>>>> --- a/mm/migrate.c >>>>>>> +++ b/mm/migrate.c >>>>>>> @@ -322,6 +322,10 @@ static bool >>>>>>> try_to_map_unused_to_zeropage(struct >>>>>>> page_vma_mapped_walk *pvmw, >>>>>>> newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), >>>>>>> pvmw->vma->vm_page_prot)); >>>>>>> + >>>>>>> + if (pte_swp_soft_dirty(ptep_get(pvmw->pte))) >>>>>>> + newpte = pte_mksoft_dirty(newpte); >>>>>>> + >>>>>>> set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, >>>>>>> newpte); >>>>>>> dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); >>>>>> >>>>>> It's interesting that there isn't a single occurrence of the stof- >>>>>> dirty flag in khugepaged code. I guess it all works because we do the >>>>>> >>>>>> _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma); >>>>>> >>>>>> and the pmd_mkdirty() will imply marking it soft-dirty. >>>>>> >>>>>> Now to the problem at hand: I don't think this is particularly >>>>>> problematic in the common case: if the page is zero, it likely was >>>>>> never written to (that's what the unerused shrinker is targeted at), >>>>>> so the soft-dirty setting on the PMD is actually just an over- >>>>>> indication for this page. >>>>> >>>>> Cool. Thanks for the insight! Good to know that ;) >>>>> >>>>>> >>>>>> For example, when we just install the shared zeropage directly in >>>>>> do_anonymous_page(), we obviously also don't set it dirty/soft-dirty. >>>>>> >>>>>> Now, one could argue that if the content was changed from non-zero to >>>>>> zero, it ould actually be soft-dirty. >>>>> >>>>> Exactly. A false negative could be a problem for the userspace tools, >>>>> IMO. >>>>> >>>>>> >>>>>> Long-story short: I don't think this matters much in practice, but >>>>>> it's an easy fix. >>>>>> >>>>>> As said by dev, please avoid double ptep_get() if possible. >>>>> >>>>> Sure, will do. I'll refactor it in the next version. >>>>> >>>>>> >>>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>>> >>>>> Thanks! >>>>> >>>>>> >>>>>> >>>>>> @Lance, can you double-check that the uffd-wp bit is handled >>>>>> correctly? I strongly assume we lose that as well here. >>>> >>>> Yes, the uffd-wp bit was indeed being dropped, but ... >>>> >>>> The shared zeropage is read-only, which triggers a fault. IIUC, >>>> The kernel then falls back to checking the VM_UFFD_WP flag on >>>> the VMA and correctly generates a uffd-wp event, masking the >>>> fact that the uffd-wp bit on the PTE was lost. >>> >>> That's not how VM_UFFD_WP works :) >> >> My bad! Please accept my apologies for the earlier confusion :( >> >> I messed up my test environment (forgot to enable mTHP), which >> led me to a completely wrong conclusion... >> >> You're spot on. With mTHP enabled, the WP fault was not caught >> on the shared zeropage after it replaced a zero-filled subpage >> during an mTHP split. >> >> This is because do_wp_page() requires userfaultfd_pte_wp() to >> be true, which in turn needs both userfaultfd_wp(vma) and >> pte_uffd_wp(pte). >> >> static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, >> pte_t pte) >> { >> return userfaultfd_wp(vma) && pte_uffd_wp(pte); >> } >> >> userfaultfd_pte_wp() fails as we lose the uffd-wp bit on the PTE ... > > That's my understanding. And FWIW, that's a much more important fix. (in > contrast to soft-dirty, uffd-wp actually is precise) Got it, and thanks for setting me straight on that! > > Can you test+send a fix ... please? :) > Certainly, I'm on it ;) ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-09-30 1:53 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-09-28 4:48 [PATCH 1/1] mm/rmap: fix soft-dirty bit loss when remapping zero-filled mTHP subpage to shared zeropage Lance Yang 2025-09-29 4:44 ` Dev Jain 2025-09-29 10:15 ` Lance Yang 2025-09-29 7:25 ` David Hildenbrand 2025-09-29 10:29 ` Lance Yang 2025-09-29 11:29 ` Lance Yang 2025-09-29 12:08 ` David Hildenbrand 2025-09-29 13:22 ` Lance Yang 2025-09-29 16:11 ` David Hildenbrand 2025-09-30 1:53 ` Lance Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox