From: Barry Song <21cnbao@gmail.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Dev Jain <dev.jain@arm.com>,
akpm@linux-foundation.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@kernel.org,
harry.yoo@oracle.com, jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Thu, 26 Feb 2026 18:20:07 +0800 [thread overview]
Message-ID: <CAGsJ_4x_ggrEs+TkxNpdyR4avPwcf7pyB_sUO=EOR7t53kCJXw@mail.gmail.com> (raw)
In-Reply-To: <bdfae0dc-b7f3-4c1e-b8ad-8a0cdf50c1e6@kernel.org>
On Thu, Feb 26, 2026 at 6:09 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/26/26 08:01, Barry Song wrote:
> > On Wed, Feb 25, 2026 at 12:01 AM David Hildenbrand (Arm)
> > <david@kernel.org> wrote:
> >>
> >> On 2/24/26 12:43, Lorenzo Stoakes wrote:
> >>>
> >>> Sorry I misread the original mail rushing through this is old... so this is less
> >>> pressing than I thought (for some reason I thought it was merged last cycle...!)
> >>> but it's a good example of how stuff can go unnoticed for a while.
> >>>
> >>> In that case maybe a revert is a bit much and we just want the simplest possible
> >>> fix for backporting.
> >
> > Apologies for the mess I caused, and thanks to Dev for catching this bug.
> >
> >>
> >> Dev volunteered to un-messify some of the stuff here. In particular, to
> >> extend batching to all cases, not just some hand-selected ones.
> >>
> >> Support for file folios is on the way.
> >>
> >>>
> >>> But is the proposed 'just assume wrprotect' sensible? David?
> >>
> >> In general, I think so. If PTEs were writable, they certainly have
> >> PAE set. The write-fault handler can fully recover from that (as PAE is
> >> set). If it's ever a performance problem (doubt), we can revisit.
> >>
> >> I'm wondering whether we should just perform the wrprotect earlier:
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 0f00570d1b9e..19b875ee3fad 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>
> >> /* Nuke the page table entry. */
> >> pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
> >> +
> >> + /*
> >> + * Our batch might include writable and read-only
> >> + * PTEs. When we have to restore the mapping, just
> >> + * assume read-only to not accidentally upgrade
> >> + * write permissions for PTEs that must not be
> >> + * writable.
> >> + */
> >> + pteval = pte_wrprotect(pteval);
> >> +
> >> /*
> >> * We clear the PTE but do not flush so potentially
> >> * a remote CPU could still be writing to the folio
> >>
> >>
> >> Given that nobody asks for writability (pte_write()) later.
> >>
> >> Or does someone care?
> >>
> >> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
> >> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
> >> architecture (write-only)? I don't think so.
> >>
> >>
> >> We have the following options:
> >>
> >> 1) pte_wrprotect(): fake that all was read-only.
> >>
> >> Either we do it like Dev suggests, or we do it as above early.
> >>
> >> The downside is that any code that might later want to know "was
> >> this possibly writable" would get that information. Well, it wouldn't
> >> get that information reliably *today* already (and that sounds a bit shaky).
> >>
> >> 2) Tell batching logic to honor pte_write()
> >>
> >> Sounds suboptimal for some cases that really don't care in the future.
> >
> > I'm still curious what the downside would be to applying the
> > simple fix instead of introducing more "hacks". As I assume,
> > cases where a folio has both writable and non-writable PTEs
> > are not common?
>
> With "in the future" I thought about file folios, where I'd assume ti
> could happen more often.
>
> For lazyfree, I agree.
>
> In the end, batching as much as possible is nice, but obviously, once it
> gets too shaky in corner cases we might not care that much.
Assuming 90% of folios have consistent PTEs, perhaps we don’t
need to worry too much about the remaining 10% of inconsistent
folios. We’ve already gained performance benefits for the
consistent 90%, and while the remaining 10% may not receive the
full batch, they are still getting some batching.
I don’t have the exact number, but it’s likely 90% or higher :-)
>
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index bff8f222004e..48ad3435593a 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1955,7 +1955,7 @@ static inline unsigned int
> > folio_unmap_pte_batch(struct folio *folio,
> > if (userfaultfd_wp(vma))
> > return 1;
> >
> > - return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
> > + return folio_pte_batch_flags(folio, NULL, pvmw->pte, &pte,
> > max_nr, FPB_RESPECT_WRITE);
> > }
>
> If we already go for this approach assume we should then just set
> FPB_RESPECT_SOFT_DIRTY as well and have it all handled properly.
I would vote for this, as supporting those inconsistent PTE
cases could become an endless and painful task :-)
Thanks
Barry
next prev parent reply other threads:[~2026-02-26 10:20 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43 ` Lorenzo Stoakes
2026-02-24 16:01 ` David Hildenbrand (Arm)
2026-02-25 5:11 ` Dev Jain
2026-02-26 10:21 ` David Hildenbrand (Arm)
2026-02-26 10:27 ` Dev Jain
2026-02-26 7:01 ` Barry Song
2026-02-26 10:09 ` David Hildenbrand (Arm)
2026-02-26 10:20 ` Barry Song [this message]
2026-02-26 7:09 ` Lance Yang
2026-02-26 10:06 ` David Hildenbrand (Arm)
2026-02-26 10:28 ` Lance Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGsJ_4x_ggrEs+TkxNpdyR4avPwcf7pyB_sUO=EOR7t53kCJXw@mail.gmail.com' \
--to=21cnbao@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@kernel.org \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox