From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Barry Song <21cnbao@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Dev Jain <dev.jain@arm.com>,
akpm@linux-foundation.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@kernel.org, harry.yoo@oracle.com,
jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Thu, 26 Feb 2026 11:09:14 +0100 [thread overview]
Message-ID: <bdfae0dc-b7f3-4c1e-b8ad-8a0cdf50c1e6@kernel.org> (raw)
In-Reply-To: <CAGsJ_4x=n9BPHN9Eg9KYY40T68wWuNJUN1x8dokJqFXSKu3KyQ@mail.gmail.com>
On 2/26/26 08:01, Barry Song wrote:
> On Wed, Feb 25, 2026 at 12:01 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/24/26 12:43, Lorenzo Stoakes wrote:
>>>
>>> Sorry I misread the original mail rushing through this is old... so this is less
>>> pressing than I thought (for some reason I thought it was merged last cycle...!)
>>> but it's a good example of how stuff can go unnoticed for a while.
>>>
>>> In that case maybe a revert is a bit much and we just want the simplest possible
>>> fix for backporting.
>
> Apologies for the mess I caused, and thanks to Dev for catching this bug.
>
>>
>> Dev volunteered to un-messify some of the stuff here. In particular, to
>> extend batching to all cases, not just some hand-selected ones.
>>
>> Support for file folios is on the way.
>>
>>>
>>> But is the proposed 'just assume wrprotect' sensible? David?
>>
>> In general, I think so. If PTEs were writable, they certainly have
>> PAE set. The write-fault handler can fully recover from that (as PAE is
>> set). If it's ever a performance problem (doubt), we can revisit.
>>
>> I'm wondering whether we should just perform the wrprotect earlier:
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 0f00570d1b9e..19b875ee3fad 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>
>> /* Nuke the page table entry. */
>> pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
>> +
>> + /*
>> + * Our batch might include writable and read-only
>> + * PTEs. When we have to restore the mapping, just
>> + * assume read-only to not accidentally upgrade
>> + * write permissions for PTEs that must not be
>> + * writable.
>> + */
>> + pteval = pte_wrprotect(pteval);
>> +
>> /*
>> * We clear the PTE but do not flush so potentially
>> * a remote CPU could still be writing to the folio
>>
>>
>> Given that nobody asks for writability (pte_write()) later.
>>
>> Or does someone care?
>>
>> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
>> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
>> architecture (write-only)? I don't think so.
>>
>>
>> We have the following options:
>>
>> 1) pte_wrprotect(): fake that all was read-only.
>>
>> Either we do it like Dev suggests, or we do it as above early.
>>
>> The downside is that any code that might later want to know "was
>> this possibly writable" would get that information. Well, it wouldn't
>> get that information reliably *today* already (and that sounds a bit shaky).
>>
>> 2) Tell batching logic to honor pte_write()
>>
>> Sounds suboptimal for some cases that really don't care in the future.
>
> I'm still curious what the downside would be to applying the
> simple fix instead of introducing more "hacks". As I assume,
> cases where a folio has both writable and non-writable PTEs
> are not common?
With "in the future" I thought about file folios, where I'd assume ti
could happen more often.
For lazyfree, I agree.
In the end, batching as much as possible is nice, but obviously, once it
gets too shaky in corner cases we might not care that much.
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index bff8f222004e..48ad3435593a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1955,7 +1955,7 @@ static inline unsigned int
> folio_unmap_pte_batch(struct folio *folio,
> if (userfaultfd_wp(vma))
> return 1;
>
> - return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
> + return folio_pte_batch_flags(folio, NULL, pvmw->pte, &pte,
> max_nr, FPB_RESPECT_WRITE);
> }
If we already go for this approach assume we should then just set
FPB_RESPECT_SOFT_DIRTY as well and have it all handled properly.
--
Cheers,
David
next prev parent reply other threads:[~2026-02-26 10:09 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43 ` Lorenzo Stoakes
2026-02-24 16:01 ` David Hildenbrand (Arm)
2026-02-25 5:11 ` Dev Jain
2026-02-26 10:21 ` David Hildenbrand (Arm)
2026-02-26 10:27 ` Dev Jain
2026-02-26 7:01 ` Barry Song
2026-02-26 10:09 ` David Hildenbrand (Arm) [this message]
2026-02-26 10:20 ` Barry Song
2026-02-26 7:09 ` Lance Yang
2026-02-26 10:06 ` David Hildenbrand (Arm)
2026-02-26 10:28 ` Lance Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bdfae0dc-b7f3-4c1e-b8ad-8a0cdf50c1e6@kernel.org \
--to=david@kernel.org \
--cc=21cnbao@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dev.jain@arm.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@kernel.org \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox