Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Dev Jain <dev.jain@arm.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@kernel.org, harry.yoo@oracle.com,
	jannh@google.com, baohua@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Thu, 26 Feb 2026 15:57:08 +0530	[thread overview]
Message-ID: <697fe0bf-fd5b-4ea2-b430-37e38dd30590@arm.com> (raw)
In-Reply-To: <bc4ad689-6dd5-4637-92dd-d16ac8ddd460@kernel.org>



On 26/02/26 3:51 pm, David Hildenbrand (Arm) wrote:
> On 2/25/26 06:11, Dev Jain wrote:
>>
>>
>> On 24/02/26 9:31 pm, David Hildenbrand (Arm) wrote:
>>> On 2/24/26 12:43, Lorenzo Stoakes wrote:
>>>>
>>>> Sorry I misread the original mail rushing through this is old... so this is less
>>>> pressing than I thought (for some reason I thought it was merged last cycle...!)
>>>> but it's a good example of how stuff can go unnoticed for a while.
>>>>
>>>> In that case maybe a revert is a bit much and we just want the simplest possible
>>>> fix for backporting.
>>>
>>> Dev volunteered to un-messify some of the stuff here. In particular, to
>>> extend batching to all cases, not just some hand-selected ones.
>>>
>>> Support for file folios is on the way.
>>
>> Typo - anonymous non-lazyfree folios : )
> 
> Heh, no, not what I meant. We do have file folio support on the way (see
> the other patch set).

Ah I thought that got merged already : )

> 
>>
>>>
>>>>
>>>> But is the proposed 'just assume wrprotect' sensible? David?
>>>
>>> In general, I think so. If PTEs were writable, they certainly have
>>> PAE set. The write-fault handler can fully recover from that (as PAE is
>>> set). If it's ever a performance problem (doubt), we can revisit.
>>>
>>> I'm wondering whether we should just perform the wrprotect earlier:
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 0f00570d1b9e..19b875ee3fad 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>  
>>>                         /* Nuke the page table entry. */
>>>                         pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
>>> +
>>> +                       /*
>>> +                        * Our batch might include writable and read-only
>>> +                        * PTEs. When we have to restore the mapping, just
>>> +                        * assume read-only to not accidentally upgrade
>>> +                        * write permissions for PTEs that must not be
>>> +                        * writable.
>>> +                        */
>>> +                       pteval = pte_wrprotect(pteval);
>>> +
>>>                         /*
>>>                          * We clear the PTE but do not flush so potentially
>>>                          * a remote CPU could still be writing to the folio
>>>
>>>
>>> Given that nobody asks for writability (pte_write()) later.
>>>
>>> Or does someone care?
>>>
>>> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
>>> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
>>> architecture (write-only)? I don't think so.
>>>
>>>
>>> We have the following options:
>>>
>>> 1) pte_wrprotect(): fake that all was read-only.
>>>
>>> Either we do it like Dev suggests, or we do it as above early.
>>>
>>> The downside is that any code that might later want to know "was
>>> this possibly writable" would get that information. Well, it wouldn't
>>> get that information reliably *today* already (and that sounds a bit shaky).
>>
>> I would vote for this, since if we were to follow the current patch,
>> the extension to anon folios will make it worse (pte_wrprotect at 5 places
>> - the 3 additional places being in the if conditions consisting of
>> folio_dup_swap, arch_unmap_one, folio_try_share_anon_rmap_pte)
>> The downside being that if we fail in this rmap path, the ptes are all
>> write-protected. But then the page is already there - the fault is going
>> to be processed fast.
> 
> Right, we should only have a single "revert pte", and not have to redo
> that from multiple locations.
> 
>>
>>>
>>> 2) Tell batching logic to honor pte_write()
>>>
>>> Sounds suboptimal for some cases that really don't care in the future.
> 
> As per discussion with Barry, we might just want to do that now as an
> easy and obviously correct fix.
> 
> It's a shame we stop being able to use folio_pte_batch() and have to
> create an inlined version.
> 
>>>
>>> 3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE
>>>
>>> ... then we know for sure whether any PTE was writable and we could
>>
>> Well, we don't need this? The problem here is that we are making a decision
>> on the basis of the writability of the *first* pte of the batch - so if
>> the first pte is writable, only then we have the problem we have been
>> talking about.
> 
> That's what I was referring above as "being shaky".
> 
> Some code has to be taught that "there is something writable here, so
> assume it was accessible in a certain way", other code has to be taught
> that "there is something read-only here, so make sure you don't
> accidentally make something writable".
> 
> One way to handle it is to say that "the resulting pte is writable, so
> assume it was accessible", to then say "but just assume it is read-only
> as we are not sure whether everything is writable".
> 
>>
>> We could have had a FPB_MERGE_WRPROTECT (which I know, is totally
>> incompatible with FPB_MERGE_WRITE) - that would tell whether at least one
>> pte in the patch was non-writable, in which case we will be able to avoid
>> the restoration of the entire batch to writeprotected if all the ptes
>> were writable (which I am assuming is the common case). But of course this
>> is not possible to do with the current shape of folio_pte_batch_flags. We
>> will have to revert the FPB_MERGE_* stuff to just collect the "at least one
>> is writable, at least one is dirty, at least one is young, at least one is
>> non-writable" etc information from the function and let the caller handle
>> it. That will kill all the work you did in simplifying that function :)
> Yeah, let's not go down that path. :)
> 
> To fix what we currently have in the tree, probably we should really
> just set FPB_RESPECT_WRITE|FPB_RESPECT_SOFT_DIRTY, saying that this is
> "obviously correct", and revisit it once we expect more cases where
> batching over these PTEs would provide more value.

Yup makes sense, I'll do this.

> 
> For lazyfree, likely it doesn't make a difference.
>

next prev parent reply	other threads:[~2026-02-26 10:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43   ` Lorenzo Stoakes
2026-02-24 16:01     ` David Hildenbrand (Arm)
2026-02-25  5:11       ` Dev Jain
2026-02-26 10:21         ` David Hildenbrand (Arm)
2026-02-26 10:27           ` Dev Jain [this message]
2026-02-26  7:01       ` Barry Song
2026-02-26 10:09         ` David Hildenbrand (Arm)
2026-02-26 10:20           ` Barry Song
2026-02-26  7:09       ` Lance Yang
2026-02-26 10:06         ` David Hildenbrand (Arm)
2026-02-26 10:28           ` Lance Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=697fe0bf-fd5b-4ea2-b430-37e38dd30590@arm.com \
    --to=dev.jain@arm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=david@kernel.org \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox