Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Dev Jain <dev.jain@arm.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@kernel.org, harry.yoo@oracle.com,
	jannh@google.com, baohua@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Thu, 26 Feb 2026 11:21:45 +0100	[thread overview]
Message-ID: <bc4ad689-6dd5-4637-92dd-d16ac8ddd460@kernel.org> (raw)
In-Reply-To: <40c4917a-cf50-43f6-8ef0-de5a2c7a638f@arm.com>

On 2/25/26 06:11, Dev Jain wrote:
> 
> 
> On 24/02/26 9:31 pm, David Hildenbrand (Arm) wrote:
>> On 2/24/26 12:43, Lorenzo Stoakes wrote:
>>>
>>> Sorry I misread the original mail rushing through this is old... so this is less
>>> pressing than I thought (for some reason I thought it was merged last cycle...!)
>>> but it's a good example of how stuff can go unnoticed for a while.
>>>
>>> In that case maybe a revert is a bit much and we just want the simplest possible
>>> fix for backporting.
>>
>> Dev volunteered to un-messify some of the stuff here. In particular, to
>> extend batching to all cases, not just some hand-selected ones.
>>
>> Support for file folios is on the way.
> 
> Typo - anonymous non-lazyfree folios : )

Heh, no, not what I meant. We do have file folio support on the way (see
the other patch set).

> 
>>
>>>
>>> But is the proposed 'just assume wrprotect' sensible? David?
>>
>> In general, I think so. If PTEs were writable, they certainly have
>> PAE set. The write-fault handler can fully recover from that (as PAE is
>> set). If it's ever a performance problem (doubt), we can revisit.
>>
>> I'm wondering whether we should just perform the wrprotect earlier:
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 0f00570d1b9e..19b875ee3fad 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>  
>>                         /* Nuke the page table entry. */
>>                         pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
>> +
>> +                       /*
>> +                        * Our batch might include writable and read-only
>> +                        * PTEs. When we have to restore the mapping, just
>> +                        * assume read-only to not accidentally upgrade
>> +                        * write permissions for PTEs that must not be
>> +                        * writable.
>> +                        */
>> +                       pteval = pte_wrprotect(pteval);
>> +
>>                         /*
>>                          * We clear the PTE but do not flush so potentially
>>                          * a remote CPU could still be writing to the folio
>>
>>
>> Given that nobody asks for writability (pte_write()) later.
>>
>> Or does someone care?
>>
>> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
>> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
>> architecture (write-only)? I don't think so.
>>
>>
>> We have the following options:
>>
>> 1) pte_wrprotect(): fake that all was read-only.
>>
>> Either we do it like Dev suggests, or we do it as above early.
>>
>> The downside is that any code that might later want to know "was
>> this possibly writable" would get that information. Well, it wouldn't
>> get that information reliably *today* already (and that sounds a bit shaky).
> 
> I would vote for this, since if we were to follow the current patch,
> the extension to anon folios will make it worse (pte_wrprotect at 5 places
> - the 3 additional places being in the if conditions consisting of
> folio_dup_swap, arch_unmap_one, folio_try_share_anon_rmap_pte)
> The downside being that if we fail in this rmap path, the ptes are all
> write-protected. But then the page is already there - the fault is going
> to be processed fast.

Right, we should only have a single "revert pte", and not have to redo
that from multiple locations.

> 
>>
>> 2) Tell batching logic to honor pte_write()
>>
>> Sounds suboptimal for some cases that really don't care in the future.

As per discussion with Barry, we might just want to do that now as an
easy and obviously correct fix.

It's a shame we stop being able to use folio_pte_batch() and have to
create an inlined version.

>>
>> 3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE
>>
>> ... then we know for sure whether any PTE was writable and we could
> 
> Well, we don't need this? The problem here is that we are making a decision
> on the basis of the writability of the *first* pte of the batch - so if
> the first pte is writable, only then we have the problem we have been
> talking about.

That's what I was referring above as "being shaky".

Some code has to be taught that "there is something writable here, so
assume it was accessible in a certain way", other code has to be taught
that "there is something read-only here, so make sure you don't
accidentally make something writable".

One way to handle it is to say that "the resulting pte is writable, so
assume it was accessible", to then say "but just assume it is read-only
as we are not sure whether everything is writable".

> 
> We could have had a FPB_MERGE_WRPROTECT (which I know, is totally
> incompatible with FPB_MERGE_WRITE) - that would tell whether at least one
> pte in the patch was non-writable, in which case we will be able to avoid
> the restoration of the entire batch to writeprotected if all the ptes
> were writable (which I am assuming is the common case). But of course this
> is not possible to do with the current shape of folio_pte_batch_flags. We
> will have to revert the FPB_MERGE_* stuff to just collect the "at least one
> is writable, at least one is dirty, at least one is young, at least one is
> non-writable" etc information from the function and let the caller handle
> it. That will kill all the work you did in simplifying that function :)
Yeah, let's not go down that path. :)

To fix what we currently have in the tree, probably we should really
just set FPB_RESPECT_WRITE|FPB_RESPECT_SOFT_DIRTY, saying that this is
"obviously correct", and revisit it once we expect more cases where
batching over these PTEs would provide more value.

For lazyfree, likely it doesn't make a difference.

-- 
Cheers,

David

next prev parent reply	other threads:[~2026-02-26 10:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43   ` Lorenzo Stoakes
2026-02-24 16:01     ` David Hildenbrand (Arm)
2026-02-25  5:11       ` Dev Jain
2026-02-26 10:21         ` David Hildenbrand (Arm) [this message]
2026-02-26 10:27           ` Dev Jain
2026-02-26  7:01       ` Barry Song
2026-02-26 10:09         ` David Hildenbrand (Arm)
2026-02-26 10:20           ` Barry Song
2026-02-26  7:09       ` Lance Yang
2026-02-26 10:06         ` David Hildenbrand (Arm)
2026-02-26 10:28           ` Lance Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc4ad689-6dd5-4637-92dd-d16ac8ddd460@kernel.org \
    --to=david@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox