From: Dev Jain <dev.jain@arm.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: akpm@linux-foundation.org, riel@surriel.com,
Liam.Howlett@oracle.com, vbabka@kernel.org, harry.yoo@oracle.com,
jannh@google.com, baohua@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Wed, 25 Feb 2026 10:41:04 +0530 [thread overview]
Message-ID: <40c4917a-cf50-43f6-8ef0-de5a2c7a638f@arm.com> (raw)
In-Reply-To: <36e676b4-dc6f-45f7-b885-8685227ac6a8@kernel.org>
On 24/02/26 9:31 pm, David Hildenbrand (Arm) wrote:
> On 2/24/26 12:43, Lorenzo Stoakes wrote:
>> On Tue, Feb 24, 2026 at 11:31:24AM +0000, Lorenzo Stoakes wrote:
>>> Thanks Dev.
>>>
>>> Andrew - why was commit 354dffd29575 ("mm: support batched unmap for lazyfree
>>> large folios during reclamation") merged?
>>>
>>> It had enormous amounts of review commentary at
>>> https://lore.kernel.org/all/146b4cb1-aa1e-4519-9e03-f98cfb1135d2@redhat.com/ and
>>> no tags, this should be a signal to wait for a respin _at least_, and really if
>>> late in cycle suggests it should wait a cycle.
>>>
>>> I've said going forward I'm going to check THP series for tags and if not
>>> present NAK if they hit mm-stable, I guess I'll extend that to rmap also.
>>
>> Sorry I misread the original mail rushing through this is old... so this is less
>> pressing than I thought (for some reason I thought it was merged last cycle...!)
>> but it's a good example of how stuff can go unnoticed for a while.
>>
>> In that case maybe a revert is a bit much and we just want the simplest possible
>> fix for backporting.
>
> Dev volunteered to un-messify some of the stuff here. In particular, to
> extend batching to all cases, not just some hand-selected ones.
>
> Support for file folios is on the way.
Typo - anonymous non-lazyfree folios : )
>
>>
>> But is the proposed 'just assume wrprotect' sensible? David?
>
> In general, I think so. If PTEs were writable, they certainly have
> PAE set. The write-fault handler can fully recover from that (as PAE is
> set). If it's ever a performance problem (doubt), we can revisit.
>
> I'm wondering whether we should just perform the wrprotect earlier:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0f00570d1b9e..19b875ee3fad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>
> /* Nuke the page table entry. */
> pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
> +
> + /*
> + * Our batch might include writable and read-only
> + * PTEs. When we have to restore the mapping, just
> + * assume read-only to not accidentally upgrade
> + * write permissions for PTEs that must not be
> + * writable.
> + */
> + pteval = pte_wrprotect(pteval);
> +
> /*
> * We clear the PTE but do not flush so potentially
> * a remote CPU could still be writing to the folio
>
>
> Given that nobody asks for writability (pte_write()) later.
>
> Or does someone care?
>
> Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
> not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
> architecture (write-only)? I don't think so.
>
>
> We have the following options:
>
> 1) pte_wrprotect(): fake that all was read-only.
>
> Either we do it like Dev suggests, or we do it as above early.
>
> The downside is that any code that might later want to know "was
> this possibly writable" would get that information. Well, it wouldn't
> get that information reliably *today* already (and that sounds a bit shaky).
I would vote for this, since if we were to follow the current patch,
the extension to anon folios will make it worse (pte_wrprotect at 5 places
- the 3 additional places being in the if conditions consisting of
folio_dup_swap, arch_unmap_one, folio_try_share_anon_rmap_pte)
The downside being that if we fail in this rmap path, the ptes are all
write-protected. But then the page is already there - the fault is going
to be processed fast.
>
> 2) Tell batching logic to honor pte_write()
>
> Sounds suboptimal for some cases that really don't care in the future.
>
> 3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE
>
> ... then we know for sure whether any PTE was writable and we could
Well, we don't need this? The problem here is that we are making a decision
on the basis of the writability of the *first* pte of the batch - so if
the first pte is writable, only then we have the problem we have been
talking about.
We could have had a FPB_MERGE_WRPROTECT (which I know, is totally
incompatible with FPB_MERGE_WRITE) - that would tell whether at least one
pte in the patch was non-writable, in which case we will be able to avoid
the restoration of the entire batch to writeprotected if all the ptes
were writable (which I am assuming is the common case). But of course this
is not possible to do with the current shape of folio_pte_batch_flags. We
will have to revert the FPB_MERGE_* stuff to just collect the "at least one
is writable, at least one is dirty, at least one is young, at least one is
non-writable" etc information from the function and let the caller handle
it. That will kill all the work you did in simplifying that function :)
>
> (a) Pass it as we did before around to all checks, like pte_accessible().
>
> (b) Have an explicit restore PTE where we play save.
>
>
> I raised to Dev in private that softdirty handling is also shaky, as we
> batch over that. Meaning that we could lose or gain softdirty PTE bits in
> a batch.
>
> For lazyfree and file folios it doesn't really matter I guess. But it will
> matter once we unlock it for all anon folios.
>
>
> 1) sounds simplest, 3) sounds cleanest long-term.
>
prev parent reply other threads:[~2026-02-25 5:11 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43 ` Lorenzo Stoakes
2026-02-24 16:01 ` David Hildenbrand (Arm)
2026-02-25 5:11 ` Dev Jain [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40c4917a-cf50-43f6-8ef0-de5a2c7a638f@arm.com \
--to=dev.jain@arm.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=david@kernel.org \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=riel@surriel.com \
--cc=stable@kernel.org \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox