Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@kernel.org, harry.yoo@oracle.com,
	jannh@google.com, baohua@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable <stable@kernel.org>
Subject: Re: [PATCH] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Tue, 24 Feb 2026 17:01:50 +0100	[thread overview]
Message-ID: <36e676b4-dc6f-45f7-b885-8685227ac6a8@kernel.org> (raw)
In-Reply-To: <61161337-0d0b-4597-aad6-b5a1aa1ad41f@lucifer.local>

On 2/24/26 12:43, Lorenzo Stoakes wrote:
> On Tue, Feb 24, 2026 at 11:31:24AM +0000, Lorenzo Stoakes wrote:
>> Thanks Dev.
>>
>> Andrew - why was commit 354dffd29575 ("mm: support batched unmap for lazyfree
>> large folios during reclamation") merged?
>>
>> It had enormous amounts of review commentary at
>> https://lore.kernel.org/all/146b4cb1-aa1e-4519-9e03-f98cfb1135d2@redhat.com/ and
>> no tags, this should be a signal to wait for a respin _at least_, and really if
>> late in cycle suggests it should wait a cycle.
>>
>> I've said going forward I'm going to check THP series for tags and if not
>> present NAK if they hit mm-stable, I guess I'll extend that to rmap also.
> 
> Sorry I misread the original mail rushing through this is old... so this is less
> pressing than I thought (for some reason I thought it was merged last cycle...!)
> but it's a good example of how stuff can go unnoticed for a while.
> 
> In that case maybe a revert is a bit much and we just want the simplest possible
> fix for backporting.

Dev volunteered to un-messify some of the stuff here. In particular, to
extend batching to all cases, not just some hand-selected ones.

Support for file folios is on the way.

> 
> But is the proposed 'just assume wrprotect' sensible? David?

In general, I think so. If PTEs were writable, they certainly have
PAE set. The write-fault handler can fully recover from that (as PAE is
set). If it's ever a performance problem (doubt), we can revisit.

I'm wondering whether we should just perform the wrprotect earlier:

diff --git a/mm/rmap.c b/mm/rmap.c
index 0f00570d1b9e..19b875ee3fad 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2150,6 +2150,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

                        /* Nuke the page table entry. */
                        pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
+
+                       /*
+                        * Our batch might include writable and read-only
+                        * PTEs. When we have to restore the mapping, just
+                        * assume read-only to not accidentally upgrade
+                        * write permissions for PTEs that must not be
+                        * writable.
+                        */
+                       pteval = pte_wrprotect(pteval);
+
                        /*
                         * We clear the PTE but do not flush so potentially
                         * a remote CPU could still be writing to the folio

Given that nobody asks for writability (pte_write()) later.

Or does someone care?

Staring at set_tlb_ubc_flush_pending()->pte_accessible() I am
not 100% sure. Could pte_wrprotect() turn a PTE inaccessible on some
architecture (write-only)? I don't think so.

We have the following options:

1) pte_wrprotect(): fake that all was read-only.

Either we do it like Dev suggests, or we do it as above early.

The downside is that any code that might later want to know "was
this possibly writable" would get that information. Well, it wouldn't
get that information reliably *today* already (and that sounds a bit shaky).

2) Tell batching logic to honor pte_write()

Sounds suboptimal for some cases that really don't care in the future.

3) Tell batching logic to tell us if any pte was writable: FPB_MERGE_WRITE

... then we know for sure whether any PTE was writable and we could

(a) Pass it as we did before around to all checks, like pte_accessible().

(b) Have an explicit restore PTE where we play save.

I raised to Dev in private that softdirty handling is also shaky, as we
batch over that. Meaning that we could lose or gain softdirty PTE bits in
a batch.

For lazyfree and file folios it doesn't really matter I guess. But it will
matter once we unlock it for all anon folios.

1) sounds simplest, 3) sounds cleanest long-term.

-- 
Cheers,

David

     prev parent reply	other threads:[~2026-02-24 16:02 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 11:09 Dev Jain
2026-02-24 11:31 ` Lorenzo Stoakes
2026-02-24 11:43   ` Lorenzo Stoakes
2026-02-24 16:01     ` David Hildenbrand (Arm) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36e676b4-dc6f-45f7-b885-8685227ac6a8@kernel.org \
    --to=david@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=harry.yoo@oracle.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=riel@surriel.com \
    --cc=stable@kernel.org \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox