From: David Hildenbrand <david@redhat.com>
To: Dev Jain <dev.jain@arm.com>, akpm@linux-foundation.org
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Lokesh Gidra <lokeshgidra@google.com>
Subject: Re: [PATCH] mm: move rmap of mTHP upon CoW reuse
Date: Thu, 25 Sep 2025 11:16:22 +0200 [thread overview]
Message-ID: <072b8684-47fe-4a2a-bf69-f6d348f0489b@redhat.com> (raw)
In-Reply-To: <20250925085429.41607-1-dev.jain@arm.com>
On 25.09.25 10:54, Dev Jain wrote:
> At wp-fault time, when we find that a folio is exclusively mapped, we move
> folio->mapping to the faulting VMA's anon_vma, so that rmap overhead
> reduces. This is currently done for small folios (base pages) and
> PMD-mapped THPs. Do this for mTHP too.
I deliberately didn't add this back then because I was not able to
convince myself easily that it is ok in all corner cases. So this needs
some thought.
We know that the folio is exclusively mapped to a single MM and that
there are no unexpected references from others (GUP pins, whatsoever).
But a large folio might be
(a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM
(b) mapped into multiple page tables (e.g., mremap()) in the same MM
Regarding (a), are we 100% sure that "vma->anon_vma" will be the same
for all VMAs? I would hope so, because we can only end up this way by
splitting a VMA that had an origin_vma->anon_vma.
Once scenario I was concerned about is VM_DONTCOPY, where we don't end
up calling anon_vma_fork() for a VMA (but for another split one maybe).
But likely that case is fine, because we don't actually copy the PTEs in
the other case.
Regarding (b), we could have a page table walker walk over the folio
(possibly inspecting folio->mapping) through a different page table.
I think the problem I foresaw with that was regarding RMAP walks that
don't hold the folio lock: that problem might be solved with [1]. Not
sure if there is anybody else depending on folio->mapping not changing
while holding the PTL.
[1]
https://lkml.kernel.org/r/20250918055135.2881413-2-lokeshgidra@google.com
Regarding (b), I think I was also concerned about concurrent fork() at
some point where a single page table lock would not be sufficient, but
that cannot happen while we are processing a page fault, not even with
the VMA lock held (we fixed this at some point).
If you take a look at dup_mmap(), we have this:
for_each_vma(vmi, mpnt) {
...
vma_start_write(mpnt);
So we allow concurrent page faults for non-processed VMAs IIRC. Maybe
that's a problem, maybe not. (my intuition told me: it's a problem). To
handle that, if required, we would just write-lock all VMAs upfront,
before doing any copying. (I suggested that in the past, I think there
is some smaller overhead involved with iterating VMAs twice).
Long story short: you should do all of that investigation and understand
why it is okay and document it in a patch. :)
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> mm-selftests pass.
>
> mm/memory.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7e32eb79ba99..ec04d2cec6b1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4014,6 +4014,11 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
> * an additional folio reference and never ended up here.
> */
> exclusive = true;
> +
> + if (folio_trylock(folio)) {
We should likely do that only if the folio->mapping is not already
properly set up, not for each reuse of a page.
> + folio_move_anon_rmap(folio, vma);
> + folio_unlock(folio);
> + }
> unlock:
> folio_unlock_large_mapcount(folio);
> return exclusive;
--
Cheers
David / dhildenb
next prev parent reply other threads:[~2025-09-25 9:16 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-25 8:54 Dev Jain
2025-09-25 9:16 ` David Hildenbrand [this message]
2025-09-25 10:33 ` Dev Jain
2025-09-25 10:37 ` David Hildenbrand
2025-09-25 10:50 ` Dev Jain
2025-09-25 10:57 ` David Hildenbrand
2025-09-25 9:31 ` Kiryl Shutsemau
2025-09-25 9:33 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=072b8684-47fe-4a2a-bf69-f6d348f0489b@redhat.com \
--to=david@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dev.jain@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox