[PATCH] mm: move rmap of mTHP upon CoW reuse

From: Dev Jain @ 2025-09-25  8:54 UTC
To: akpm, david
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Dev Jain

At wp-fault time, when we find that a folio is exclusively mapped, we move
folio->mapping to the faulting VMA's anon_vma, so that rmap overhead
reduces. This is currently done for small folios (base pages) and
PMD-mapped THPs. Do this for mTHP too.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
mm-selftests pass.

 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 7e32eb79ba99..ec04d2cec6b1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4014,6 +4014,11 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
 	 * an additional folio reference and never ended up here.
 	 */
 	exclusive = true;
+
+	if (folio_trylock(folio)) {
+		folio_move_anon_rmap(folio, vma);
+		folio_unlock(folio);
+	}
 unlock:
 	folio_unlock_large_mapcount(folio);
 	return exclusive;
-- 
2.30.2
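For context: the rmap move that the patch performs is implemented by
folio_move_anon_rmap() in mm/rmap.c, and that helper must be called with
the folio locked -- which is why the patch wraps it in a
folio_trylock()/folio_unlock() pair. Below is a from-memory sketch of the
helper; treat it as illustrative only, since the details vary across
kernel versions.

void folio_move_anon_rmap(struct folio *folio, struct vm_area_struct *vma)
{
	void *anon_vma = vma->anon_vma;

	/* The caller must hold the folio lock, and an anon_vma must exist. */
	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
	VM_BUG_ON_VMA(!anon_vma, vma);

	/*
	 * Store the anon_vma pointer and the PAGE_MAPPING_ANON bit with a
	 * single write, so a concurrent rmap reader never sees one without
	 * the other.
	 */
	anon_vma += PAGE_MAPPING_ANON;
	WRITE_ONCE(folio->mapping, anon_vma);
}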
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: David Hildenbrand @ 2025-09-25  9:16 UTC
To: Dev Jain, akpm
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Lokesh Gidra

On 25.09.25 10:54, Dev Jain wrote:
> At wp-fault time, when we find that a folio is exclusively mapped, we
> move folio->mapping to the faulting VMA's anon_vma, so that rmap
> overhead reduces. This is currently done for small folios (base pages)
> and PMD-mapped THPs. Do this for mTHP too.

I deliberately didn't add this back then because I was not able to
convince myself easily that it is ok in all corner cases. So this needs
some thought.

We know that the folio is exclusively mapped to a single MM and that
there are no unexpected references from others (GUP pins, whatsoever).

But a large folio might be

(a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM
(b) mapped into multiple page tables (e.g., mremap()) in the same MM

Regarding (a), are we 100% sure that "vma->anon_vma" will be the same
for all VMAs? I would hope so, because we can only end up this way by
splitting a VMA that had an origin_vma->anon_vma. One scenario I was
concerned about is VM_DONTCOPY, where we don't end up calling
anon_vma_fork() for a VMA (but for another split one maybe). But likely
that case is fine, because we don't actually copy the PTEs in the other
case.

Regarding (b), we could have a page table walker walk over the folio
(possibly inspecting folio->mapping) through a different page table. I
think the problem I foresaw with that was regarding rmap walks that
don't hold the folio lock: that problem might be solved with [1]. Not
sure if there is anybody else depending on folio->mapping not changing
while holding the PTL.

[1] https://lkml.kernel.org/r/20250918055135.2881413-2-lokeshgidra@google.com

Regarding (b), I think I was also concerned about concurrent fork() at
some point, where a single page table lock would not be sufficient; but
that cannot happen while we are processing a page fault, not even with
the VMA lock held (we fixed this at some point).

If you take a look at dup_mmap(), we have this:

	for_each_vma(vmi, mpnt) {
		...
		vma_start_write(mpnt);

So we allow concurrent page faults for not-yet-processed VMAs, IIRC.
Maybe that's a problem, maybe not (my intuition told me: it's a
problem). To handle that, if required, we would just write-lock all
VMAs upfront, before doing any copying. (I suggested that in the past;
I think there is some smaller overhead involved with iterating VMAs
twice.)

Long story short: you should do all of that investigation, understand
why it is okay, and document it in a patch. :)

> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> mm-selftests pass.
>
>  mm/memory.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7e32eb79ba99..ec04d2cec6b1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4014,6 +4014,11 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
>  	 * an additional folio reference and never ended up here.
>  	 */
>  	exclusive = true;
> +
> +	if (folio_trylock(folio)) {

We should likely do that only if folio->mapping is not already properly
set up, not for each reuse of a page.

> +		folio_move_anon_rmap(folio, vma);
> +		folio_unlock(folio);
> +	}
>  unlock:
>  	folio_unlock_large_mapcount(folio);
>  	return exclusive;

-- 
Cheers

David / dhildenb
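Scenario (a) above is easy to reproduce from userspace: a partial
mprotect() splits the VMA, while the underlying large folio keeps
spanning both halves. A minimal sketch, assuming THP is enabled and the
kernel actually backs the mapping with a large folio (recent kernels
PMD-align anonymous mappings of this size; verify via /proc/self/smaps):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* one PMD-sized region */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	madvise(p, len, MADV_HUGEPAGE);	/* request a large folio */
	memset(p, 0xaa, len);		/* fault the region in */

	/*
	 * Change protection on the first half only: the kernel splits the
	 * VMA in two and remaps the PMD via PTEs, but the folio itself is
	 * not split -- the same folio is now mapped by two VMAs in one MM.
	 */
	mprotect(p, len / 2, PROT_READ);

	getchar();	/* pause here and inspect /proc/self/smaps */
	return 0;
}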
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: Dev Jain @ 2025-09-25 10:33 UTC
To: David Hildenbrand, akpm
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Lokesh Gidra

On 25/09/25 2:46 pm, David Hildenbrand wrote:
> On 25.09.25 10:54, Dev Jain wrote:
>> At wp-fault time, when we find that a folio is exclusively mapped, we
>> move folio->mapping to the faulting VMA's anon_vma, so that rmap
>> overhead reduces. This is currently done for small folios (base pages)
>> and PMD-mapped THPs. Do this for mTHP too.
>
> I deliberately didn't add this back then because I was not able to
> convince myself easily that it is ok in all corner cases. So this needs
> some thought.

Thanks for your detailed reply.

> We know that the folio is exclusively mapped to a single MM and that
> there are no unexpected references from others (GUP pins, whatsoever).
>
> But a large folio might be
>
> (a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM

I think we have the same problem then for PMD-THPs? I see that
vma_adjust_trans_huge() only does a PMD split, not a folio split.

> [--- snip ---]
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: David Hildenbrand @ 2025-09-25 10:37 UTC
To: Dev Jain, akpm
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Lokesh Gidra

On 25.09.25 12:33, Dev Jain wrote:
> On 25/09/25 2:46 pm, David Hildenbrand wrote:
>> On 25.09.25 10:54, Dev Jain wrote:
>>> At wp-fault time, when we find that a folio is exclusively mapped, we
>>> move folio->mapping to the faulting VMA's anon_vma, so that rmap
>>> overhead reduces. This is currently done for small folios (base pages)
>>> and PMD-mapped THPs. Do this for mTHP too.
>>
>> I deliberately didn't add this back then because I was not able to
>> convince myself easily that it is ok in all corner cases. So this
>> needs some thought.
>
> Thanks for your detailed reply.
>
>> We know that the folio is exclusively mapped to a single MM and that
>> there are no unexpected references from others (GUP pins, whatsoever).
>>
>> But a large folio might be
>>
>> (a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM
>
> I think we have the same problem then for PMD-THPs? I see that
> vma_adjust_trans_huge() only does a PMD split, not a folio split.

Sure, we can end up in this reuse function here for any large anon
folio, including PMD ones after a PMD->PTE remapping.

-- 
Cheers

David / dhildenb
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: Dev Jain @ 2025-09-25 10:50 UTC
To: David Hildenbrand, akpm
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Lokesh Gidra

On 25/09/25 4:07 pm, David Hildenbrand wrote:
> On 25.09.25 12:33, Dev Jain wrote:
>> On 25/09/25 2:46 pm, David Hildenbrand wrote:
>>> On 25.09.25 10:54, Dev Jain wrote:
>>>> At wp-fault time, when we find that a folio is exclusively mapped,
>>>> we move folio->mapping to the faulting VMA's anon_vma, so that rmap
>>>> overhead reduces. This is currently done for small folios (base
>>>> pages) and PMD-mapped THPs. Do this for mTHP too.
>>>
>>> I deliberately didn't add this back then because I was not able to
>>> convince myself easily that it is ok in all corner cases. So this
>>> needs some thought.
>>
>> Thanks for your detailed reply.
>>
>>> We know that the folio is exclusively mapped to a single MM and that
>>> there are no unexpected references from others (GUP pins, whatsoever).
>>>
>>> But a large folio might be
>>>
>>> (a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM
>>
>> I think we have the same problem then for PMD-THPs? I see that
>> vma_adjust_trans_huge() only does a PMD split, not a folio split.
>
> Sure, we can end up in this reuse function here for any large anon
> folio, including PMD ones after a PMD->PTE remapping.

Ah, alright. I was thinking that something might go wrong through
folio_move_anon_rmap() in do_huge_pmd_wp_page(), but that case will
*not* have had a PMD split, which guarantees that it lies in a single
VMA. Interesting.
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: David Hildenbrand @ 2025-09-25 10:57 UTC
To: Dev Jain, akpm
Cc: lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel, Lokesh Gidra

On 25.09.25 12:50, Dev Jain wrote:
> On 25/09/25 4:07 pm, David Hildenbrand wrote:
>> On 25.09.25 12:33, Dev Jain wrote:
>>> On 25/09/25 2:46 pm, David Hildenbrand wrote:
>>>> On 25.09.25 10:54, Dev Jain wrote:
>>>>> At wp-fault time, when we find that a folio is exclusively mapped,
>>>>> we move folio->mapping to the faulting VMA's anon_vma, so that rmap
>>>>> overhead reduces. This is currently done for small folios (base
>>>>> pages) and PMD-mapped THPs. Do this for mTHP too.
>>>>
>>>> I deliberately didn't add this back then because I was not able to
>>>> convince myself easily that it is ok in all corner cases. So this
>>>> needs some thought.
>>>
>>> Thanks for your detailed reply.
>>>
>>>> We know that the folio is exclusively mapped to a single MM and that
>>>> there are no unexpected references from others (GUP pins, whatsoever).
>>>>
>>>> But a large folio might be
>>>>
>>>> (a) mapped into multiple VMAs (e.g., partial mprotect()) in the same MM
>>>
>>> I think we have the same problem then for PMD-THPs? I see that
>>> vma_adjust_trans_huge() only does a PMD split, not a folio split.
>>
>> Sure, we can end up in this reuse function here for any large anon
>> folio, including PMD ones after a PMD->PTE remapping.
>
> Ah, alright. I was thinking that something might go wrong through
> folio_move_anon_rmap() in do_huge_pmd_wp_page(), but

Right, there we have a single VMA and a single PTL.

-- 
Cheers

David / dhildenb
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: Kiryl Shutsemau @ 2025-09-25  9:31 UTC
To: Dev Jain
Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
    mhocko, linux-mm, linux-kernel

On Thu, Sep 25, 2025 at 02:24:29PM +0530, Dev Jain wrote:
> At wp-fault time, when we find that a folio is exclusively mapped, we
> move folio->mapping to the faulting VMA's anon_vma, so that rmap
> overhead reduces. This is currently done for small folios (base pages)
> and PMD-mapped THPs. Do this for mTHP too.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> mm-selftests pass.
>
>  mm/memory.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7e32eb79ba99..ec04d2cec6b1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4014,6 +4014,11 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
>  	 * an additional folio reference and never ended up here.
>  	 */
>  	exclusive = true;
> +
> +	if (folio_trylock(folio)) {
> +		folio_move_anon_rmap(folio, vma);
> +		folio_unlock(folio);
> +	}

Maybe take the folio lock earlier in wp_can_reuse_anon_folio() to cover
the large folio handling too and avoid the trylock here.

Something like this (untested):

diff --git a/mm/memory.c b/mm/memory.c
index 812a7d9f6531..d95cf670b6a8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3843,6 +3843,7 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
 	 * an additional folio reference and never ended up here.
 	 */
 	exclusive = true;
+	folio_move_anon_rmap(folio, vma);
 unlock:
 	folio_unlock_large_mapcount(folio);
 	return exclusive;
@@ -3858,8 +3859,15 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
 static bool wp_can_reuse_anon_folio(struct folio *folio,
 				    struct vm_area_struct *vma)
 {
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
-		return __wp_can_reuse_large_anon_folio(folio, vma);
+	bool exclusive = false;
+
+	if (!folio_trylock(folio))
+		return false;
+
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio)) {
+		exclusive = __wp_can_reuse_large_anon_folio(folio, vma);
+		goto unlock;
+	}
 
 	/*
 	 * We have to verify under folio lock: these early checks are
@@ -3869,7 +3877,8 @@ static bool wp_can_reuse_anon_folio(struct folio *folio,
 	 * KSM doesn't necessarily raise the folio refcount.
 	 */
 	if (folio_test_ksm(folio) || folio_ref_count(folio) > 3)
-		return false;
+		goto unlock;
+
 	if (!folio_test_lru(folio))
 		/*
 		 * We cannot easily detect+handle references from
 		 * remote LRU caches or references from LRU pagevecs.
 		 */
 		lru_add_drain();
 	if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
-		return false;
-	if (!folio_trylock(folio))
-		return false;
+		goto unlock;
+
 	if (folio_test_swapcache(folio))
 		folio_free_swap(folio);
-	if (folio_test_ksm(folio) || folio_ref_count(folio) != 1) {
-		folio_unlock(folio);
-		return false;
-	}
+	if (folio_test_ksm(folio) || folio_ref_count(folio) != 1)
+		goto unlock;
+
 	/*
 	 * Ok, we've got the only folio reference from our mapping
 	 * and the folio is locked, it's dark out, and we're wearing
 	 * sunglasses. Hit it.
 	 */
 	folio_move_anon_rmap(folio, vma);
+	exclusive = true;
+unlock:
 	folio_unlock(folio);
-	return true;
+	return exclusive;
 }

-- 
Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH] mm: move rmap of mTHP upon CoW reuse

From: David Hildenbrand @ 2025-09-25  9:33 UTC
To: Kiryl Shutsemau, Dev Jain
Cc: akpm, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
    linux-mm, linux-kernel

On 25.09.25 11:31, Kiryl Shutsemau wrote:
> On Thu, Sep 25, 2025 at 02:24:29PM +0530, Dev Jain wrote:
>> At wp-fault time, when we find that a folio is exclusively mapped, we
>> move folio->mapping to the faulting VMA's anon_vma, so that rmap
>> overhead reduces. This is currently done for small folios (base pages)
>> and PMD-mapped THPs. Do this for mTHP too.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> mm-selftests pass.
>>
>>  mm/memory.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 7e32eb79ba99..ec04d2cec6b1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4014,6 +4014,11 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
>>  	 * an additional folio reference and never ended up here.
>>  	 */
>>  	exclusive = true;
>> +
>> +	if (folio_trylock(folio)) {
>> +		folio_move_anon_rmap(folio, vma);
>> +		folio_unlock(folio);
>> +	}
>
> Maybe take the folio lock earlier in wp_can_reuse_anon_folio() to cover
> the large folio handling too and avoid the trylock here.
>
> Something like this (untested):
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 812a7d9f6531..d95cf670b6a8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3843,6 +3843,7 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
>  	 * an additional folio reference and never ended up here.
>  	 */
>  	exclusive = true;
> +	folio_move_anon_rmap(folio, vma);
>  unlock:
>  	folio_unlock_large_mapcount(folio);
>  	return exclusive;
> @@ -3858,8 +3859,15 @@ static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
>  static bool wp_can_reuse_anon_folio(struct folio *folio,
>  				    struct vm_area_struct *vma)
>  {
> -	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
> -		return __wp_can_reuse_large_anon_folio(folio, vma);
> +	bool exclusive = false;
> +
> +	if (!folio_trylock(folio))
> +		return false;

No, there is no need for that.

-- 
Cheers

David / dhildenb