* [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
From: Roman Gushchin @ 2025-01-23 16:43 UTC
To: linux-kernel
Cc: linux-mm, Andrew Morton, Roman Gushchin, Jann Horn,
Peter Zijlstra, Will Deacon, Aneesh Kumar K.V, Nick Piggin,
Hugh Dickins, linux-arch
Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
added a forced TLB flush to tlb_end_vma(), which is required to avoid a
race between munmap() and unmap_mapping_range(). However, it also added
overhead to paths where tlb_end_vma() is used but vmas are not removed,
e.g. madvise(MADV_DONTNEED).
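For example, the MADV_DONTNEED path looks roughly like this:

  madvise(MADV_DONTNEED)
    zap_page_range_single()
      tlb_gather_mmu()
      unmap_single_vma()
        unmap_page_range()
          tlb_start_vma()
          ... zap ptes ...
          tlb_end_vma()    <- forced TLB flush, though no vma is removed
      tlb_finish_mmu()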
Fix this by moving the tlb flush out of tlb_end_vma() into
free_pgtables(), somewhat similar to the stable version of the
original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
for PFNMAP mappings before unlink_file_vma()").
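With the flush done in free_pgtables(), the race being guarded against
looks roughly like this:

  CPU 0 (munmap)                      CPU 1 (e.g. driver/truncate)
  --------------                      ----------------------------
  unmap_vmas()
    ... clear ptes, defer flush ...
  free_pgtables()
    tlb_flush_mmu_pfnmap()
      ... TLB flushed before unlink ...
    unlink_file_vma()
                                      unmap_mapping_range()
                                        ... vma no longer in the rmap,
                                            nothing left to flush ...
                                      ... pfn can be reused safely ...
  tlb_finish_mmu()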
Note that if tlb->fullmm is set, no flush is required, as the whole
mm is about to be destroyed.
---
v3:
- added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
v2:
- moved vma_pfn flag handling into tlb.h (by Peter Z.)
- added comments (by Peter Z.)
- fixed the vma_pfn flag setting (by Hugh D.)
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
---
include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
mm/memory.c | 7 ++++++
2 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 709830274b75..cdc95b69b91d 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
tlb->cleared_pmds = 0;
tlb->cleared_puds = 0;
tlb->cleared_p4ds = 0;
+
+ /*
+ * vma_pfn can only be set in tlb_start_vma(), so let's
+ * initialize it here. Also a tlb flush issued by
+ * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
+ * so that unnecessary subsequent flushes are avoided.
+ */
+ tlb->vma_pfn = 0;
/*
- * Do not reset mmu_gather::vma_* fields here, we do not
+ * Do not reset other mmu_gather::vma_* fields here, we do not
* call into tlb_start_vma() again to set them if there is an
* intermediate flush.
*/
@@ -449,7 +457,14 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
*/
tlb->vma_huge = is_vm_hugetlb_page(vma);
tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
- tlb->vma_pfn = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
+
+ /*
+ * vma_pfn is checked and cleared by tlb_flush_mmu_pfnmap()
+ * for a set of vma's, so it should be set if at least one vma
+ * has VM_PFNMAP or VM_MIXEDMAP flags set.
+ */
+ if (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))
+ tlb->vma_pfn = 1;
}
static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
@@ -466,6 +481,20 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
__tlb_reset_range(tlb);
}
+static inline void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
+{
+ /*
+ * VM_PFNMAP and VM_MIXEDMAP maps are fragile because the core mm
+ * doesn't track the page mapcount -- there might not be page-frames
+ * for these PFNs after all. Force flush TLBs for such ranges to avoid
+ * munmap() vs unmap_mapping_range() races.
+ * Ensure we have no stale TLB entries by the time this mapping is
+ * removed from the rmap.
+ */
+ if (unlikely(!tlb->fullmm && tlb->vma_pfn))
+ tlb_flush_mmu_tlbonly(tlb);
+}
+
static inline void tlb_remove_page_size(struct mmu_gather *tlb,
struct page *page, int page_size)
{
@@ -549,22 +578,14 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
- if (tlb->fullmm)
+ if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
return;
/*
- * VM_PFNMAP is more fragile because the core mm will not track the
- * page mapcount -- there might not be page-frames for these PFNs after
- * all. Force flush TLBs for such ranges to avoid munmap() vs
- * unmap_mapping_range() races.
+ * Do a TLB flush and reset the range at VMA boundaries; this avoids
+ * the ranges growing with the unused space between consecutive VMAs.
*/
- if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
- /*
- * Do a TLB flush and reset the range at VMA boundaries; this avoids
- * the ranges growing with the unused space between consecutive VMAs.
- */
- tlb_flush_mmu_tlbonly(tlb);
- }
+ tlb_flush_mmu_tlbonly(tlb);
}
/*
diff --git a/mm/memory.c b/mm/memory.c
index 398c031be9ba..c2a9effb2e32 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
{
struct unlink_vma_file_batch vb;
+ /*
+ * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here:
+ * force flush TLBs for such ranges to avoid munmap() vs
+ * unmap_mapping_range() races.
+ */
+ tlb_flush_mmu_pfnmap(tlb);
+
do {
unsigned long addr = vma->vm_start;
struct vm_area_struct *next;
--
2.48.1.262.g85cc9f2d1e-goog
* Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
From: Liam R. Howlett @ 2025-01-23 19:42 UTC
To: Roman Gushchin
Cc: linux-kernel, linux-mm, Andrew Morton, Jann Horn, Peter Zijlstra,
Will Deacon, Aneesh Kumar K.V, Nick Piggin, Hugh Dickins,
linux-arch
* Roman Gushchin <roman.gushchin@linux.dev> [250123 11:44]:
> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced TLB flush to tlb_end_vma(), which is required to avoid a
> race between munmap() and unmap_mapping_range(). However, it also added
> overhead to paths where tlb_end_vma() is used but vmas are not removed,
> e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the tlb flush out of tlb_end_vma() into
> free_pgtables(), somewhat similar to the stable version of the
> original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> for PFNMAP mappings before unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> ---
Hugh didn't mean for you to add a ---; he meant for you to move the
version info between the Cc list and the patch so that it's not in the
git history.
You can find examples on the ML.
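Roughly: everything above the first --- ends up in the git history, and
git am drops everything between that --- and the diff, which is where
the version info belongs:

  Suggested-by: Jann Horn <jannh@google.com>
  Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
  Cc: ...
  ---
  v3:
  - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)

  v2:
  - ...

   include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------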
>
> v3:
> - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>
> v2:
> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> - added comments (by Peter Z.)
> - fixed the vma_pfn flag setting (by Hugh D.)
>
> Suggested-by: Jann Horn <jannh@google.com>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: some email discussion url lore.kernel.org..
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
v3...
v2...
> include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
> mm/memory.c | 7 ++++++
> 2 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 709830274b75..cdc95b69b91d 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> tlb->cleared_pmds = 0;
> tlb->cleared_puds = 0;
> tlb->cleared_p4ds = 0;
> +
> + /*
> + * vma_pfn can only be set in tlb_start_vma(), so let's
> + * initialize it here. Also a tlb flush issued by
> + * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
> + * so that unnecessary subsequent flushes are avoided.
> + */
> + tlb->vma_pfn = 0;
> /*
> - * Do not reset mmu_gather::vma_* fields here, we do not
> + * Do not reset other mmu_gather::vma_* fields here, we do not
> * call into tlb_start_vma() again to set them if there is an
> * intermediate flush.
> */
> @@ -449,7 +457,14 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
> */
> tlb->vma_huge = is_vm_hugetlb_page(vma);
> tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> - tlb->vma_pfn = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
> +
> + /*
> + * vma_pfn is checked and cleared by tlb_flush_mmu_pfnmap()
> + * for a set of vma's, so it should be set if at least one vma
> + * has VM_PFNMAP or VM_MIXEDMAP flags set.
> + */
> + if (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))
> + tlb->vma_pfn = 1;
> }
>
> static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> @@ -466,6 +481,20 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> __tlb_reset_range(tlb);
> }
>
> +static inline void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
> +{
> + /*
> + * VM_PFNMAP and VM_MIXEDMAP maps are fragile because the core mm
> + * doesn't track the page mapcount -- there might not be page-frames
> + * for these PFNs after all. Force flush TLBs for such ranges to avoid
> + * munmap() vs unmap_mapping_range() races.
> + * Ensure we have no stale TLB entries by the time this mapping is
> + * removed from the rmap.
> + */
> + if (unlikely(!tlb->fullmm && tlb->vma_pfn))
> + tlb_flush_mmu_tlbonly(tlb);
> +}
> +
> static inline void tlb_remove_page_size(struct mmu_gather *tlb,
> struct page *page, int page_size)
> {
> @@ -549,22 +578,14 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
>
> static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> {
> - if (tlb->fullmm)
> + if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
> return;
>
> /*
> - * VM_PFNMAP is more fragile because the core mm will not track the
> - * page mapcount -- there might not be page-frames for these PFNs after
> - * all. Force flush TLBs for such ranges to avoid munmap() vs
> - * unmap_mapping_range() races.
> + * Do a TLB flush and reset the range at VMA boundaries; this avoids
> + * the ranges growing with the unused space between consecutive VMAs.
> */
> - if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
> - /*
> - * Do a TLB flush and reset the range at VMA boundaries; this avoids
> - * the ranges growing with the unused space between consecutive VMAs.
> - */
> - tlb_flush_mmu_tlbonly(tlb);
> - }
> + tlb_flush_mmu_tlbonly(tlb);
> }
>
> /*
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..c2a9effb2e32 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> {
> struct unlink_vma_file_batch vb;
>
> + /*
> + * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here:
> + * force flush TLBs for such ranges to avoid munmap() vs
> + * unmap_mapping_range() races.
> + */
> + tlb_flush_mmu_pfnmap(tlb);
> +
> do {
> unsigned long addr = vma->vm_start;
> struct vm_area_struct *next;
> --
> 2.48.1.262.g85cc9f2d1e-goog
>
>
* Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
From: Roman Gushchin @ 2025-01-23 21:19 UTC
To: Liam R. Howlett, linux-kernel, linux-mm, Andrew Morton,
Jann Horn, Peter Zijlstra, Will Deacon, Aneesh Kumar K.V,
Nick Piggin, Hugh Dickins, linux-arch
On Thu, Jan 23, 2025 at 02:42:56PM -0500, Liam R. Howlett wrote:
> * Roman Gushchin <roman.gushchin@linux.dev> [250123 11:44]:
> > Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> > added a forced TLB flush to tlb_end_vma(), which is required to avoid a
> > race between munmap() and unmap_mapping_range(). However, it also added
> > overhead to paths where tlb_end_vma() is used but vmas are not removed,
> > e.g. madvise(MADV_DONTNEED).
> >
> > Fix this by moving the tlb flush out of tlb_end_vma() into
> > free_pgtables(), somewhat similar to the stable version of the
> > original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> > for PFNMAP mappings before unlink_file_vma()").
> >
> > Note that if tlb->fullmm is set, no flush is required, as the whole
> > mm is about to be destroyed.
> >
> > ---
>
> Hugh didn't mean to add a ---, he meant to move the version info between
> the Cc list and the patch so that it's not in the git history.
Ah, ok. Somehow it wasn't a problem for me previously; usually Andrew
was dropping that part while merging into the mm tree.
Thanks for clarifying.
>
> You can find examples on the ML.
>
> >
> > v3:
> > - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
> >
> > v2:
> > - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> > - added comments (by Peter Z.)
> > - fixed the vma_pfn flag setting (by Hugh D.)
> >
>
>
> > Suggested-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> Link: some email discussion url lore.kernel.org..
It was unfortunately based on a non-public discussion, so there's no
lore link.
Thank you!
* Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
From: Hugh Dickins @ 2025-01-23 22:26 UTC
To: Roman Gushchin
Cc: linux-kernel, linux-mm, Andrew Morton, Jann Horn, Peter Zijlstra,
Will Deacon, Aneesh Kumar K.V, Nick Piggin, Hugh Dickins,
linux-arch
On Thu, 23 Jan 2025, Roman Gushchin wrote:
> Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> added a forced TLB flush to tlb_end_vma(), which is required to avoid a
> race between munmap() and unmap_mapping_range(). However, it also added
> overhead to paths where tlb_end_vma() is used but vmas are not removed,
> e.g. madvise(MADV_DONTNEED).
>
> Fix this by moving the tlb flush out of tlb_end_vma() into
> free_pgtables(), somewhat similar to the stable version of the
> original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> for PFNMAP mappings before unlink_file_vma()").
>
> Note that if tlb->fullmm is set, no flush is required, as the whole
> mm is about to be destroyed.
>
> ---
As Liam said, thanks.
>
> v3:
> - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
>
> v2:
> - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> - added comments (by Peter Z.)
> - fixed the vma_pfn flag setting (by Hugh D.)
>
> Suggested-by: Jann Horn <jannh@google.com>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
> include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
> mm/memory.c | 7 ++++++
> 2 files changed, 42 insertions(+), 14 deletions(-)
The code looks right to me now, but not the comments (I usually
prefer no comment to a wrong or difficult-to-get-right comment).
Except that when I try to write a simple enough correct comment,
I find the code has to be changed, and that then suggests
further changes... Sigh.
(We could also go down a path of saying that all of the vma_pfn stuff
would be better under #ifdef CONFIG_MMU_GATHER_MERGE_VMAS; but I
think we shall only confuse ourselves that way - it shouldn't be
enough to matter, so long as it does not add any extra TLB flushes.)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 709830274b75..cdc95b69b91d 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> tlb->cleared_pmds = 0;
> tlb->cleared_puds = 0;
> tlb->cleared_p4ds = 0;
> +
> + /*
> + * vma_pfn can only be set in tlb_start_vma(), so let's
> + * initialize it here. Also a tlb flush issued by
> + * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
> + * so that unnecessary subsequent flushes are avoided.
No, that misses the point (or misses half of the point): the
tlb_flush_mmu_pfnmap() itself won't need to flush if for other reasons
we've done a TLB flush since the last VM_PFNMAP|VM_MIXEDMAP vma.
What I want to write is:
* vma_pfn can only be set in tlb_start_vma(), so initialize it here.
* And then any call to tlb_flush_mmu_tlbonly() will reset it again,
* so that unnecessary subsequent flushes are avoided.
But once I look at tlb_flush_mmu_tlbonly(), I'm reminded that it actually
does nothing if none of cleared_ptes etc. is set: so it would not reset
vma_pfn in that case; which is okay-ish, but makes writing the comment hard.
So maybe tlb_flush_mmu_tlbonly() should do an explicit "tlb->vma_pfn = 0"
before returning early; but that then raises the question of whether it
would not be better just to initialize vma_pfn to 0 in __tlb_gather_mmu(),
not touch it in __tlb_reset_range(), but reset it to 0 at the start of
tlb_flush_mmu_tlbonly().
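i.e. something like this (untested, just to make that second option
concrete; __tlb_reset_range() would then leave vma_pfn alone):

static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
{
	/*
	 * Whether we flush below or return early because nothing was
	 * cleared, no forced flush remains needed for any
	 * VM_PFNMAP/VM_MIXEDMAP vma seen so far.
	 */
	tlb->vma_pfn = 0;

	/*
	 * Anything calling __tlb_adjust_range() also sets at least one of
	 * these bits.
	 */
	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
	      tlb->cleared_puds || tlb->cleared_p4ds))
		return;

	tlb_flush(tlb);
	__tlb_reset_range(tlb);
}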
But it also makes me realize that tlb_flush_mmu_tlbonly() avoiding
__tlb_reset_range() when nothing was cleared is not all that good:
if flushing a small range is better than flushing a large range,
then it would be good to reset the range even when nothing was cleared
(though it looks stupid to reset all the fields just found to be 0 already).
Arrgh. This is not what you want to hear while trying to get your
slowdown fix in. Simplest just to ignore the existing range deficiency,
I suppose. But where to reset vma_pfn? I'm torn, and have to let you
and others decide.
Hugh
> + */
> + tlb->vma_pfn = 0;
> /*
> - * Do not reset mmu_gather::vma_* fields here, we do not
> + * Do not reset other mmu_gather::vma_* fields here, we do not
> * call into tlb_start_vma() again to set them if there is an
> * intermediate flush.
> */
> @@ -449,7 +457,14 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
> */
> tlb->vma_huge = is_vm_hugetlb_page(vma);
> tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
> - tlb->vma_pfn = !!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP));
> +
> + /*
> + * vma_pfn is checked and cleared by tlb_flush_mmu_pfnmap()
> + * for a set of vma's, so it should be set if at least one vma
> + * has VM_PFNMAP or VM_MIXEDMAP flags set.
> + */
> + if (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))
> + tlb->vma_pfn = 1;
> }
>
> static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> @@ -466,6 +481,20 @@ static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
> __tlb_reset_range(tlb);
> }
>
> +static inline void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
> +{
> + /*
> + * VM_PFNMAP and VM_MIXEDMAP maps are fragile because the core mm
> + * doesn't track the page mapcount -- there might not be page-frames
> + * for these PFNs after all. Force flush TLBs for such ranges to avoid
> + * munmap() vs unmap_mapping_range() races.
> + * Ensure we have no stale TLB entries by the time this mapping is
> + * removed from the rmap.
> + */
> + if (unlikely(!tlb->fullmm && tlb->vma_pfn))
> + tlb_flush_mmu_tlbonly(tlb);
> +}
> +
> static inline void tlb_remove_page_size(struct mmu_gather *tlb,
> struct page *page, int page_size)
> {
> @@ -549,22 +578,14 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *
>
> static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
> {
> - if (tlb->fullmm)
> + if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS))
> return;
>
> /*
> - * VM_PFNMAP is more fragile because the core mm will not track the
> - * page mapcount -- there might not be page-frames for these PFNs after
> - * all. Force flush TLBs for such ranges to avoid munmap() vs
> - * unmap_mapping_range() races.
> + * Do a TLB flush and reset the range at VMA boundaries; this avoids
> + * the ranges growing with the unused space between consecutive VMAs.
> */
> - if (tlb->vma_pfn || !IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) {
> - /*
> - * Do a TLB flush and reset the range at VMA boundaries; this avoids
> - * the ranges growing with the unused space between consecutive VMAs.
> - */
> - tlb_flush_mmu_tlbonly(tlb);
> - }
> + tlb_flush_mmu_tlbonly(tlb);
> }
>
> /*
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..c2a9effb2e32 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -365,6 +365,13 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> {
> struct unlink_vma_file_batch vb;
>
> + /*
> + * VM_PFNMAP and VM_MIXEDMAP maps require a special handling here:
> + * force flush TLBs for such ranges to avoid munmap() vs
> + * unmap_mapping_range() races.
> + */
> + tlb_flush_mmu_pfnmap(tlb);
> +
> do {
> unsigned long addr = vma->vm_start;
> struct vm_area_struct *next;
> --
> 2.48.1.262.g85cc9f2d1e-goog
* Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
From: Hugh Dickins @ 2025-01-27 1:31 UTC
To: Roman Gushchin
Cc: linux-kernel, linux-mm, Hugh Dickins, Andrew Morton, Jann Horn,
Peter Zijlstra, Will Deacon, Aneesh Kumar K.V, Nick Piggin,
linux-arch
On Thu, 23 Jan 2025, Hugh Dickins wrote:
> On Thu, 23 Jan 2025, Roman Gushchin wrote:
>
> > Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> > added a forced TLB flush to tlb_end_vma(), which is required to avoid a
> > race between munmap() and unmap_mapping_range(). However, it also added
> > overhead to paths where tlb_end_vma() is used but vmas are not removed,
> > e.g. madvise(MADV_DONTNEED).
> >
> > Fix this by moving the tlb flush out of tlb_end_vma() into
> > free_pgtables(), somewhat similar to the stable version of the
> > original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> > for PFNMAP mappings before unlink_file_vma()").
> >
> > Note that if tlb->fullmm is set, no flush is required, as the whole
> > mm is about to be destroyed.
> >
> > ---
>
> As Liam said, thanks.
>
> >
> > v3:
> > - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
> >
> > v2:
> > - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> > - added comments (by Peter Z.)
> > - fixed the vma_pfn flag setting (by Hugh D.)
> >
> > Suggested-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Nick Piggin <npiggin@gmail.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: linux-arch@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > ---
> > include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
> > mm/memory.c | 7 ++++++
> > 2 files changed, 42 insertions(+), 14 deletions(-)
>
I had quite a wobble on Friday, couldn't be sure of anything at all.
But I've now spent longer, quietly thinking about this (v3) patch,
and the various races; and with Jann's help, I do feel much more
confident about it all today.
> The code looks right to me now, but not the comments (I usually
> prefer no comment to a wrong or difficult to get right comment).
Yes, the code does look right to me now. And although I can still quibble
about the comments, I'd better not waste your time with that. Let me say
Acked-by: Hugh Dickins <hughd@google.com>
while recognizing that this may not be the patch which goes into
the tree, since Peter has other ideas on the naming and wording.
>
> Except when I try to write a simple enough correct comment,
> I find the code has to be changed, and that then suggests
> further changes... Sigh.
>
> (We could also go down a path of saying that all of the vma_pfn stuff
> would be better under #fidef CONFIG_MMU_GATHER_MERGE_VMAS; but I
> think we shall only confuse ourselves that way - it shouldn't be
> enough to matter, so long as it does not add any extra TLB flushes.)
>
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index 709830274b75..cdc95b69b91d 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> > tlb->cleared_pmds = 0;
> > tlb->cleared_puds = 0;
> > tlb->cleared_p4ds = 0;
> > +
> > + /*
> > + * vma_pfn can only be set in tlb_start_vma(), so let's
> > + * initialize it here. Also a tlb flush issued by
> > + * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
> > + * so that unnecessary subsequent flushes are avoided.
>
> No, that misses the point (or misses half of the point): the
> tlb_flush_mmu_pfnmap() itself won't need to flush if for other reasons
> we've done a TLB flush since the last VM_PFNMAP|VM_MIXEDMAP vma.
>
> What I want to write is:
> * vma_pfn can only be set in tlb_start_vma(), so initialize it here.
> * And then any call to tlb_flush_mmu_tlbonly() will reset it again,
> * so that unnecessary subsequent flushes are avoided.
>
> But once I look at tlb_flush_mmu_tlbonly(), I'm reminded that actually
> it does nothing, if none of cleared_ptes etc. is set: so would not reset
> vma_pfn in that case; which is okay-ish, but makes writing the comment hard.
>
> So maybe tlb_flush_mmu_tlbonly() should do an explicit "tlb->vma_pfn = 0"
> before returning early; but that then raises the question of whether it
> would not be better just to initialize vma_pfn to 0 in __tlb_gather_mmu(),
> not touch it in __tlb_reset_range(), but reset it to 0 at the start of
> tlb_flush_mmu_tlbonly().
>
> But it also makes me realize that tlb_flush_mmu_tlbonly() avoiding
> __tlb_reset_range() when nothing was cleared, is not all that good:
> because if flushing a small range is better than flushing a large range,
> then it would be good to reset the range even when nothing was cleared
> (though it looks stupid to reset all the fields just found 0 already).
My paragraph (about the existing code, independent of your patch) looks
nonsense to me now: if there was nothing to be cleared, then the range
would not have been updated, so would not benefit from being reset.
It's still true that there would sometimes be an optimization in setting
"tlb->vma_pfn = 0" in every tlb_flush_mmu_tlbonly(); but it's merely an
optimization, for an unusual case, which you may find demands yet more
thought than it deserves; my guess is that you will prefer not to add
that change, and that's fine by me.
So, if you did respin and change the comment (but I don't insist), maybe:
* vma_pfn can only be set in tlb_start_vma(), so initialize it here.
* And then it will be reset again after any call to tlb_flush(),
* so that unnecessary subsequent flushes are avoided.
Hugh