From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
maple-tree@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>,
Charan Teja Kalla <quic_charante@quicinc.com>,
shikemeng@huaweicloud.com, kasong@tencent.com, nphamcs@gmail.com,
bhe@redhat.com, baohua@kernel.org, chrisl@kernel.org,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v1 4/9] mm/memory: Add tree limit to free_pgtables()
Date: Tue, 13 Jan 2026 14:48:52 -0500 [thread overview]
Message-ID: <nns65h74umsr3ahzx2td4pxbye26ootx7nibkhihxiczoyxodz@k4z7jrjr72ez> (raw)
In-Reply-To: <e44e1da5-7ee4-478a-bcc5-7726f3b52d9f@lucifer.local>
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250911 05:08]:
> On Tue, Sep 09, 2025 at 03:09:40PM -0400, Liam R. Howlett wrote:
> > The ceiling and tree search limit need to be different arguments for the
> > future change in the failed fork attempt.
> >
> > Add some documentation around free_pgtables() and the limits in an
> > attempt to clarify the floor and ceiling use as well as the new
> > tree_max.
> >
> > Test code also updated.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
> LGTM other than the nits below, so with those addressed feel free to add:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> > ---
> > mm/internal.h | 4 +++-
> > mm/memory.c | 28 +++++++++++++++++++++++++---
> > mm/mmap.c | 2 +-
> > mm/vma.c | 3 ++-
> > tools/testing/vma/vma_internal.h | 3 ++-
> > 5 files changed, 33 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 63e3ec8d63be7..d295252407fee 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -444,7 +444,9 @@ void folio_activate(struct folio *folio);
> >
> > void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> > struct vm_area_struct *start_vma, unsigned long floor,
> > - unsigned long ceiling, bool mm_wr_locked);
> > + unsigned long ceiling, unsigned long tree_max,
> > + bool mm_wr_locked);
> > +
> > void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte);
> >
> > struct zap_details;
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 3e0404bd57a02..24716b3713f66 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -369,12 +369,34 @@ void free_pgd_range(struct mmu_gather *tlb,
> > } while (pgd++, addr = next, addr != end);
> > }
> >
> > +/*
>
> NIT: /** right? This looks like a kernel-doc to me!
>
> > + * free_pgtables() - Free a range of page tables
> > + * @tlb: The mmu gather
> > + * @mas: The maple state
> > + * @vma: The first vma
> > + * @floor: The lowest page table address
> > + * @ceiling: The highest page table address
> > + * @tree_max: The highest tree search address
> > + * @mm_wr_locked: boolean indicating if the mm is write locked
> > + *
> > + * Note: Floor and ceiling are provided to indicate the absolute range of the
> > + * page tables that should be removed. This can differ from the vma mappings on
> > + * some archs that may have mappings that need to be removed outside the vmas.
> > + * Note that the prev->vm_end and next->vm_start are often used.
>
> Great write-up though you are missing some horrified noises re: the arches doing
> that, I guess the reader has to play them back in their head... ;)
>
> > + *
> > + * The tree_max differs from the ceiling when a dup_mmap() failed and the tree
> > + * has unrelated data to the mm_struct being torn down.
> > + */
>
> Ohhh nice nice thanks for adding this comment!
>
> > void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> > struct vm_area_struct *vma, unsigned long floor,
> > - unsigned long ceiling, bool mm_wr_locked)
> > + unsigned long ceiling, unsigned long tree_max,
> > + bool mm_wr_locked)
> > {
> > struct unlink_vma_file_batch vb;
> >
> > + /* underflow can happen and is fine */
>
> I am taking this as a sign that underflow is in fact fine _everywhere_
> including in internal plumbing! :P
>
> > + WARN_ON_ONCE(tree_max - 1 > ceiling - 1);
>
> Hmm, so if tree_max == 1, and ceilling == 0, we're ok with that (would
> resolve to tree_max = 0 and ceiling == ULONG_MAX), I guess relating to the
> 'interpret 0 as everything' semantics I think we have for ceiling?
>
> I guess it's because these are exclusive.
>
> So perhaps worth updating comment to:
>
> /*
> * these values are exclusive bounds, with 0 being interpreted as the
> * entire range, so underflow is fine.
> */
>
> or similar, just to really underline that...
The tree_max = 0 makes no sense and will be evaluated as a noop in the
code - we will fail to find any vmas to iterate over. We cannot have a
vma starting at 0 and running to 0.
tree_max is not an exclusive bound before this function - which uses the
maple state (mas_find) vs the vma iterator, which does the translation
for us.
ceiling has always had the potential of being 0 from the #define of
USER_PGTABLES_CEILING.
If we were to use the vma iterator, then we're having an inclusive limit
and the start/end (named floor and ceiling here..) difference. So I'm
just trying to keep everything here on the same level of reference. We
need to subtract from both values (especially since they were the same
variable before). This seems to make code easier to read, especially in
this diff.
>
> > +
> > tlb_free_vmas(tlb);
> >
> > do {
> > @@ -385,7 +407,7 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> > * Note: USER_PGTABLES_CEILING may be passed as ceiling and may
> > * be 0. This will underflow and is okay.
> > */
> > - next = mas_find(mas, ceiling - 1);
> > + next = mas_find(mas, tree_max - 1);
> > if (unlikely(xa_is_zero(next)))
> > next = NULL;
> >
> > @@ -405,7 +427,7 @@ void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> > */
> > while (next && next->vm_start <= vma->vm_end + PMD_SIZE) {
> > vma = next;
> > - next = mas_find(mas, ceiling - 1);
> > + next = mas_find(mas, tree_max - 1);
> > if (unlikely(xa_is_zero(next)))
> > next = NULL;
> > if (mm_wr_locked)
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index a290448a53bb2..0f4808f135fe6 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1311,7 +1311,7 @@ void exit_mmap(struct mm_struct *mm)
> > mt_clear_in_rcu(&mm->mm_mt);
> > vma_iter_set(&vmi, vma->vm_end);
> > free_pgtables(&tlb, &vmi.mas, vma, FIRST_USER_ADDRESS,
> > - USER_PGTABLES_CEILING, true);
> > + USER_PGTABLES_CEILING, USER_PGTABLES_CEILING, true);
>
> NIT: Might be nice, while we're here, to add your (very nice) convention of
> prefacing boolean params with the param name, so here:
>
> ..., /* mm_wr_locked= */ true);
Ah... long line and I kill it later, so I left it off.
>
> > tlb_finish_mmu(&tlb);
> >
> > /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index a648e0555c873..1bae142bbc0f1 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -486,6 +486,7 @@ void unmap_region(struct ma_state *mas, struct vm_area_struct *vma,
> > /* mm_wr_locked = */ true);
> > mas_set(mas, vma->vm_end);
> > free_pgtables(&tlb, mas, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
> > + next ? next->vm_start : USER_PGTABLES_CEILING,
> > next ? next->vm_start : USER_PGTABLES_CEILING,
> > /* mm_wr_locked = */ true);
> > tlb_finish_mmu(&tlb);
> > @@ -1232,7 +1233,7 @@ static inline void vms_clear_ptes(struct vma_munmap_struct *vms,
> > mas_set(mas_detach, 1);
> > /* start and end may be different if there is no prev or next vma. */
> > free_pgtables(&tlb, mas_detach, vms->vma, vms->unmap_start,
> > - vms->unmap_end, mm_wr_locked);
> > + vms->unmap_end, vms->unmap_end, mm_wr_locked);
> > tlb_finish_mmu(&tlb);
> > vms->clear_ptes = false;
> > }
> > diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> > index 07167446dcf42..823d379e1fac2 100644
> > --- a/tools/testing/vma/vma_internal.h
> > +++ b/tools/testing/vma/vma_internal.h
> > @@ -900,7 +900,8 @@ static inline void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
> >
> > static inline void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
> > struct vm_area_struct *vma, unsigned long floor,
> > - unsigned long ceiling, bool mm_wr_locked)
> > + unsigned long ceiling, unsigned long tree_max,
> > + bool mm_wr_locked)
> > {
> > (void)tlb;
> > (void)mas;
> > --
> > 2.47.2
> >
next prev parent reply other threads:[~2026-01-13 19:49 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-09 19:09 [PATCH v1 0/9] Remove XA_ZERO from error recovery of dup_mmap() Liam R. Howlett
2025-09-09 19:09 ` [PATCH v1 1/9] mm/mmap: Move exit_mmap() trace point Liam R. Howlett
2025-09-09 20:07 ` Suren Baghdasaryan
2025-09-10 12:51 ` Pedro Falcato
2025-09-11 9:19 ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 2/9] mm/mmap: Abstract vma clean up from exit_mmap() Liam R. Howlett
2025-09-09 20:09 ` Suren Baghdasaryan
2026-01-13 18:50 ` Liam R. Howlett
2025-09-10 12:54 ` Pedro Falcato
2025-09-11 9:21 ` David Hildenbrand
2026-01-13 18:51 ` Liam R. Howlett
2025-09-09 19:09 ` [PATCH v1 3/9] mm/vma: Add limits to unmap_region() for vmas Liam R. Howlett
2025-09-09 20:09 ` Suren Baghdasaryan
2026-01-13 18:51 ` Liam R. Howlett
2025-09-10 12:56 ` Pedro Falcato
2025-09-11 8:56 ` Lorenzo Stoakes
2025-09-11 9:22 ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 4/9] mm/memory: Add tree limit to free_pgtables() Liam R. Howlett
2025-09-09 21:05 ` Suren Baghdasaryan
2026-01-13 19:14 ` Liam R. Howlett
2025-09-10 13:08 ` Pedro Falcato
2025-09-11 9:08 ` Lorenzo Stoakes
2026-01-13 19:48 ` Liam R. Howlett [this message]
2025-09-11 9:24 ` David Hildenbrand
2026-01-13 19:19 ` Liam R. Howlett
2025-09-09 19:09 ` [PATCH v1 5/9] mm/vma: Add page table limit to unmap_region() Liam R. Howlett
2025-09-09 21:29 ` Suren Baghdasaryan
2026-01-13 20:09 ` Liam R. Howlett
2025-09-10 13:08 ` Pedro Falcato
2025-09-09 19:09 ` [PATCH v1 6/9] mm: Change dup_mmap() recovery Liam R. Howlett
2025-09-09 22:03 ` Suren Baghdasaryan
2025-09-10 13:23 ` Pedro Falcato
2025-09-11 9:13 ` Lorenzo Stoakes
2025-09-11 9:31 ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 7/9] mm: Introduce unmap_desc struct to reduce function arguments Liam R. Howlett
2025-09-09 21:44 ` Suren Baghdasaryan
2025-09-11 9:22 ` Lorenzo Stoakes
2025-09-11 16:51 ` Suren Baghdasaryan
2025-09-11 16:56 ` Liam R. Howlett
2025-09-11 17:03 ` Suren Baghdasaryan
2025-09-10 13:10 ` Pedro Falcato
2025-09-11 9:20 ` Lorenzo Stoakes
2025-09-09 19:09 ` [PATCH v1 8/9] mm/vma: Use unmap_desc in vms_clear_ptes() and exit_mmap() Liam R. Howlett
2025-09-09 22:16 ` Suren Baghdasaryan
2025-09-11 16:59 ` Liam R. Howlett
2025-09-11 9:53 ` Lorenzo Stoakes
2025-09-12 5:08 ` kernel test robot
2025-09-09 19:09 ` [PATCH v1 9/9] mm: Use unmap_desc struct for freeing page tables Liam R. Howlett
2025-09-09 22:27 ` Suren Baghdasaryan
2025-09-11 10:06 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nns65h74umsr3ahzx2td4pxbye26ootx7nibkhihxiczoyxodz@k4z7jrjr72ez \
--to=liam.howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=jannh@google.com \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=maple-tree@lists.infradead.org \
--cc=mhocko@suse.com \
--cc=nphamcs@gmail.com \
--cc=pfalcato@suse.de \
--cc=quic_charante@quicinc.com \
--cc=shikemeng@huaweicloud.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox