From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, axelrasmussen@google.com,
	 yuanchu@google.com, david@kernel.org, hughd@google.com,
	chrisl@kernel.org,  kasong@tencent.com, weixugc@google.com,
	Liam.Howlett@oracle.com, vbabka@kernel.org,  rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, riel@surriel.com,
	 harry.yoo@oracle.com, jannh@google.com, pfalcato@suse.de,
	 baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com,  baohua@kernel.org,
	youngjun.park@lge.com, ziy@nvidia.com, kas@kernel.org,
	 willy@infradead.org, yuzhao@google.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, ryan.roberts@arm.com,
	anshuman.khandual@arm.com
Subject: Re: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
Date: Tue, 10 Mar 2026 08:34:25 +0000	[thread overview]
Message-ID: <00f65edd-5b16-4f9e-a9fa-b923eba052b7@lucifer.local> (raw)
In-Reply-To: <20260310073013.4069309-6-dev.jain@arm.com>

On Tue, Mar 10, 2026 at 01:00:09PM +0530, Dev Jain wrote:
> Within a batch, the ptes are all alike: they belong to the same type of
> VMA, and are either all marked with uffd-wp or all unmarked. Therefore we
> can batch-set uffd-wp markers through install_uffd_wp_ptes_if_needed(),
> and enable batched unmapping of folios belonging to uffd-wp VMAs by
> dropping that condition from folio_unmap_pte_batch().
>
> It may happen that we don't batch over the entire folio in one go, in which
> case, we must skip over the current batch. Add a helper to do that -
> page_vma_mapped_walk_jump() will increment the relevant fields of pvmw
> by nr pages.
>
> I think that we can get away with just incrementing pvmw->pte
> and pvmw->address, since looking at the code in page_vma_mapped.c,
> pvmw->pfn and pvmw->nr_pages are used in conjunction, and pvmw->pgoff
> and pvmw->nr_pages (in vma_address_end()) are used in conjunction,
> cancelling out the increment and decrement in the respective fields. But
> let us not rely on the pvmw implementation and keep this simple.

This isn't simple...

>
> Export this function to rmap.h to enable future reuse.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
>  include/linux/rmap.h | 10 ++++++++++
>  mm/rmap.c            |  8 +++-----
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8dc0871e5f001..1b7720c66ac87 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
>  		spin_unlock(pvmw->ptl);
>  }
>
> +static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
> +		unsigned int nr)

unsigned long nr_pages... 'nr' is meaningless and you're mixing + matching types
for no reason.

> +{
> +	pvmw->pfn += nr;
> +	pvmw->nr_pages -= nr;
> +	pvmw->pgoff += nr;
> +	pvmw->pte += nr;
> +	pvmw->address += nr * PAGE_SIZE;
> +}

I absolutely hate this. It's extremely confusing: you're now going from
looking at 1 page to nr_pages - 1, 'jump' doesn't really mean anything
here, and you're losing sight of the batch size while exposing a silly
detail to the caller. And I really don't want to 'export' this at this time.

If we must have this, can you please make it static in rmap.c at least for the
time being.

Or perhaps instead, have a batched variant of page_vma_mapped_walk(), like
page_vma_mapped_walk_batch()?

I think that makes a lot more sense...
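To make the objection concrete, here is a rough user-space sketch of what a
batched advance would have to keep consistent. This is not kernel code: the
struct and function names (pvmw_model, pvmw_advance_batch) are hypothetical
stand-ins for struct page_vma_mapped_walk, and the pte pointer is omitted;
the point is only that pfn, nr_pages, pgoff and address must move in
lockstep, and that the advance must never jump past the folio.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Minimal user-space model of the pvmw fields involved; the real
 * struct page_vma_mapped_walk lives in include/linux/rmap.h. */
struct pvmw_model {
	unsigned long pfn;
	unsigned long nr_pages;
	unsigned long pgoff;
	unsigned long address;
};

/* Hypothetical batched advance: one call consumes nr_pages entries of
 * the walk, keeping pfn/nr_pages/pgoff/address consistent with each
 * other rather than leaving that arithmetic to the caller. */
static void pvmw_advance_batch(struct pvmw_model *pvmw, unsigned long nr_pages)
{
	assert(nr_pages <= pvmw->nr_pages); /* never jump past the folio */
	pvmw->pfn += nr_pages;
	pvmw->nr_pages -= nr_pages;
	pvmw->pgoff += nr_pages;
	pvmw->address += nr_pages * PAGE_SIZE;
}
```

A page_vma_mapped_walk_batch() interface could do this internally, so the
caller only ever sees whole batches rather than poking at the walk state.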

I mean, I kind of hate the pvmw interface in general; this is a hack to
handle batching clamped onto the side of it. Let's figure out how to do
this sensibly and do what's needed, rather than adding yet more
hacks-on-hacks please.

> +
>  /**
>   * page_vma_mapped_walk_restart - Restart the page table walk.
>   * @pvmw: Pointer to struct page_vma_mapped_walk.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a7570cd037344..dd638429c963e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1953,9 +1953,6 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>  	if (pte_unused(pte))
>  		return 1;
>
> -	if (userfaultfd_wp(vma))
> -		return 1;
> -
>  	/*
>  	 * If unmap fails, we need to restore the ptes. To avoid accidentally
>  	 * upgrading write permissions for ptes that were not originally
> @@ -2235,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  		 * we may want to replace a none pte with a marker pte if
>  		 * it's file-backed, so we don't lose the tracking info.
>  		 */
> -		install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1);
> +		install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, nr_pages);
>
>  		/* Update high watermark before we lower rss */
>  		update_hiwater_rss(mm);
> @@ -2359,8 +2356,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  		 * If we are sure that we batched the entire folio and cleared
>  		 * all PTEs, we can just optimize and stop right here.
>  		 */
> -		if (nr_pages == folio_nr_pages(folio))
> +		if (likely(nr_pages == folio_nr_pages(folio)))

Please don't add random likely()'s based on what you think is likely(). This
kind of thing should only be done based on profiling.
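For context, likely()/unlikely() in the kernel are thin wrappers around the
compiler's __builtin_expect(); they only steer block layout and static
branch prediction and never change what the condition evaluates to, which
is exactly why a guess that contradicts the real workload can pessimize the
hot path. A minimal illustration (user-space, GCC/Clang builtin; the
function name fully_batched is made up for this sketch):

```c
/* Same macros the kernel defines in include/linux/compiler.h:
 * purely a hint, the boolean result is unchanged. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Returns 1 when the batch covered the whole folio; the likely()
 * annotation only affects code placement, not the return value. */
static int fully_batched(unsigned long nr_pages, unsigned long folio_nr)
{
	if (likely(nr_pages == folio_nr))
		return 1;
	return 0;
}
```

Because the hint is invisible to correctness, the only way to know it helps
is to measure: profile first, annotate after.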

>  			goto walk_done;
> +		page_vma_mapped_walk_jump(&pvmw, nr_pages - 1);

(You're now passing a signed long to an unsigned int...!)
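The conversion happens silently in C: a signed long passed where an
unsigned int parameter is expected is converted modulo UINT_MAX + 1, so a
negative value becomes a huge count. A tiny demonstration (take_count is a
made-up stand-in for a function taking unsigned int):

```c
#include <limits.h>

/* Stand-in for any function with an unsigned int parameter: the
 * caller's signed long argument is implicitly converted on the way in,
 * with negative values wrapping modulo UINT_MAX + 1. */
static unsigned int take_count(unsigned int nr)
{
	return nr;
}
```

With nr_pages - 1 computed as a signed long, any bug that drives it
negative would silently turn into an enormous unsigned jump.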


>  		continue;
>  walk_abort:
>  		ret = false;
> --
> 2.34.1
>

Thanks, Lorenzo


