linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com,
	vbabka@suse.cz, jannh@google.com, pfalcato@suse.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	david@redhat.com, peterx@redhat.com, ryan.roberts@arm.com,
	mingo@kernel.org, libang.li@antgroup.com, maobibo@loongson.cn,
	zhengqi.arch@bytedance.com, baohua@kernel.org,
	anshuman.khandual@arm.com, willy@infradead.org,
	ioworker0@gmail.com, yang@os.amperecomputing.com
Subject: Re: [PATCH 3/3] mm: Optimize mremap() by PTE batching
Date: Tue, 6 May 2025 14:49:04 +0100	[thread overview]
Message-ID: <7c6e61fa-2437-4c99-b1d3-97af5e2b37a3@lucifer.local> (raw)
In-Reply-To: <20250506050056.59250-4-dev.jain@arm.com>

On Tue, May 06, 2025 at 10:30:56AM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). Use get_and_clear_full_ptes()
> so as to elide TLBIs on each contig block, which was previously done by
> ptep_get_and_clear().

No mention of large folios

>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
>  mm/mremap.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 1a08a7c3b92f..3621c07d8eea 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -176,7 +176,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	struct vm_area_struct *vma = pmc->old;
>  	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
>  	struct mm_struct *mm = vma->vm_mm;
> -	pte_t *old_ptep, *new_ptep, pte;
> +	pte_t *old_ptep, *new_ptep, old_pte, pte;

Obviously given previous comment you know what I'm going to say here :) let's
put old_pte, pte in a new decl.

>  	pmd_t dummy_pmdval;
>  	spinlock_t *old_ptl, *new_ptl;
>  	bool force_flush = false;
> @@ -185,6 +185,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	unsigned long old_end = old_addr + extent;
>  	unsigned long len = old_end - old_addr;
>  	int err = 0;
> +	int nr;
>
>  	/*
>  	 * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma
> @@ -237,10 +238,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
>
>  	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
>  				   new_ptep++, new_addr += PAGE_SIZE) {

Hm this just seems wrong, even if we're dealing with a large folio we're still
offsetting by PAGE_SIZE each time and iterating through further sub-pages?

Shouldn't we be doing something like += nr and += PAGE_SIZE * nr?

Then it'd make sense to initialise nr to 1.

Honestly I'd prefer us though to refactor move_ptes() to something like:

	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
				   new_ptep++, new_addr += PAGE_SIZE) {
		pte_t old_pte = ptep_get(old_ptep);

		if (pte_none(old_pte))
			continue;

		move_pte(pmc, vma, old_ptep, old_pte);
	}

Declaring this new move_pte() where you can put the rest of the stuff.

I'd much rather we do this than add to the mess as-is.



> -		if (pte_none(ptep_get(old_ptep)))
> +		const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +		int max_nr = (old_end - old_addr) >> PAGE_SHIFT;
> +
> +		nr = 1;
> +		old_pte = ptep_get(old_ptep);

You can declare this in the for loop, no need for us to contaminate whole
function scope with it.

Same with 'nr' in this implementation (though I'd rather you changed it up, see
above).

> +		if (pte_none(old_pte))
>  			continue;
>
> -		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
>  		/*
>  		 * If we are remapping a valid PTE, make sure
>  		 * to flush TLB before we drop the PTL for the
> @@ -252,8 +257,17 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  		 * the TLB entry for the old mapping has been
>  		 * flushed.
>  		 */
> -		if (pte_present(pte))
> +		if (pte_present(old_pte)) {
> +			if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
> +				struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
> +
> +				if (folio && folio_test_large(folio))
> +					nr = folio_pte_batch(folio, old_addr, old_ptep,
> +					old_pte, max_nr, fpb_flags, NULL, NULL, NULL);

Indentation seems completely broken here? I also hate that we're nesting to this
degree? Can we please find a way not to?

This function is already a bit of a clogged mess, I'd rather we clean up then
add to it.

(See above again :)


> +			}
>  			force_flush = true;
> +		}
> +		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr, 0);
>  		pte = move_pte(pte, old_addr, new_addr);
>  		pte = move_soft_dirty_pte(pte);
>
> @@ -266,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  				else if (is_swap_pte(pte))
>  					pte = pte_swp_clear_uffd_wp(pte);
>  			}
> -			set_pte_at(mm, new_addr, new_ptep, pte);
> +			set_ptes(mm, new_addr, new_ptep, pte, nr);
>  		}
>  	}
>
> --
> 2.30.2
>

Cheers, Lorenzo


  parent reply	other threads:[~2025-05-06 13:49 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-06  5:00 [PATCH 0/3] Optimize mremap() by PTE-batching Dev Jain
2025-05-06  5:00 ` [PATCH 1/3] mm: Call pointers to ptes as ptep Dev Jain
2025-05-06  8:50   ` Anshuman Khandual
2025-05-06  9:05     ` Lorenzo Stoakes
2025-05-06 10:52   ` Lorenzo Stoakes
2025-05-06 11:52     ` Dev Jain
2025-05-06  5:00 ` [PATCH 2/3] mm: Add generic helper to hint a large folio Dev Jain
2025-05-06  9:10   ` Anshuman Khandual
2025-05-06 13:34     ` Lorenzo Stoakes
2025-05-06 15:46   ` Matthew Wilcox
2025-05-07  3:43     ` Dev Jain
2025-05-07 10:03   ` David Hildenbrand
2025-05-08  5:02     ` Dev Jain
2025-05-08 10:55       ` David Hildenbrand
2025-05-09  5:25         ` Dev Jain
2025-05-09  9:16           ` David Hildenbrand
2025-05-06  5:00 ` [PATCH 3/3] mm: Optimize mremap() by PTE batching Dev Jain
2025-05-06 10:10   ` Anshuman Khandual
2025-05-06 10:20     ` Dev Jain
2025-05-06 13:49   ` Lorenzo Stoakes [this message]
2025-05-06 14:03     ` Lorenzo Stoakes
2025-05-06 14:10     ` Dev Jain
2025-05-06 14:14       ` Lorenzo Stoakes
2025-05-06  9:16 ` [PATCH 0/3] Optimize mremap() by PTE-batching Anshuman Khandual
2025-05-06 10:22   ` Dev Jain
2025-05-06 10:44     ` Lorenzo Stoakes
2025-05-06 11:53       ` Dev Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c6e61fa-2437-4c99-b1d3-97af5e2b37a3@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=baohua@kernel.org \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=ioworker0@gmail.com \
    --cc=jannh@google.com \
    --cc=libang.li@antgroup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maobibo@loongson.cn \
    --cc=mingo@kernel.org \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=ryan.roberts@arm.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox