Re: [PATCH v5 22/25] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	Yang Shi <shy828301@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 22/25] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch()
Date: Mon, 12 Feb 2024 14:43:35 +0100	[thread overview]
Message-ID: <6d452a1a-1edc-4e97-8b39-99dc48315bb8@redhat.com> (raw)
In-Reply-To: <20240202080756.1453939-23-ryan.roberts@arm.com>

On 02.02.24 09:07, Ryan Roberts wrote:
> Some architectures (e.g. arm64) can tell from looking at a pte, if some
> follow-on ptes also map contiguous physical memory with the same pgprot.
> (for arm64, these are contpte mappings).
> 
> Take advantage of this knowledge to optimize folio_pte_batch() so that
> it can skip these ptes when scanning to create a batch. By default, if
> an arch does not opt-in, folio_pte_batch() returns a compile-time 1, so
> the changes are optimized out and the behaviour is as before.
> 
> arm64 will opt-in to providing this hint in the next patch, which will
> greatly reduce the cost of ptep_get() when scanning a range of contptes.
> 
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>   include/linux/pgtable.h | 18 ++++++++++++++++++
>   mm/memory.c             | 20 +++++++++++++-------
>   2 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 50f32cccbd92..cba31f177d27 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -212,6 +212,24 @@ static inline int pmd_dirty(pmd_t pmd)
>   #define arch_flush_lazy_mmu_mode()	do {} while (0)
>   #endif
>   
> +#ifndef pte_batch_hint
> +/**
> + * pte_batch_hint - Number of pages that can be added to batch without scanning.
> + * @ptep: Page table pointer for the entry.
> + * @pte: Page table entry.
> + *
> + * Some architectures know that a set of contiguous ptes all map the same
> + * contiguous memory with the same permissions. In this case, it can provide a
> + * hint to aid pte batching without the core code needing to scan every pte.

I think we might want to document here the expectation regarding
dirty/accessed bits. folio_pte_batch() will ignore dirty bits only with
FPB_IGNORE_DIRTY. But especially for arm64, it makes sense to ignore them
always when batching, because the dirty bit may target any pte part of the
cont-pte group either way.

Maybe something like:

"
An architecture implementation may only ignore the PTE accessed and dirty bits.
Further, it may only ignore the dirty bit if that bit is already not
maintained with precision per PTE inside the hinted batch, and ptep_get()
would already have to collect it from various PTEs.
"

I think there are some more details to it, but I'm hoping something along
the lines above is sufficient.


> +
>   #ifndef pte_advance_pfn
>   static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
>   {
> diff --git a/mm/memory.c b/mm/memory.c
> index 65fbe4f886c1..902665b27702 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -988,16 +988,21 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>   {
>   	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
>   	const pte_t *end_ptep = start_ptep + max_nr;
> -	pte_t expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, 1), flags);
> -	pte_t *ptep = start_ptep + 1;
> +	pte_t expected_pte = __pte_batch_clear_ignored(pte, flags);
> +	pte_t *ptep = start_ptep;
>   	bool writable;
> +	int nr;
>   
>   	if (any_writable)
>   		*any_writable = false;
>   
>   	VM_WARN_ON_FOLIO(!pte_present(pte), folio);
>   
> -	while (ptep != end_ptep) {
> +	nr = pte_batch_hint(ptep, pte);
> +	expected_pte = pte_advance_pfn(expected_pte, nr);
> +	ptep += nr;
> +

*Maybe* it's easier to get when initializing expected_pte+ptep only once.

Like:

[...]
pte_t expected_pte, *ptep;
[...]

nr = pte_batch_hint(start_ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = start_ptep + nr;

> +	while (ptep < end_ptep) {
>   		pte = ptep_get(ptep);
>   		if (any_writable)
>   			writable = !!pte_write(pte);
> @@ -1011,17 +1016,18 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>   		 * corner cases the next PFN might fall into a different
>   		 * folio.
>   		 */
> -		if (pte_pfn(pte) == folio_end_pfn)
> +		if (pte_pfn(pte) >= folio_end_pfn)
>   			break;
>   
>   		if (any_writable)
>   			*any_writable |= writable;
>   
> -		expected_pte = pte_advance_pfn(expected_pte, 1);
> -		ptep++;
> +		nr = pte_batch_hint(ptep, pte);
> +		expected_pte = pte_advance_pfn(expected_pte, nr);
> +		ptep += nr;
>   	}
>   
> -	return ptep - start_ptep;
> +	return min(ptep - start_ptep, max_nr);
>   }

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2024-02-12 13:43 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02  8:07 [PATCH v5 00/25] Transparent Contiguous PTEs for User Mappings Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 01/25] mm: Clarify the spec for set_ptes() Ryan Roberts
2024-02-12 12:03   ` David Hildenbrand
2024-02-02  8:07 ` [PATCH v5 02/25] mm: thp: Batch-collapse PMD with set_ptes() Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 03/25] mm: Make pte_next_pfn() a wrapper around pte_advance_pfn() Ryan Roberts
2024-02-12 12:14   ` David Hildenbrand
2024-02-12 14:10     ` Ryan Roberts
2024-02-12 14:29       ` David Hildenbrand
2024-02-12 21:34         ` Ryan Roberts
2024-02-13  9:54           ` David Hildenbrand
2024-02-02  8:07 ` [PATCH v5 04/25] arm/mm: Convert pte_next_pfn() to pte_advance_pfn() Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 05/25] arm64/mm: " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 06/25] powerpc/mm: " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 07/25] x86/mm: " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 08/25] mm: Remove pte_next_pfn() and replace with pte_advance_pfn() Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 09/25] arm64/mm: set_pte(): New layer to manage contig bit Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 10/25] arm64/mm: set_ptes()/set_pte_at(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 11/25] arm64/mm: pte_clear(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 12/25] arm64/mm: ptep_get_and_clear(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 13/25] arm64/mm: ptep_test_and_clear_young(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 14/25] arm64/mm: ptep_clear_flush_young(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 15/25] arm64/mm: ptep_set_wrprotect(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 16/25] arm64/mm: ptep_set_access_flags(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 17/25] arm64/mm: ptep_get(): " Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 18/25] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Ryan Roberts
2024-02-12 12:44   ` David Hildenbrand
2024-02-12 13:05     ` Ryan Roberts
2024-02-12 13:15       ` David Hildenbrand
2024-02-12 13:27         ` Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings Ryan Roberts
2024-02-12 12:00   ` Mark Rutland
2024-02-12 12:59     ` Ryan Roberts
2024-02-12 13:54       ` David Hildenbrand
2024-02-12 14:45         ` Ryan Roberts
2024-02-12 15:26           ` David Hildenbrand
2024-02-12 15:34             ` Ryan Roberts
2024-02-12 16:24               ` David Hildenbrand
2024-02-13 15:29                 ` Ryan Roberts
2024-02-12 15:30       ` Ryan Roberts
2024-02-12 20:38         ` Ryan Roberts
2024-02-13 10:01           ` David Hildenbrand
2024-02-13 12:06           ` Ryan Roberts
2024-02-13 12:19             ` David Hildenbrand
2024-02-13 13:06               ` Ryan Roberts
2024-02-13 13:13                 ` David Hildenbrand
2024-02-13 13:20                   ` Ryan Roberts
2024-02-13 13:22                     ` David Hildenbrand
2024-02-13 13:24                       ` Ryan Roberts
2024-02-13 13:33                     ` Ard Biesheuvel
2024-02-13 13:45                       ` David Hildenbrand
2024-02-13 14:02                         ` Ryan Roberts
2024-02-13 14:05                           ` David Hildenbrand
2024-02-13 14:08                             ` Ard Biesheuvel
2024-02-13 14:21                               ` Ryan Roberts
2024-02-13 12:02       ` Mark Rutland
2024-02-13 13:03         ` Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 20/25] arm64/mm: Implement new wrprotect_ptes() batch API Ryan Roberts
2024-02-13 16:31   ` Mark Rutland
2024-02-13 16:36     ` Ryan Roberts
2024-02-02  8:07 ` [PATCH v5 21/25] arm64/mm: Implement new [get_and_]clear_full_ptes() batch APIs Ryan Roberts
2024-02-13 16:43   ` Mark Rutland
2024-02-13 16:48     ` Ryan Roberts
2024-02-13 16:53       ` Mark Rutland
2024-02-02  8:07 ` [PATCH v5 22/25] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch() Ryan Roberts
2024-02-12 13:43   ` David Hildenbrand [this message]
2024-02-12 15:00     ` Ryan Roberts
2024-02-12 15:47     ` Ryan Roberts
2024-02-12 16:27       ` David Hildenbrand
2024-02-02  8:07 ` [PATCH v5 23/25] arm64/mm: Implement pte_batch_hint() Ryan Roberts
2024-02-12 13:46   ` David Hildenbrand
2024-02-13 16:54   ` Mark Rutland
2024-02-02  8:07 ` [PATCH v5 24/25] arm64/mm: __always_inline to improve fork() perf Ryan Roberts
2024-02-13 16:55   ` Mark Rutland
2024-02-02  8:07 ` [PATCH v5 25/25] arm64/mm: Automatically fold contpte mappings Ryan Roberts
2024-02-13 17:44   ` Mark Rutland
2024-02-13 18:05     ` Ryan Roberts
2024-02-08 17:34 ` [PATCH v5 00/25] Transparent Contiguous PTEs for User Mappings Mark Rutland
2024-02-09  8:54   ` Ryan Roberts
2024-02-09 22:16     ` David Hildenbrand
2024-02-09 23:52       ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6d452a1a-1edc-4e97-8b39-99dc48315bb8@redhat.com \
    --to=david@redhat.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=apopple@nvidia.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox