From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Dev Jain <dev.jain@arm.com>
Cc: akpm@linux-foundation.org, david@redhat.com,
catalin.marinas@arm.com, will@kernel.org,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, suzuki.poulose@arm.com,
steven.price@arm.com, gshan@redhat.com,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 3/3] mm/pagewalk: Add pre/post_pte_table callback for lazy MMU on arm64
Date: Fri, 30 May 2025 12:14:31 +0100 [thread overview]
Message-ID: <7b4360a0-42ef-45ab-9e63-eace52943529@lucifer.local> (raw)
In-Reply-To: <20250530090407.19237-4-dev.jain@arm.com>
On Fri, May 30, 2025 at 02:34:07PM +0530, Dev Jain wrote:
> arm64 implements lazy_mmu_mode to allow deferral and batching of barriers
> when updating kernel PTEs, which provides a nice performance boost. arm64
> currently uses apply_to_page_range() to modify kernel PTE permissions,
> which runs inside lazy_mmu_mode. So to prevent a performance regression,
> let's add hooks to walk_page_range_novma() to allow continued use of
> lazy_mmu_mode.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> Credits to Ryan for the patch description.
>
> arch/arm64/mm/pageattr.c | 12 ++++++++++++
> include/linux/pagewalk.h | 2 ++
> mm/pagewalk.c | 6 ++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index a5c829c64969..9163324b12a0 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -75,11 +75,23 @@ static int pageattr_pte_entry(pte_t *pte, unsigned long addr,
> return 0;
> }
>
> +static void pte_lazy_mmu_enter(void)
> +{
> + arch_enter_lazy_mmu_mode();
> +}
Hm, am I missing something? I don't see this function or the leave version
defined anywhere in arch/arm64?

Nor do I see __HAVE_ARCH_ENTER_LAZY_MMU_MODE?
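
For reference, without __HAVE_ARCH_ENTER_LAZY_MMU_MODE defined, the generic
fallback in include/linux/pgtable.h reduces these to no-ops (quoting from
memory, so do double-check the exact form):

	#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
	#define arch_enter_lazy_mmu_mode()	do {} while (0)
	#define arch_leave_lazy_mmu_mode()	do {} while (0)
	#define arch_flush_lazy_mmu_mode()	do {} while (0)
	#endif

So as posted these wrappers would compile fine but do nothing on arm64.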
> +
> +static void pte_lazy_mmu_leave(void)
> +{
> + arch_leave_lazy_mmu_mode();
> +}
Are you absolutely sure you will never need to hook this stuff on higher-level
page tables?

If this relates to vmalloc, note that we do have huge page mappings in the
vmalloc logic.
> +
> static const struct mm_walk_ops pageattr_ops = {
> .pud_entry = pageattr_pud_entry,
> .pmd_entry = pageattr_pmd_entry,
> .pte_entry = pageattr_pte_entry,
> .walk_lock = PGWALK_NOLOCK,
> + .pre_pte_table = pte_lazy_mmu_enter,
> + .post_pte_table = pte_lazy_mmu_leave,
This is kind of horrid really - are we sure lazy mmu mode is valid for
everything that occurs within the loop? I suppose it's only simple logic for
the ops->pte_entry stuff.
But it feels like hacking something in for this specific case.
At the same time I don't want to get in the way of an optimisation. We could do
something in ops->pmd_entry, but then we'd not get to turn it off afterwards...
Same for any higher level page table hm.
Is this really the only way to get this? I guess it's not feasible having this
just switched on for the whole operation...
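
Thinking out loud - entirely untested, and riffing on your patch 2 (the
pageattr_ops/data names here are as I recall them from there) - 'switched on
for the whole operation' would look something like:

	arch_enter_lazy_mmu_mode();
	ret = walk_page_range_novma(&init_mm, start, start + size,
				    &pageattr_ops, NULL, &data);
	arch_leave_lazy_mmu_mode();

in the arm64 caller, with no pagewalk changes at all. If that's not workable -
e.g. because lazy mmu mode mustn't be held across the higher-level table
handling - it'd be good to spell out why in the commit message.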
I just fear that we could end up populating these mm_walk_ops with every corner
case thing we think of.
> };
>
> bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 9bc8853ed3de..2157d345974c 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -88,6 +88,8 @@ struct mm_walk_ops {
> int (*pre_vma)(unsigned long start, unsigned long end,
> struct mm_walk *walk);
> void (*post_vma)(struct mm_walk *walk);
> + void (*pre_pte_table)(void);
> + void (*post_pte_table)(void);
> int (*install_pte)(unsigned long addr, unsigned long next,
> pte_t *ptep, struct mm_walk *walk);
> enum page_walk_lock walk_lock;
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 9657cf4664b2..a441f5cbbc45 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -33,6 +33,9 @@ static int walk_pte_range_inner(pte_t *pte, unsigned long addr,
> const struct mm_walk_ops *ops = walk->ops;
> int err = 0;
>
> + if (walk->ops->pre_pte_table)
> + walk->ops->pre_pte_table();
NIT: you have 'ops' already, no need for walk-> :)
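
i.e. just:

	if (ops->pre_pte_table)
		ops->pre_pte_table();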
> +
> for (;;) {
> if (ops->install_pte && pte_none(ptep_get(pte))) {
> pte_t new_pte;
> @@ -56,6 +59,9 @@ static int walk_pte_range_inner(pte_t *pte, unsigned long addr,
> addr += PAGE_SIZE;
> pte++;
> }
> +
> + if (walk->ops->post_pte_table)
> + walk->ops->post_pte_table();
NIT: same as above.
> return err;
> }
>
> --
> 2.30.2
>