* [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT
2024-12-10 16:05 [PATCH 0/4] ARM: towards 32-bit preempt-rt support Arnd Bergmann
@ 2024-12-10 16:05 ` Arnd Bergmann
2024-12-11 13:04 ` Linus Walleij
2024-12-11 13:26 ` Sebastian Andrzej Siewior
2024-12-10 16:05 ` [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels Arnd Bergmann
` (2 subsequent siblings)
3 siblings, 2 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-10 16:05 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt,
Thomas Gleixner
From: Thomas Gleixner <tglx@linutronix.de>
Jump labels are used to efficiently switch between two possible code
paths. To achieve this, stop_machine() is used to keep the CPUs in a
known state while the opcode is modified. The usage of stop_machine()
here leads to large latency spikes which can be observed on PREEMPT_RT.
Jump labels may change the target during runtime and are not restricted
to the debug or "configuration/setup" part of a PREEMPT_RT system, where
high latencies could be defined as acceptable.
On 64-bit Arm, it is possible to use jump labels without the
stop_machine() call, because the architecture provides a way to atomically
change one 32-bit instruction word while maintaining consistency,
but this is not generally the case on 32-bit, in particular in Thumb-2
mode.
Disable jump-label support on a PREEMPT_RT system when SMP is enabled.
[bigeasy: Patch description.]
[arnd: add !SMP case, extend changelog]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I slightly changed the patch from the version currently in linux-rt.git
to leave jump labels enabled on single-CPU kernels that are still
fairly common on 32-bit arm.
If there are no additional concerns about this version, I will
forward it to Russell's patch system.
---
arch/arm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fb4e1da3bb98..ed850cc0ed3c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -80,7 +80,7 @@ config ARM
select HAS_IOPORT if PCI || PCMCIA || ISA || ARCH_FOOTBRIDGE || ARCH_RPC
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
- select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
+ select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU && (!PREEMPT_RT || !SMP)
select HAVE_ARCH_KFENCE if MMU && !XIP_KERNEL
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
--
2.39.5
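As a reading aid, the new `select` condition from the hunk above can be modelled as a plain C predicate. This is only an illustration of the Kconfig boolean expression, not kernel code; `struct arm_cfg` and `have_arch_jump_label()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of the Kconfig symbols involved (not kernel code). */
struct arm_cfg {
	bool xip_kernel;
	bool cpu_endian_be32;
	bool mmu;
	bool preempt_rt;
	bool smp;
};

/*
 * Mirrors the new line:
 *   select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU &&
 *                                 (!PREEMPT_RT || !SMP)
 * i.e. jump labels are only lost when PREEMPT_RT and SMP are both enabled.
 */
bool have_arch_jump_label(const struct arm_cfg *c)
{
	return !c->xip_kernel && !c->cpu_endian_be32 && c->mmu &&
	       (!c->preempt_rt || !c->smp);
}
```

The `(!PREEMPT_RT || !SMP)` term is what keeps jump labels available on single-CPU RT kernels, as described in the note below the changelog.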
* Re: [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT
2024-12-10 16:05 ` [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT Arnd Bergmann
@ 2024-12-11 13:04 ` Linus Walleij
2024-12-11 13:26 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 26+ messages in thread
From: Linus Walleij @ 2024-12-11 13:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt,
Thomas Gleixner
On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> jump-labels are used to efficiently switch between two possible code
> paths. To achieve this, stop_machine() is used to keep the CPU in a
> known state while the opcode is modified. The usage of stop_machine()
> here leads to large latency spikes which can be observed on PREEMPT_RT.
>
> Jump labels may change the target during runtime and are not restricted
> to debug or "configuration/ setup" part of a PREEMPT_RT system where
> high latencies could be defined as acceptable.
>
> On 64-bit Arm, it is possible to use jump labels without the
> stop_machine() call, which architecturally provides a way to atomically
> change one 32-bit instruction word while maintaining consistency,
> but this is not generally the case on 32-bit, in particular in thumb2
> mode.
>
> Disable jump-label support on a PREEMPT_RT system when SMP is enabled.
>
> [bigeasy: Patch description.]
> [arnd: add !SMP case, extend changelog]
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Makes sense.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT
2024-12-10 16:05 ` [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT Arnd Bergmann
2024-12-11 13:04 ` Linus Walleij
@ 2024-12-11 13:26 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 13:26 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt, Thomas Gleixner
On 2024-12-10 17:05:53 [+0100], Arnd Bergmann wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> jump-labels are used to efficiently switch between two possible code
> paths. To achieve this, stop_machine() is used to keep the CPU in a
> known state while the opcode is modified. The usage of stop_machine()
> here leads to large latency spikes which can be observed on PREEMPT_RT.
>
> Jump labels may change the target during runtime and are not restricted
> to debug or "configuration/ setup" part of a PREEMPT_RT system where
> high latencies could be defined as acceptable.
>
> On 64-bit Arm, it is possible to use jump labels without the
> stop_machine() call, which architecturally provides a way to atomically
> change one 32-bit instruction word while maintaining consistency,
> but this is not generally the case on 32-bit, in particular in thumb2
> mode.
>
> Disable jump-label support on a PREEMPT_RT system when SMP is enabled.
>
> [bigeasy: Patch description.]
> [arnd: add !SMP case, extend changelog]
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I slightly changed the patch from the version currently in linux-rt.git
> to leave jump labels enabled on single-CPU kernels that are still
> fairly common on 32-bit arm.
So HOTPLUG_CPU depends on SMP, and without SMP there is no HOTPLUG_CPU,
so the patch function will be invoked directly. So we patch and flush.
Well, okay.
> If there are no additional concerns about this version, I will
> forward it to Russell's patch system
I'm fine with it, thank you.
Sebastian
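Sebastian's point can be sketched as a userspace model: on a kernel built without SMP (and therefore without HOTPLUG_CPU), stop_machine() has no other CPUs to stop and, to the best of my understanding, just runs the callback locally with interrupts disabled, so the text is patched and the cache flushed on the only CPU. This is a simplified model under that assumption; `fake_irqs_disabled`, `text[]`, `up_stop_machine()` and `patch_text_cb()` are hypothetical stand-ins, and no real instruction patching or cache maintenance happens here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the local interrupt-disable state. */
static bool fake_irqs_disabled;

/* Hypothetical stand-in for a few words of kernel text. */
uint32_t text[4];

typedef int (*stop_fn_t)(void *data);

struct patch_req {
	unsigned int index;   /* which text word to rewrite */
	uint32_t insn;        /* new instruction encoding */
	bool irqs_were_off;   /* recorded by the callback for inspection */
};

/* Model of the !SMP && !HOTPLUG_CPU case: no other CPUs exist, so the
 * callback runs directly between local_irq_save()/local_irq_restore(). */
int up_stop_machine(stop_fn_t fn, void *data)
{
	bool saved = fake_irqs_disabled;   /* local_irq_save() */
	int ret;

	fake_irqs_disabled = true;
	ret = fn(data);
	fake_irqs_disabled = saved;        /* local_irq_restore() */
	return ret;
}

/* Callback: write the new instruction word. A real kernel would also
 * flush the instruction cache for that address ("patch and flush"). */
int patch_text_cb(void *data)
{
	struct patch_req *req = data;

	req->irqs_were_off = fake_irqs_disabled;
	text[req->index] = req->insn;
	return 0;
}
```

Because the only latency cost in this path is a short interrupts-off window on one CPU, leaving jump labels enabled on !SMP RT kernels is much less of a concern than the cross-CPU stop_machine() rendezvous on SMP.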
* [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-10 16:05 [PATCH 0/4] ARM: towards 32-bit preempt-rt support Arnd Bergmann
2024-12-10 16:05 ` [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT Arnd Bergmann
@ 2024-12-10 16:05 ` Arnd Bergmann
2024-12-11 13:29 ` Linus Walleij
2024-12-11 13:48 ` Sebastian Andrzej Siewior
2024-12-10 16:05 ` [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support Arnd Bergmann
2024-12-10 16:05 ` [PATCH 4/4] mm: drop HIGHPTE support altogether Arnd Bergmann
3 siblings, 2 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-10 16:05 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
gup_pgd_range() is invoked with disabled interrupts and invokes
__kmap_local_page_prot() via pte_offset_map(), gup_p4d_range().
With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get()
which uses a spinlock_t via lock_kmap_any(). This leads to a
sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a
sleeping lock and must not be acquired in atomic context.
The loop in map_new_virtual() uses a wait_queue_head_t for wakeups, which
also uses a spinlock_t.
Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
to allow the use of get_user_pages_fast().
[arnd: rework patch to turn off HIGHPTE instead of HAVE_FAST_GUP]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
There is an open question about whether HIGHPTE is still needed
at all, given how rare 32-bit machines with more than 4GB
are on any architecture. If we instead decide to remove HIGHPTE
altogether, this patch is no longer needed.
---
arch/arm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ed850cc0ed3c..4de4e5697bdf 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1231,7 +1231,7 @@ config HIGHMEM
config HIGHPTE
bool "Allocate 2nd-level pagetables from highmem" if EXPERT
- depends on HIGHMEM
+ depends on HIGHMEM && !PREEMPT_RT
default y
help
The VM uses one page of physical memory for each page table.
--
2.39.5
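The failure described in this changelog can be sketched as a small userspace model. On PREEMPT_RT a spinlock_t is backed by a sleeping lock, so acquiring it while interrupts are disabled is a bug (typically reported as "sleeping function called from invalid context"). All `model_*` names are hypothetical; the real detection lives in the kernel's might_sleep() debugging, not in code like this.

```c
#include <stdbool.h>

/* Hypothetical model state. */
static bool model_irqs_disabled;   /* are we in atomic context? */
static bool model_preempt_rt;      /* is spinlock_t a sleeping lock? */

/* Returns false where a real PREEMPT_RT kernel would complain about
 * sleeping in atomic context. */
bool model_spin_lock(void)
{
	if (model_preempt_rt && model_irqs_disabled)
		return false;
	return true;
}

/* HIGHPTE path on ARM: mapping a page table that lives in highmem goes
 * through kmap_high_get(), which takes the kmap lock (lock_kmap_any()). */
bool model_map_pte_page(bool highpte)
{
	if (!highpte)
		return true;   /* lowmem page table: directly addressable, no lock */
	return model_spin_lock();
}

/* GUP-fast walks the page tables with interrupts disabled. */
bool model_gup_fast(bool preempt_rt, bool highpte)
{
	bool ok;

	model_preempt_rt = preempt_rt;
	model_irqs_disabled = true;    /* local_irq_save() */
	ok = model_map_pte_page(highpte);
	model_irqs_disabled = false;   /* local_irq_restore() */
	return ok;
}
```

Only the combination of PREEMPT_RT and HIGHPTE fails in this model, which is why making HIGHPTE depend on !PREEMPT_RT is sufficient.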
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-10 16:05 ` [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels Arnd Bergmann
@ 2024-12-11 13:29 ` Linus Walleij
2024-12-11 15:22 ` Sebastian Andrzej Siewior
2024-12-11 13:48 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2024-12-11 13:29 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> gup_pgd_range() is invoked with disabled interrupts and invokes
There is no gup_pgd_range() in the kernel; is this patch a bit
old?
There is gup_fast_pgd_range().
See 23babe1934d7637b598e4c9d9f3876e318fa63a4
gup.c contains:
get_user_pages_fast attempts to pin user pages by
walking the page tables directly and avoids taking locks.
(...)
Let's consistently call the "fast-only" part of GUP "GUP-fast"
and rename all relevant internal functions to start with
"gup_fast", to make it clearer that this is not ordinary GUP.
The current mixture of "lockless", "gup" and "gup_fast" is
confusing.
So fast GUP is supposed to be lockless, and should just not
have this problem. So it can't be addressing gup_fast_pgd_range()
right?
> __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range().
> With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get()
> which uses a spinlock_t via lock_kmap_any(). This leads to a
> sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a
> sleeping lock and must not be acquired in atomic context.
I think this needs to be inspected by David Hildenbrand: if he consistently
renamed the GUP functions to be "fast" and there is a lock somewhere
deep in there, something must be wrong and violating the API
contract.
> The loop in map_new_virtual() uses wait_queue_head_t for wake up which
> also is using a spinlock_t.
>
> Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
> to allow the use of get_user_pages_fast().
>
> [arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP]
HAVE_FAST_GUP
I'm still confused, how can something that is supposed to be
lockless "fast" acquire a spinlock? Something is odd here.
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> There is an open question about whether HIGHPTE is still needed
> at all, given how rare 32-bit machines with more than 4GB
> are on any architecture. If we instead decide to remove HIGHPTE
> altogether, this patch is no longer needed.
I'm more asking if HIGHPTE even acquires a spinlock anymore
as it is supposed to be "fast"/lockless. If it does, it is clearly violating
the "fast" promise of the fast GUP API and should not exist.
Yours,
Linus Walleij
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 13:29 ` Linus Walleij
@ 2024-12-11 15:22 ` Sebastian Andrzej Siewior
2024-12-13 0:27 ` Linus Walleij
0 siblings, 1 reply; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 15:22 UTC (permalink / raw)
To: Linus Walleij
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt
On 2024-12-11 14:29:29 [+0100], Linus Walleij wrote:
> So fast GUP is supposed to be lockless, and should just not
> have this problem. So it can't be addressing gup_fast_pgd_range()
> right?
…
> I'm more asking if HIGHPTE even acquires a spinlock anymore
> as it is supposed to be "fast"/lockless. If it does, it is clearly violating
> the "fast" promise of the fast GUP API and should not exist.
This is lockless on x86. The problem is ARM's
arch_kmap_local_high_get(). This is where the lock is from.
> Yours,
> Linus Walleij
Sebastian
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 15:22 ` Sebastian Andrzej Siewior
@ 2024-12-13 0:27 ` Linus Walleij
2024-12-13 9:11 ` Russell King (Oracle)
0 siblings, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2024-12-13 0:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt, David Hildenbrand
On Wed, Dec 11, 2024 at 4:22 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2024-12-11 14:29:29 [+0100], Linus Walleij wrote:
> > So fast GUP is supposed to be lockless, and should just not
> > have this problem. So it can't be addressing gup_fast_pgd_range()
> > right?
> …
> > I'm more asking if HIGHPTE even acquires a spinlock anymore
> > as it is supposed to be "fast"/lockless. If it does, it is clearly violating
> > the "fast" promise of the fast GUP API and should not exist.
>
> This is lockless on x86. The problem is ARM's
> arch_kmap_local_high_get(). This is where the lock is from.
Aha, that calls down to kmap_high_get(), which issues
lock_kmap_any(flags).
But is it really sound that the "fast" API does this? It feels
like a violation of the whole design of the fast stuff.
Yours,
Linus Walleij
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-13 0:27 ` Linus Walleij
@ 2024-12-13 9:11 ` Russell King (Oracle)
2024-12-14 22:11 ` Matthew Wilcox
0 siblings, 1 reply; 26+ messages in thread
From: Russell King (Oracle) @ 2024-12-13 9:11 UTC (permalink / raw)
To: Linus Walleij
Cc: Sebastian Andrzej Siewior, Arnd Bergmann, linux-arm-kernel,
linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Mark Rutland, Matthew Wilcox, Peter Zijlstra, Steven Rostedt,
David Hildenbrand
On Fri, Dec 13, 2024 at 01:27:00AM +0100, Linus Walleij wrote:
> On Wed, Dec 11, 2024 at 4:22 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > On 2024-12-11 14:29:29 [+0100], Linus Walleij wrote:
> > > So fast GUP is supposed to be lockless, and should just not
> > > have this problem. So it can't be addressing gup_fast_pgd_range()
> > > right?
> > …
> > > I'm more asking if HIGHPTE even acquires a spinlock anymore
> > > as it is supposed to be "fast"/lockless. If it does, it is clearly violating
> > > the "fast" promise of the fast GUP API and should not exist.
> >
> > This is lockless on x86. The problem is ARM's
> > arch_kmap_local_high_get(). This is where the lock is from.
>
> Aha, that calls down to kmap_high_get(), which issues
> lock_kmap_any(flags).
>
> But is it really sound that the "fast" API does this? It feels
> like a violation of the whole design of the fast stuff.
If there's no way to do it lockless, then it has to take the lock.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-13 9:11 ` Russell King (Oracle)
@ 2024-12-14 22:11 ` Matthew Wilcox
0 siblings, 0 replies; 26+ messages in thread
From: Matthew Wilcox @ 2024-12-14 22:11 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Linus Walleij, Sebastian Andrzej Siewior, Arnd Bergmann,
linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Peter Zijlstra, Steven Rostedt,
David Hildenbrand
On Fri, Dec 13, 2024 at 09:11:57AM +0000, Russell King (Oracle) wrote:
> On Fri, Dec 13, 2024 at 01:27:00AM +0100, Linus Walleij wrote:
> > On Wed, Dec 11, 2024 at 4:22 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > > On 2024-12-11 14:29:29 [+0100], Linus Walleij wrote:
> > > > So fast GUP is supposed to be lockless, and should just not
> > > > have this problem. So it can't be addressing gup_fast_pgd_range()
> > > > right?
> > > …
> > > > I'm more asking if HIGHPTE even acquires a spinlock anymore
> > > > as it is supposed to be "fast"/lockless. If it does, it is clearly violating
> > > > the "fast" promise of the fast GUP API and should not exist.
> > >
> > > This is lockless on x86. The problem is ARM's
> > > arch_kmap_local_high_get(). This is where the lock is from.
> >
> > Aha, that calls down to kmap_high_get(), which issues
> > lock_kmap_any(flags).
> >
> > But is it really sound that the "fast" API does this? It feels
> > like a violation of the whole design of the fast stuff.
>
> If there's no way to do it lockless, then it has to take the lock.
Well, no, it's allowed to fail. It could trylock and fail if it can't
get the lock.
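Matthew's suggestion is the usual pattern for a fast path that is allowed to fail: try the lock, and if it is contended, report failure so that the caller falls back to the slow path instead of waiting. The sketch below is a userspace illustration with POSIX threads only; `map_lock`, `fast_pin()`, `slow_pin()` and `pin()` are hypothetical names and do not reflect the actual GUP code. Whether a trylock is acceptable in the interrupts-off GUP-fast context on PREEMPT_RT is not settled in this thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock standing in for the kmap lock. */
pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fast path: never blocks. Returns false if the lock is contended. */
bool fast_pin(int *pinned)
{
	if (pthread_mutex_trylock(&map_lock) != 0)
		return false;   /* contended: fail instead of waiting */
	(*pinned)++;
	pthread_mutex_unlock(&map_lock);
	return true;
}

/* Slow path: allowed to block on the lock. */
void slow_pin(int *pinned)
{
	pthread_mutex_lock(&map_lock);
	(*pinned)++;
	pthread_mutex_unlock(&map_lock);
}

/* Try the fast path first and fall back to the slow path on failure.
 * Returns true if the fast path succeeded, false if the fallback ran. */
bool pin(int *pinned)
{
	if (fast_pin(pinned))
		return true;
	slow_pin(pinned);
	return false;
}
```

The design relies on the fast API's contract already permitting partial or total failure, so giving up on contention stays within that contract rather than blocking in it.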
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-10 16:05 ` [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels Arnd Bergmann
2024-12-11 13:29 ` Linus Walleij
@ 2024-12-11 13:48 ` Sebastian Andrzej Siewior
2024-12-11 14:04 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 13:48 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt
On 2024-12-10 17:05:54 [+0100], Arnd Bergmann wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> gup_pgd_range() is invoked with disabled interrupts and invokes
> __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range().
s@gup_pgd_range@gup_fast_pgd_range@
s@gup_p4d_range@gup_fast_p4d_range@
The functions got renamed…
> With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get()
> which uses a spinlock_t via lock_kmap_any(). This leads to a
> sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a
> sleeping lock and must not be acquired in atomic context.
>
> The loop in map_new_virtual() uses wait_queue_head_t for wake up which
> also is using a spinlock_t.
>
> Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
> to allow the use of get_user_pages_fast().
>
> [arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
This version works, too. Thanks.
> ---
> There is an open question about whether HIGHPTE is still needed
> at all, given how rare 32-bit machines with more than 4GB
> are on any architecture. If we instead decide to remove HIGHPTE
> altogether, this patch is no longer needed.
HIGHPTE isn't so much about 4GiB+ as about the page tables being
offloaded to HIGHMEM. Maybe it is more likely to be needed with 4GiB+ of
memory. No idea. X86 had support for up to 64GiB of memory and is the
only other architecture supporting HIGHPTE :)
I guess if you have boxes with 4GiB+ and can prove that the performance
improves without HIGHPTE (since you don't have to map the page table).
The question is then how much of low mem has to be used instead and when
does it start to hurt.
Sebastian
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 13:48 ` Sebastian Andrzej Siewior
@ 2024-12-11 14:04 ` Sebastian Andrzej Siewior
2024-12-11 14:30 ` Arnd Bergmann
2024-12-11 15:55 ` Russell King (Oracle)
0 siblings, 2 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 14:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt
On 2024-12-11 14:48:11 [+0100], To Arnd Bergmann wrote:
> I guess if you have boxes with 4GiB+ and can prove that the performance
> improves without HIGHPTE (since you don't have to map the page table).
> The question is then how much of low mem has to be used instead and when
> does it start to hurt.
Some numbers have been documented in commit
14315592009c1 ("x86, mm: Allow highmem user page tables to be disabled at boot time")
and I would like to cite:
| We could probably handwave up an argument for a threshold at 16G of total
| RAM.
which means HIGHPTE would make sense with >= 16GiB of memory.
Sebastian
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 14:04 ` Sebastian Andrzej Siewior
@ 2024-12-11 14:30 ` Arnd Bergmann
2024-12-11 15:55 ` Russell King (Oracle)
1 sibling, 0 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-11 14:30 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Steven Rostedt
On Wed, Dec 11, 2024, at 15:04, Sebastian Andrzej Siewior wrote:
> On 2024-12-11 14:48:11 [+0100], To Arnd Bergmann wrote:
>> I guess if you have boxes with 4GiB+ and can prove that the performance
>> improves without HIGHPTE (since you don't have to map the page table).
>> The question is then how much of low mem has to be used instead and when
>> does it start to hurt.
>
> Some numbers have been documented in commit
> 14315592009c1 ("x86, mm: Allow highmem user page tables to be
> disabled at boot time")
>
> and I would like to cite:
> | We could probably handwave up an argument for a threshold at 16G of total
> | RAM.
>
> which means HIGHPTE would make sense with >= 16GiB of memory.
Very useful, thanks!
On x86, that means we can definitely remove HIGHPTE along with
CONFIG_HIGHMEM64G.
On 32-bit ARM, we still need to support LPAE for systems that
require 64-bit addressing. LPAE supports 36 bits of addressing
(up to 64GB), but the largest actual size I've seen mentioned
is 16GB (Hisilicon HiP04, Calxeda Midway servers) and I'm
certain nobody actually requires these to perform well
given that they are no longer useful for the workloads they
were designed for.
There are also a small number of embedded systems with 8GB
(TI Keystone2, NVIDIA Tegra3, Marvell Armada XP), but they
are rare enough that turning off HIGHPTE is completely safe.
Arnd
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 14:04 ` Sebastian Andrzej Siewior
2024-12-11 14:30 ` Arnd Bergmann
@ 2024-12-11 15:55 ` Russell King (Oracle)
2024-12-20 14:37 ` Arnd Bergmann
1 sibling, 1 reply; 26+ messages in thread
From: Russell King (Oracle) @ 2024-12-11 15:55 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Linus Walleij, Mark Rutland,
Matthew Wilcox, Peter Zijlstra, Steven Rostedt
On Wed, Dec 11, 2024 at 03:04:02PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-12-11 14:48:11 [+0100], To Arnd Bergmann wrote:
> > I guess if you have boxes with 4GiB+ and can prove that the performance
> > improves without HIGHPTE (since you don't have to map the page table).
> > The question is then how much of low mem has to be used instead and when
> > does it start to hurt.
>
> Some numbers have been documented in commit
> 14315592009c1 ("x86, mm: Allow highmem user page tables to be disabled at boot time")
>
> and I would like to cite:
> | We could probably handwave up an argument for a threshold at 16G of total
> | RAM.
>
> which means HIGHPTE would make sense with >= 16GiB of memory.
However, there is more to consider.
32-bit Arm works out the same for this:
  Assuming 768M of lowmem we have 196608 potential lowmem PTE
  pages. Each page can map 2M of RAM in a PAE-enabled configuration,
  meaning a maximum of 384G of RAM could potentially be mapped using
  lowmem PTEs.
because, presumably, x86 uses 8 bytes per PTE entry, whereas on Arm we
still use 4 bytes, but because we keep two copies of a PTE (one for
hardware, the other for the kernel) it works out that we're the same
there - one PTE page can also map 2M of RAM.
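The arithmetic above can be checked with a few lines of C. The 768M lowmem figure is the assumption from the quoted commit; the per-page layouts follow this message: 512 eight-byte PTEs per 4K page on x86 PAE, versus 4-byte PTEs kept in two copies (hardware and Linux) on 32-bit Arm. The helper names are hypothetical.

```c
#include <stdint.h>

#define KIB 1024ull
#define MIB (1024ull * KIB)
#define GIB (1024ull * MIB)
#define PAGE_SIZE_BYTES (4 * KIB)

/* RAM mapped by one page-table page: number of PTEs that fit in the page
 * (each PTE takes pte_bytes, stored `copies` times) times 4K per PTE. */
uint64_t span_per_pte_page(unsigned pte_bytes, unsigned copies)
{
	uint64_t ptes = PAGE_SIZE_BYTES / ((uint64_t)pte_bytes * copies);

	return ptes * PAGE_SIZE_BYTES;
}

/* Upper bound on mappable RAM if every lowmem page held PTEs. */
uint64_t max_mappable(uint64_t lowmem, unsigned pte_bytes, unsigned copies)
{
	return (lowmem / PAGE_SIZE_BYTES) * span_per_pte_page(pte_bytes, copies);
}
```

Both `span_per_pte_page(8, 1)` (x86 PAE) and `span_per_pte_page(4, 2)` (32-bit Arm) come out at 2M, and `max_mappable(768 * MIB, ...)` gives 384G in either case, matching the quoted figures.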
However, what is quite different is the L1 page tables. On x86,
everything is nice and easy, and each page table is one 4k page.
On 32-bit Arm, this is not the case - we need to grab a 16k page for
the L1, and the more immovable allocations we have in lowmem, the
harder it will be to satisfy this. Failing to grab a 16k page
leads to fork() failing and an unusable system.
So, we want to keep as many immovable allocations out of lowmem as
possible - which is an additional constraint x86 doesn't have, and
shouldn't be overlooked without ensuring that the probability of it
happening remains acceptably low.
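Why the 16k L1 table matters can be illustrated with a toy model. A 16 KiB allocation needs an order-2 block, i.e. four physically contiguous, naturally aligned 4 KiB pages, so a few scattered immovable pages can make every such block unavailable even when most of lowmem is free. This is a simplified bitmap model, not the kernel's buddy allocator; `order_for()` and `has_free_block()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define L1_TABLE_BYTES (16 * 1024)   /* 32-bit Arm (non-LPAE) first-level table */
#define PAGE_BYTES 4096

/* Smallest order such that (PAGE_BYTES << order) >= size. */
unsigned order_for(size_t size)
{
	unsigned order = 0;

	while (((size_t)PAGE_BYTES << order) < size)
		order++;
	return order;
}

/* used[i] is true if page i holds an immovable allocation.
 * Returns true if any naturally aligned block of 2^order pages is free. */
bool has_free_block(const bool *used, size_t npages, unsigned order)
{
	size_t block = (size_t)1 << order;

	for (size_t start = 0; start + block <= npages; start += block) {
		bool all_free = true;

		for (size_t i = start; i < start + block; i++) {
			if (used[i]) {
				all_free = false;
				break;
			}
		}
		if (all_free)
			return true;
	}
	return false;
}
```

For example, pinning one page in each group of four (25% of pages) is enough to make every order-2 request fail in this model, while order-0 requests still succeed.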
* Re: [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels
2024-12-11 15:55 ` Russell King (Oracle)
@ 2024-12-20 14:37 ` Arnd Bergmann
0 siblings, 0 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-20 14:37 UTC (permalink / raw)
To: Russell King, Sebastian Andrzej Siewior
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Steven Rostedt
TL;DR: I would still like to merge patch 3 and 4, but let's
keep that separate from the RT patches.
(just realized I had never sent my reply here)
On Wed, Dec 11, 2024, at 16:55, Russell King (Oracle) wrote:
> On Wed, Dec 11, 2024 at 03:04:02PM +0100, Sebastian Andrzej Siewior wrote:
>
> However, what is quite different is the L1 page tables. On x86,
> everything is nice and easy, and each page table is one 4k page.
> On 32-bit Arm, this is not the case - we need to grab a 16k page for
> the L1, and the more immovable allocations we have in lowmem, the
> harder it will be to satisfy this. Failing to grab a 16k page
> leads to fork() failing and an unusable system.
>
> So, we want to keep as many immovable allocations out of lowmem as
> possible - which is an additional constraint x86 doesn't have, and
> shouldn't be overlooked without ensuring that the probability of it
> happening remains acceptably low.
Right, non-LPAE systems indeed have a bigger problem with
fragmentation here, I hadn't thought of that. Most of the
systems with 4GB RAM (and all of those with more) support
LPAE though, which means they can use the same page table
layout as x86 when using CONFIG_ARM_LPAE=y.
Almost all SoCs prior to ARMv7VE are limited to 2GB of RAM
or less; the only exceptions I can think of are three SoCs
that can use close to 4GB (a small physical address range
is used for MMIO):
- Calxeda highbank
- Hisilicon hip01
- NXP i.MX6D/DP/Q/QP
We can throw the first two under the bus, I'm sure nobody
cares enough, but the i.MX6Q was actually shipped in a
handful of products with 4GB that some users still have:
- Bunnie's Novena Laptop (856 original backers on crowdsupply)
- Solidrun CuBox i4x4 (not the more common i4P)
- Solidrun Hummingboard (early revisions only, later 2GB)
A couple of other i.MX6Q boards were advertised as supporting
"up to 4GB", but if you tried to buy those, the only ones in
stock anywhere seemed to be limited to 1 or 2 GB. Examples:
- armstone A9r4 (dts not upstream)
- DFI FS053 (dts not upstream)
- VAR-SOM-MX6
[Part of the problem here apparently was availability
and cost of qualified 8Gbit DDR3 chips.]
Do you know of any other SoCs or boards in that category?
My feeling so far is that none of these are show-stoppers:
- The 4GB/8GB boards with LPAE don't have the added problem
of fragmentation with order-2 page table allocations.
They can run low on lowmem with CONFIG_VMSPLIT_3G, but
as Sebastian cited, HIGHPTE is not likely to be the
breaking point for them. These were mainly built around
2012-2013 (before 64-bit became available) as high-end
SoCs and are reaching the end of their commercial life.
- The 4GB boards without LPAE are also 10+ years old by now
and were fairly rare even then. These would suffer the
most from the fragmentation though.
- The 2GB boards with or without LPAE can theoretically
avoid HIGHMEM entirely by using CONFIG_VMSPLIT_2G_OPT.
These are still very common today, but at least a third
of the total memory is going to be available as lowmem
even with VMSPLIT_3G.
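The "at least a third" estimate for 2GB boards can be reproduced with a rough calculation. This sketch assumes a 1 GiB kernel address space under CONFIG_VMSPLIT_3G, with 256 MiB of it reserved for vmalloc and fixed mappings (which gives the 768M lowmem figure Russell used earlier in the thread); the reservation size is an assumption and varies with configuration. `lowmem_fraction()` is a hypothetical helper.

```c
/* Fraction of RAM that is directly mapped as lowmem, assuming lowmem is
 * capped at kernel_va_mib minus reserved_mib (vmalloc, fixmap, ...).
 * Assumes reserved_mib < kernel_va_mib. */
double lowmem_fraction(unsigned ram_mib, unsigned kernel_va_mib,
		       unsigned reserved_mib)
{
	unsigned cap = kernel_va_mib - reserved_mib;
	unsigned lowmem = ram_mib < cap ? ram_mib : cap;

	return (double)lowmem / ram_mib;
}
```

With these assumptions a 2 GiB board gets 768/2048 = 37.5% of its RAM as lowmem, slightly above a third, whereas a 4 GiB board drops to 18.75%.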
The case that is less clear to me is the one where memory
is sparse enough to lead to exhausting lowmem without
having a lot of total RAM. These are a little harder
to find, but I think e.g. Renesas has some chips that
need this, and Realview PBX had a custom __phys_to_virt
in earlier versions to work around this. This should
be solved when we get the "densemem" support that Linus
mentioned.
Arnd
* [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-10 16:05 [PATCH 0/4] ARM: towards 32-bit preempt-rt support Arnd Bergmann
2024-12-10 16:05 ` [PATCH 1/4] ARM: Disable jump-label on PREEMPT_RT Arnd Bergmann
2024-12-10 16:05 ` [PATCH 2/4] ARM: Disable HIGHPTE on PREEMPT_RT kernels Arnd Bergmann
@ 2024-12-10 16:05 ` Arnd Bergmann
2024-12-11 13:32 ` Linus Walleij
` (2 more replies)
2024-12-10 16:05 ` [PATCH 4/4] mm: drop HIGHPTE support altogether Arnd Bergmann
3 siblings, 3 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-10 16:05 UTC
To: linux-arm-kernel
Cc: linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
From: Arnd Bergmann <arnd@arndb.de>
CONFIG_HIGHPTE was added in linux-2.6.32, a few years before 64-bit
support. At the time it made sense, as the CONFIG_ARM_LPAE option allowed
systems with 16GB of memory that made lowmem a particularly scarce
resource, and the HIGHPTE implementation gave feature parity with 32-bit
x86 and frv machines.
Since Arm is the last architecture remaining that uses this, and almost
no 32-bit machines support more than 4GB of RAM, the cost of continuing
to maintain HIGHPTE seems unjustified, so remove it here to allow
simplifying the generic page table handling.
Link: https://lore.kernel.org/lkml/20241204103042.1904639-8-arnd@kernel.org/T/#u
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
I sent a patch to drop HIGHPTE support on x86 today, see
https://lore.kernel.org/lkml/20241210144945.2325330-9-arnd@kernel.org/T/#u
If that one gets merged, we can merge this one instead of the one
that makes HIGHPTE depend on !PREEMPT_RT, but if we decide against
the x86 change, then we probably don't want this one either.
---
arch/arm/Kconfig | 11 -----------
arch/arm/include/asm/pgalloc.h | 8 +-------
2 files changed, 1 insertion(+), 18 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4de4e5697bdf..e132effafd8b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1229,17 +1229,6 @@ config HIGHMEM
If unsure, say n.
-config HIGHPTE
- bool "Allocate 2nd-level pagetables from highmem" if EXPERT
- depends on HIGHMEM && !PREEMPT_RT
- default y
- help
- The VM uses one page of physical memory for each page table.
- For systems with a lot of processes, this can use a lot of
- precious low memory, eventually leading to low memory being
- consumed by page tables. Setting this option will allow
- user-space 2nd level page tables to reside in high memory.
-
config ARM_PAN
bool "Enable privileged no-access"
depends on MMU
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index a17f01235c29..ef6cb3e6d179 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -85,18 +85,12 @@ pte_alloc_one_kernel(struct mm_struct *mm)
return pte;
}
-#ifdef CONFIG_HIGHPTE
-#define PGTABLE_HIGHMEM __GFP_HIGHMEM
-#else
-#define PGTABLE_HIGHMEM 0
-#endif
-
static inline pgtable_t
pte_alloc_one(struct mm_struct *mm)
{
struct page *pte;
- pte = __pte_alloc_one(mm, GFP_PGTABLE_USER | PGTABLE_HIGHMEM);
+ pte = __pte_alloc_one(mm, GFP_PGTABLE_USER);
if (!pte)
return NULL;
if (!PageHighMem(pte))
--
2.39.5
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-10 16:05 ` [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support Arnd Bergmann
@ 2024-12-11 13:32 ` Linus Walleij
2024-12-11 13:50 ` Russell King (Oracle)
2024-12-11 14:25 ` Sebastian Andrzej Siewior
2024-12-14 18:40 ` David Laight
2 siblings, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2024-12-11 13:32 UTC
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> CONFIG_HIGHPTE was added in linux-2.6.32, a few years before 64-bit
> support. At the time it made sense, as the CONFIG_ARM_LPAE option allowed
> systems with 16GB of memory that made lowmem a particularly scarce
> resource, and the HIGHPTE implementation gave feature parity with 32-bit
> x86 and frv machines.
>
> Since Arm is the last architecture remaining that uses this, and almost
> no 32-bit machines support more than 4GB of RAM, the cost of continuing
> to maintain HIGHPTE seems unjustified, so remove it here to allow
> simplifying the generic page table handling.
>
> Link: https://lore.kernel.org/lkml/20241204103042.1904639-8-arnd@kernel.org/T/#u
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I'm in favor of this if the x86 patch goes in. We need to get rid
of highmem anyway and this will need to happen sooner or later
either way.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-11 13:32 ` Linus Walleij
@ 2024-12-11 13:50 ` Russell King (Oracle)
2024-12-11 14:31 ` Linus Walleij
0 siblings, 1 reply; 26+ messages in thread
From: Russell King (Oracle) @ 2024-12-11 13:50 UTC
To: Linus Walleij
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Sebastian Andrzej Siewior, Steven Rostedt
On Wed, Dec 11, 2024 at 02:32:51PM +0100, Linus Walleij wrote:
> On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > CONFIG_HIGHPTE was added in linux-2.6.32, a few years before 64-bit
> > support. At the time it made sense, as the CONFIG_ARM_LPAE option allowed
> > systems with 16GB of memory that made lowmem a particularly scarce
> > resource, and the HIGHPTE implementation gave feature parity with 32-bit
> > x86 and frv machines.
> >
> > Since Arm is the last architecture remaining that uses this, and almost
> > no 32-bit machines support more than 4GB of RAM, the cost of continuing
> > to maintain HIGHPTE seems unjustified, so remove it here to allow
> > simplifying the generic page table handling.
> >
> > Link: https://lore.kernel.org/lkml/20241204103042.1904639-8-arnd@kernel.org/T/#u
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> I'm in favor of this if the x86 patch goes in. We need to get rid
> of highmem anyway and this will need to happen sooner or later
> either way.
Well... I use highmem routinely.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-11 13:50 ` Russell King (Oracle)
@ 2024-12-11 14:31 ` Linus Walleij
0 siblings, 0 replies; 26+ messages in thread
From: Linus Walleij @ 2024-12-11 14:31 UTC
To: Russell King (Oracle)
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Sebastian Andrzej Siewior, Steven Rostedt
On Wed, Dec 11, 2024 at 2:51 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> On Wed, Dec 11, 2024 at 02:32:51PM +0100, Linus Walleij wrote:
> > On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > CONFIG_HIGHPTE was added in linux-2.6.32, a few years before 64-bit
> > > support. At the time it made sense, as the CONFIG_ARM_LPAE option allowed
> > > systems with 16GB of memory that made lowmem a particularly scarce
> > > resource, and the HIGHPTE implementation gave feature parity with 32-bit
> > > x86 and frv machines.
> > >
> > > Since Arm is the last architecture remaining that uses this, and almost
> > > no 32-bit machines support more than 4GB of RAM, the cost of continuing
> > > to maintain HIGHPTE seems unjustified, so remove it here to allow
> > > simplifying the generic page table handling.
> > >
> > > Link: https://lore.kernel.org/lkml/20241204103042.1904639-8-arnd@kernel.org/T/#u
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> >
> > I'm in favor of this if the x86 patch goes in. We need to get rid
> > of highmem anyway and this will need to happen sooner or later
> > either way.
>
> Well... I use highmem routinely.
Oh, I don't mean we should get rid of it without any replacement. Certainly
systems with big physical memories need to be usable.
I am pursuing two ideas (inspired by Arnd and MM people):
1. The easy option - "densemem": on systems with a "hole" in the physical
memory, which makes the 1:1 linear phys-to-virt map run out too soon and
overconsume virtual memory, collect the physical memory at low
virtual addresses by using elaborate phys-to-virt and virt-to-phys
translations and page numbering that isn't 1:1.
2. The hard option - 4G-by-4G splitting: make the kernel and userspace
virtual memory spaces separate, as they are in hardware on S/390, so the
kernel can use a whole 4G of memory for its needs. I have banged my head
against this for a fair amount of time, so I might be incompetent to do it,
but I am still trying.
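Item 1 above can be modelled as a bank table. Each physical bank is given the next free range of kernel virtual addresses, so holes in the physical map no longer consume virtual space. This is a toy userspace sketch of the idea, not the actual densemem work. The PAGE_OFFSET value, struct dense_bank and the dense_*() helpers are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u /* assumed start of the kernel linear map */

/* Hypothetical description of one physical RAM bank. */
struct dense_bank {
	uint32_t phys_start;
	uint32_t size;
	uint32_t virt_start; /* assigned by dense_init() */
};

/* Pack the banks back to back in virtual space, starting at PAGE_OFFSET,
 * so that physical holes take up no virtual addresses. */
static void dense_init(struct dense_bank *b, int n)
{
	uint32_t next = PAGE_OFFSET;

	for (int i = 0; i < n; i++) {
		b[i].virt_start = next;
		next += b[i].size;
	}
}

/* Non-linear phys-to-virt: find the bank, then apply that bank's offset.
 * The unsigned subtraction wraps for addresses below the bank start, so
 * one comparison checks both bounds. Returns 0 for unbacked addresses. */
static uint32_t dense_phys_to_virt(const struct dense_bank *b, int n, uint32_t phys)
{
	for (int i = 0; i < n; i++)
		if (phys - b[i].phys_start < b[i].size)
			return b[i].virt_start + (phys - b[i].phys_start);
	return 0;
}

/* The inverse mapping, virt-to-phys, using the same per-bank offsets. */
static uint32_t dense_virt_to_phys(const struct dense_bank *b, int n, uint32_t virt)
{
	for (int i = 0; i < n; i++)
		if (virt - b[i].virt_start < b[i].size)
			return b[i].phys_start + (virt - b[i].virt_start);
	return 0;
}
```

A real implementation would also need matching page numbering (pfn_to_page() and friends) and a much cheaper lookup than the linear search used here for clarity.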
Yours,
Linus Walleij
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-10 16:05 ` [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support Arnd Bergmann
2024-12-11 13:32 ` Linus Walleij
@ 2024-12-11 14:25 ` Sebastian Andrzej Siewior
2024-12-14 18:40 ` David Laight
2 siblings, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 14:25 UTC
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt
On 2024-12-10 17:05:55 [+0100], Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> CONFIG_HIGHPTE was added in linux-2.6.32, a few years before 64-bit
> support. At the time it made sense, as the CONFIG_ARM_LPAE option allowed
> systems with 16GB of memory that made lowmem a particularly scarce
> resource, and the HIGHPTE implementation gave feature parity with 32-bit
> x86 and frv machines.
>
> Since Arm is the last architecture remaining that uses this, and almost
> no 32-bit machines support more than 4GB of RAM, the cost of continuing
> to maintain HIGHPTE seems unjustified, so remove it here to allow
> simplifying the generic page table handling.
>
> Link: https://lore.kernel.org/lkml/20241204103042.1904639-8-arnd@kernel.org/T/#u
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> I sent a patch to drop HIGHPTE support on x86 today, see
> https://lore.kernel.org/lkml/20241210144945.2325330-9-arnd@kernel.org/T/#u
>
> If that one gets merged, we can merge this one instead of the one
> that makes HIGHPTE depend on !PREEMPT_RT, but if we decide against
> the x86 change, then we probably don't want this one either.
Based on what I have written in 20241211140402.yf7gMExr@linutronix.de it
makes sense.
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian
* RE: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-10 16:05 ` [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support Arnd Bergmann
2024-12-11 13:32 ` Linus Walleij
2024-12-11 14:25 ` Sebastian Andrzej Siewior
@ 2024-12-14 18:40 ` David Laight
2024-12-20 13:10 ` Linus Walleij
2 siblings, 1 reply; 26+ messages in thread
From: David Laight @ 2024-12-14 18:40 UTC
To: 'Arnd Bergmann', linux-arm-kernel
Cc: linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
From: Arnd Bergmann
> Sent: 10 December 2024 16:06
...
> Since Arm is the last architecture remaining that uses this, and almost
> no 32-bit machines support more than 4GB of RAM, the cost of continuing
> to maintain HIGHPTE seems unjustified, so remove it here to allow
> simplifying the generic page table handling.
'Picking at nits' 'highmem' support was needed for systems with 4GB of RAM
in order to use more than 3GB or 3.5GB (depending on the bios) because
of the physical addresses that are reserved for PCI (and other MMIO).
David
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-14 18:40 ` David Laight
@ 2024-12-20 13:10 ` Linus Walleij
2024-12-20 14:30 ` Arnd Bergmann
0 siblings, 1 reply; 26+ messages in thread
From: Linus Walleij @ 2024-12-20 13:10 UTC
To: David Laight
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, Arnd Bergmann,
linux-mm, linux-rt-devel, Ard Biesheuvel, Clark Williams,
Jason Baron, Josh Poimboeuf, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Sebastian Andrzej Siewior,
Steven Rostedt
On Sat, Dec 14, 2024 at 7:41 PM David Laight <David.Laight@aculab.com> wrote:
> From: Arnd Bergmann
> > Sent: 10 December 2024 16:06
> ...
> > Since Arm is the last architecture remaining that uses this, and almost
> > no 32-bit machines support more than 4GB of RAM, the cost of continuing
> > to maintain HIGHPTE seems unjustified, so remove it here to allow
> > simplifying the generic page table handling.
>
> 'Picking at nits' 'highmem' support was needed for systems with 4GB of RAM
> in order to use more than 3GB or 3.5GB (depending on the bios) because
> of the physical addresses that are reserved for PCI (and other MMIO).
Wow, I didn't know that. There are so many reasons why highmem is used
by different architectures.
On ARM it was originally added for a certain Marvell system with
a mere 2GB of RAM:
commit 053a96ca11a9785a7e63fc89eed4514a6446ec58
The reason was that since the virtual address space is just 4GB and
we have reserved virtual kernel memory from (typically) 0xc0000000,
only ~1GB can be linearly accessed by the kernel (actually less than
that).
This wasn't a problem since no ARM system was using more than
1GB until Nico ran into it.
So the ARM "high memory" has to do with virtual memory
size rather than physical memory reservations as in the x86 case.
Yours,
Linus Walleij
* Re: [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support
2024-12-20 13:10 ` Linus Walleij
@ 2024-12-20 14:30 ` Arnd Bergmann
0 siblings, 0 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-20 14:30 UTC
To: Linus Walleij, David Laight
Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
On Fri, Dec 20, 2024, at 14:10, Linus Walleij wrote:
> On Sat, Dec 14, 2024 at 7:41 PM David Laight <David.Laight@aculab.com> wrote:
>
>> From: Arnd Bergmann
>> > Sent: 10 December 2024 16:06
>> ...
>> > Since Arm is the last architecture remaining that uses this, and almost
>> > no 32-bit machines support more than 4GB of RAM, the cost of continuing
>> > to maintain HIGHPTE seems unjustified, so remove it here to allow
>> > simplifying the generic page table handling.
>>
>> 'Picking at nits' 'highmem' support was needed for systems with 4GB of RAM
>> in order to use more than 3GB or 3.5GB (depending on the bios) because
>> of the physical addresses that are reserved for PCI (and other MMIO).
What you mean here is CONFIG_HIGHMEM64G on x86, not CONFIG_HIGHMEM
or CONFIG_HIGHPTE.
> Wow I didn't know that, there are so many reasons why highmem is used
> by different architectures.
>
> On ARM it was originally added for a certain Marvell system with
> a mere 2GB of RAM:
> commit 053a96ca11a9785a7e63fc89eed4514a6446ec58
>
> The reason was that since the virtual address space is just 4GB and
> we have reserved virtual kernel memory from (typically) 0xc0000000
> only ~1GB can be linearly accessed by the kernel (actually less than
> that).
>
> This wasn't a problem since no ARM system was using more than
> 1GB until Nico ran into it.
>
> So the ARM "high memory" is something to do with virtual memory
> size rather than physical memory reservations as in the x86 case.
HIGHMEM works the exact same way on the major 32-bit architectures
(x86, arm, powerpc): with the default TASK_SIZE and PAGE_OFFSET
set to 3GB (0xc0000000), you have 1GB left that is used for
the linear lowmem and the vmalloc area, leaving between 786MB
and 900MB for lowmem, unless you shrink TASK_SIZE (CONFIG_VMSPLIT_*).
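The split described above reduces to simple subtraction. In this sketch, the size of the vmalloc area (plus fixed mappings) is a caller-supplied assumption, not a value read from any architecture's headers. kernel_space_mb() and lowmem_mb() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MB_BYTES 0x00100000ull

/* Kernel virtual address space above PAGE_OFFSET on a 32-bit machine, in MB. */
static uint32_t kernel_space_mb(uint32_t page_offset)
{
	return (uint32_t)((0x100000000ull - page_offset) / MB_BYTES);
}

/* Lowmem is whatever kernel space remains after reserving room for the
 * vmalloc area and fixed mappings (vmalloc_reserve_mb is an assumption). */
static uint32_t lowmem_mb(uint32_t page_offset, uint32_t vmalloc_reserve_mb)
{
	uint32_t ks = kernel_space_mb(page_offset);

	assert(vmalloc_reserve_mb < ks);
	return ks - vmalloc_reserve_mb;
}
```

With PAGE_OFFSET at 0xc0000000 and a 128MB reserve, this gives 896MB, the familiar lowmem figure for 32-bit x86. Moving PAGE_OFFSET down to 0x80000000 (a 2G/2G split) raises it to 1920MB with the same reserve.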
The most common systems (Intel Pentium M, PowerPC 74xx, Arm
Cortex-A9) have a 4GB address limit and have to fit both RAM and
MMIO or PCI into that, which is where the 3GB (Pentium M) through
3.9GB (i.MX6, Calxeda Highbank) total-RAM limit comes from.
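The resulting RAM limit can be sketched as follows. The model assumes three things. RAM is populated upwards from ram_base. The MMIO/PCI window occupies everything from mmio_start up to 4GB. RAM hidden behind that window is only usable if the platform can place it above 4GB, which requires a 64-bit phys_addr_t. usable_ram() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_GB (1ull << 30)

/* Usable RAM in bytes under the assumed layout: RAM at [ram_base, ...),
 * MMIO at [mmio_start, 4GB). Without 64-bit physical addresses, RAM that
 * would overlap the MMIO window is lost; with them, it is assumed to be
 * remapped above 4GB by the chipset. */
static uint64_t usable_ram(uint64_t ram_size, uint64_t ram_base,
			   uint64_t mmio_start, bool phys_addr_64bit)
{
	uint64_t below_hole;

	assert(mmio_start > ram_base);
	below_hole = mmio_start - ram_base;
	if (ram_size <= below_hole || phys_addr_64bit)
		return ram_size;
	return below_hole;
}
```

For example, with 4GB of RAM at address 0 and MMIO starting at 3GB, only 3GB is usable without a 64-bit phys_addr_t. With one, the full 4GB is usable, assuming the chipset remaps the hidden RAM.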
The 4GB physical address limit is broken by CONFIG_PHYS_ADDR_T_64BIT
on some x86 server chipsets (Serverworks GC, HP F8, IBM Summit,
Intel 450GX, ...), Arm Cortex-A15 and PowerPC e500/e600 cores,
which gets you to the point where you fill up all your lowmem
before using all of highmem.
HIGHPTE was added when x86 servers went to 32GB, at which point
lowmem is mostly filled with page tables for typical
workloads.
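A rough calculation shows why page tables can fill lowmem on such machines. With x86 PAE, a 4KB page-table page holds 512 eight-byte entries and so maps 2MB. Every process that maps a region therefore pays about 1/512 of its size in page tables. The sketch below ignores the upper page-table levels and assumes fully populated mappings. pte_table_bytes() is a hypothetical helper, and the workload figures in the note after it are made up.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ull
#define PTE_SIZE_BYTES  8ull /* PAE page table entry */
#define PTRS_PER_PTE    (PAGE_SIZE_BYTES / PTE_SIZE_BYTES) /* 512 */

/* Bytes of last-level page tables needed when nproc processes each map
 * mapped_bytes (e.g. the same shared memory segment). Each process has
 * its own page tables, so the cost scales with nproc. */
static uint64_t pte_table_bytes(uint64_t mapped_bytes, uint64_t nproc)
{
	uint64_t span = PTRS_PER_PTE * PAGE_SIZE_BYTES; /* 2MB per table */
	uint64_t tables = (mapped_bytes + span - 1) / span;

	return tables * PAGE_SIZE_BYTES * nproc;
}
```

For instance, 400 processes that each map the same 1GB shared memory segment need 800MB of PTE pages between them. That is nearly all of the roughly 896MB of lowmem on a 3G/1G split, even though the shared data itself can live in highmem.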
Other architectures (mips, sparc, arc, ...) have some but not
all of the above; HIGHPTE was only ever a thing on x86,
arm and frv.
Arnd
* [PATCH 4/4] mm: drop HIGHPTE support altogether
2024-12-10 16:05 [PATCH 0/4] ARM: towards 32-bit preempt-rt support Arnd Bergmann
` (2 preceding siblings ...)
2024-12-10 16:05 ` [PATCH 3/4] ARM: drop CONFIG_HIGHPTE support Arnd Bergmann
@ 2024-12-10 16:05 ` Arnd Bergmann
2024-12-11 13:53 ` Linus Walleij
2024-12-11 14:29 ` Sebastian Andrzej Siewior
3 siblings, 2 replies; 26+ messages in thread
From: Arnd Bergmann @ 2024-12-10 16:05 UTC
To: linux-arm-kernel
Cc: linux-kernel, Arnd Bergmann, linux-mm, linux-rt-devel,
Ard Biesheuvel, Clark Williams, Jason Baron, Josh Poimboeuf,
Linus Walleij, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
From: Arnd Bergmann <arnd@arndb.de>
With both x86 and arm having dropped CONFIG_HIGHPTE support, no
architecture is left using it, so remove the remnants in common code.
It is likely that further cleanups are possible in the page table
code but those are not obvious from the config options.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
include/linux/hugetlb.h | 5 +----
include/linux/mm.h | 1 -
include/linux/pgtable.h | 9 ---------
3 files changed, 1 insertion(+), 14 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe8615bb6..5369a269dd39 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -176,7 +176,6 @@ extern struct list_head huge_boot_pages[MAX_NUMNODES];
/* arch callbacks */
-#ifndef CONFIG_HIGHPTE
/*
* pte_offset_huge() and pte_alloc_huge() are helpers for those architectures
* which may go down to the lowest PTE level in their huge_pte_offset() and
@@ -191,7 +190,6 @@ static inline pte_t *pte_alloc_huge(struct mm_struct *mm, pmd_t *pmd,
{
return pte_alloc(mm, pmd) ? NULL : pte_offset_huge(pmd, address);
}
-#endif
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz);
@@ -966,9 +964,8 @@ static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
*/
if (size >= PUD_SIZE)
return pud_lockptr(mm, (pud_t *) pte);
- else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))
+ else if (size >= PMD_SIZE)
return pmd_lockptr(mm, (pmd_t *) pte);
- /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
return ptep_lockptr(mm, pte);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f56f81d5e244..6353fd939702 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2954,7 +2954,6 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
{
- BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
return ptlock_ptr(virt_to_ptdesc(pte));
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index adef9d6e9b1b..23be8776bd5e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -119,14 +119,6 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
#define pte_offset_kernel pte_offset_kernel
#endif
-#ifdef CONFIG_HIGHPTE
-#define __pte_map(pmd, address) \
- ((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
-#define pte_unmap(pte) do { \
- kunmap_local((pte)); \
- rcu_read_unlock(); \
-} while (0)
-#else
static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
{
return pte_offset_kernel(pmd, address);
@@ -135,7 +127,6 @@ static inline void pte_unmap(pte_t *pte)
{
rcu_read_unlock();
}
-#endif
void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
--
2.39.5
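With HIGHPTE removed, __pte_map() always reduces to pte_offset_kernel(). The PTE table is permanently addressable through the linear map, so locating an entry is plain index arithmetic, with no kmap_local_page()/kunmap_local() pair around it. The userspace model below mimics that calculation. PAGE_SHIFT = 12, 512 entries per table and the simplified pte_t are assumptions, and pte_offset_in_table() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PTRS_PER_PTE 512u /* assumed number of entries per PTE table */

typedef uint64_t pte_t; /* simplified stand-in for the kernel type */

/* Index of the entry covering address within its PTE table. */
static unsigned long pte_index(unsigned long address)
{
	return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Without HIGHPTE the table is always mapped, so "mapping" it is just
 * pointer arithmetic on the table's lowmem address. */
static pte_t *pte_offset_in_table(pte_t *table, unsigned long address)
{
	return table + pte_index(address);
}
```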
* Re: [PATCH 4/4] mm: drop HIGHPTE support altogether
2024-12-10 16:05 ` [PATCH 4/4] mm: drop HIGHPTE support altogether Arnd Bergmann
@ 2024-12-11 13:53 ` Linus Walleij
2024-12-11 14:29 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 26+ messages in thread
From: Linus Walleij @ 2024-12-11 13:53 UTC
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Mark Rutland, Matthew Wilcox, Peter Zijlstra,
Russell King, Sebastian Andrzej Siewior, Steven Rostedt
On Tue, Dec 10, 2024 at 5:06 PM Arnd Bergmann <arnd@kernel.org> wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> With both x86 and arm having dropped CONFIG_HIGHPTE support, no
> architecture is left using it, so remove the remnants in common code.
>
> It is likely that further cleanups are possible in the page table
> code but those are not obvious from the config options.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Mapping PTEs in highmem makes no sense to me, it's better
that we try other solutions to move away from highmem use
altogether.
Yours,
Linus Walleij
* Re: [PATCH 4/4] mm: drop HIGHPTE support altogether
2024-12-10 16:05 ` [PATCH 4/4] mm: drop HIGHPTE support altogether Arnd Bergmann
2024-12-11 13:53 ` Linus Walleij
@ 2024-12-11 14:29 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 26+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-12-11 14:29 UTC
To: Arnd Bergmann
Cc: linux-arm-kernel, linux-kernel, Arnd Bergmann, linux-mm,
linux-rt-devel, Ard Biesheuvel, Clark Williams, Jason Baron,
Josh Poimboeuf, Linus Walleij, Mark Rutland, Matthew Wilcox,
Peter Zijlstra, Russell King, Steven Rostedt
On 2024-12-10 17:05:56 [+0100], Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> With both x86 and arm having dropped CONFIG_HIGHPTE support, no
> architecture is left using it, so remove the remnants in common code.
>
> It is likely that further cleanups are possible in the page table
> code but those are not obvious from the config options.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian