[RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address

linux-mm.kvack.org archive mirror

* [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
@ 2025-11-27 14:01 Xie Yuanbin
  2025-11-27 14:01 ` [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal Xie Yuanbin
  2025-11-27 14:51 ` [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Sebastian Andrzej Siewior
  0 siblings, 2 replies; 11+ messages in thread
From: Xie Yuanbin @ 2025-11-27 14:01 UTC
  To: viro, will, nico, rmk+kernel, linux, david.laight, rppt, vbabka,
	pfalcato, brauner, lorenzo.stoakes, kuninori.morimoto.gx, tony,
	arnd, bigeasy, akpm, punitagrawal, hch, jack, rjw, marc.zyngier
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-fsdevel, wozizhi,
	liaohua4, lilinjie8, xieyuanbin1, pangliyuan1, wangkefeng.wang

This patch fixes two related bugs.

BUG1:
On arm32, a page fault may cause the current thread to sleep inside
mmap_read_lock_killable(). This can happen even when the faulting
address is a kernel address.

When a file is opened and path_init() initializes the lookup with the
LOOKUP_RCU flag, the RCU read lock is taken. Inside the RCU read-side
critical section, load_unaligned_zeropad() may be called.

According to the comments on load_unaligned_zeropad(), the load may
trigger a page fault in the very unlikely case that the word crosses a
page boundary and the second page is unmapped. With CONFIG_KFENCE=y,
page faults are more likely to occur in this scenario.
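To make that "very unlikely case" concrete, the following is a user-space model of the result load_unaligned_zeropad() promises, assuming a little-endian 32-bit load. The function names and the `mapped_len` boundary are hypothetical; in the kernel the second computation is performed by the exception-table fixup after the fault, and the shift shown is my reading of the arm32 little-endian fixup, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code: bytes [0, mapped_len)
 * of mem are "mapped", and any access at or beyond mapped_len would fault.
 */

/* What the caller is promised: mapped bytes are loaded, the rest read as 0. */
static uint32_t model_zeropad_le(const uint8_t *mem, size_t mapped_len,
				 size_t addr)
{
	uint32_t word = 0;

	assert(addr < mapped_len);
	for (size_t i = 0; i < 4; i++)
		if (addr + i < mapped_len)
			word |= (uint32_t)mem[addr + i] << (8 * i);
	return word;
}

/*
 * What the little-endian fixup computes after the fault: reload the aligned
 * word (which must be the last mapped word) and shift out the bytes that
 * precede addr, so the missing high bytes become zero.
 */
static uint32_t model_fixup_le(const uint8_t *mem, size_t mapped_len,
			       size_t addr)
{
	size_t aligned = addr & ~(size_t)3;
	unsigned int shift = 8 * (unsigned int)(addr & 3);
	uint32_t word = 0;

	assert(aligned + 4 == mapped_len);
	for (size_t i = 0; i < 4; i++)
		word |= (uint32_t)mem[aligned + i] << (8 * i);
	return word >> shift;
}
```

For example, with mem = "abcdefgh", mapped_len = 8 and addr = 6, both functions return 0x00006867: 'g' and 'h' in the low bytes and zeros where the unmapped bytes would be. The point for this bug is that the fault is expected and must be resolved by the fixup, never by sleeping on mmap_lock.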

If CONFIG_PREEMPT_RCU=y, the following warning may be triggered:
```log
[   16.923630] WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x408/0x610, CPU#0: test/68
[   16.924780] Voluntary context switch within RCU read-side critical section!
[   16.924887] Modules linked in:
[   16.925670] CPU: 0 UID: 0 PID: 68 Comm: test Tainted: G        W           6.18.0-rc6-next-20251124 #28 PREEMPT
[   16.926120] Tainted: [W]=WARN
[   16.926257] Hardware name: Generic DT based system
[   16.926474] Call trace:
[   16.926487]  unwind_backtrace from show_stack+0x10/0x14
[   16.926899]  show_stack from dump_stack_lvl+0x50/0x5c
[   16.927318]  dump_stack_lvl from __warn+0xf8/0x200
[   16.927696]  __warn from warn_slowpath_fmt+0x180/0x208
[   16.928060]  warn_slowpath_fmt from rcu_note_context_switch+0x408/0x610
[   16.928768]  rcu_note_context_switch from __schedule+0xe4/0xa58
[   16.928917]  __schedule from schedule+0x70/0x124
[   16.929197]  schedule from schedule_preempt_disabled+0x14/0x20
[   16.929514]  schedule_preempt_disabled from rwsem_down_read_slowpath+0x26c/0x4e4
[   16.929875]  rwsem_down_read_slowpath from down_read_killable+0x58/0x10c
[   16.930320]  down_read_killable from mmap_read_lock_killable+0x24/0x84
[   16.930761]  mmap_read_lock_killable from lock_mm_and_find_vma+0x164/0x18c
[   16.931101]  lock_mm_and_find_vma from do_page_fault+0x1d4/0x4a0
[   16.931354]  do_page_fault from do_DataAbort+0x30/0xa8
[   16.931649]  do_DataAbort from __dabt_svc+0x44/0x60
[   16.931862] Exception stack(0xf0b41d88 to 0xf0b41dd0)
[   16.932063] 1d80:                   c3219088 eec5dffd f0b41ec0 00000002 c3219118 00000010
[   16.933732] 1da0: c321913c 00000002 00007878 c2da86c0 00000000 00000002 b8009440 f0b41ddc
[   16.934019] 1dc0: eec5dffd c0677300 60000013 ffffffff
[   16.934294]  __dabt_svc from __d_lookup_rcu+0xc4/0x10c
[   16.934468]  __d_lookup_rcu from lookup_fast+0xa0/0x190
[   16.934720]  lookup_fast from path_openat+0x154/0xe18
[   16.934953]  path_openat from do_filp_open+0x94/0x134
[   16.935141]  do_filp_open from do_sys_openat2+0x9c/0xf0
[   16.935384]  do_sys_openat2 from sys_openat+0x80/0xa0
[   16.935547]  sys_openat from ret_fast_syscall+0x0/0x4c
[   16.935799] Exception stack(0xf0b41fa8 to 0xf0b41ff0)
[   16.936007] 1fa0:                   00000000 00000000 ffffff9c beb27d0c 00000242 000001b6
[   16.936293] 1fc0: 00000000 00000000 000c543c 00000142 00027e85 00000002 00000002 00000000
[   16.936624] 1fe0: beb27c20 beb27c0c 0006ea80 00072e78
[   16.936780] ---[ end trace 0000000000000000 ]---
```

If CONFIG_DEBUG_ATOMIC_SLEEP=y, the following warning will be triggered:
```log
[   16.243462] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1559
[   16.245271] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 68, name: test
[   16.246219] preempt_count: 0, expected: 0
[   16.246582] RCU nest depth: 1, expected: 0
[   16.247262] CPU: 0 UID: 0 PID: 68 Comm: test Not tainted 6.18.0-rc6-next-20251124 #28 PREEMPT
[   16.247432] Hardware name: Generic DT based system
[   16.247549] Call trace:
[   16.247618]  unwind_backtrace from show_stack+0x10/0x14
[   16.248442]  show_stack from dump_stack_lvl+0x50/0x5c
[   16.248458]  dump_stack_lvl from __might_resched+0x174/0x188
[   16.248475]  __might_resched from down_read_killable+0x18/0x10c
[   16.248490]  down_read_killable from mmap_read_lock_killable+0x24/0x84
[   16.248504]  mmap_read_lock_killable from lock_mm_and_find_vma+0x164/0x18c
[   16.248516]  lock_mm_and_find_vma from do_page_fault+0x1d4/0x4a0
[   16.248529]  do_page_fault from do_DataAbort+0x30/0xa8
[   16.248549]  do_DataAbort from __dabt_svc+0x44/0x60
[   16.248597] Exception stack(0xf0b41da0 to 0xf0b41de8)
[   16.248675] 1da0: c20b34f0 c3f23bf8 00000000 c389be50 f0b41e90 00000501 61c88647 00000000
[   16.248698] 1dc0: 80808080 fefefeff 2f2f2f2f eec51ffd c3219088 f0b41df0 c066d3e4 c066d218
[   16.248705] 1de0: 60000013 ffffffff
[   16.248736]  __dabt_svc from link_path_walk+0xa8/0x444
[   16.248752]  link_path_walk from path_openat+0xac/0xe18
[   16.248764]  path_openat from do_filp_open+0x94/0x134
[   16.248775]  do_filp_open from do_sys_openat2+0x9c/0xf0
[   16.248785]  do_sys_openat2 from sys_openat+0x80/0xa0
[   16.248806]  sys_openat from ret_fast_syscall+0x0/0x4c
[   16.248814] Exception stack(0xf0b41fa8 to 0xf0b41ff0)
[   16.248825] 1fa0:                   00000000 00000000 ffffff9c beb27d0c 00000242 000001b6
[   16.248834] 1fc0: 00000000 00000000 000c543c 00000142 00027e85 00000002 00000002 00000000
[   16.248841] 1fe0: beb27c20 beb27c0c 0006ea80 00072e78
```

BUG2:
When a user program accesses a valid kernel address, for example while
attacking the kernel, it may end up in do_page_fault(). Since interrupts
are enabled before harden_branch_predictor() is called, the thread may be
preempted and migrated to another CPU in between, which renders the
mitigation meaningless.
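Why migration defeats the mitigation can be shown with a toy model, assuming (as the smp_processor_id() warning below suggests) that the hardening only acts on the CPU it runs on. `predictor_poisoned`, `model_harden_branch_predictor()` and `model_user_abort()` are hypothetical and do not correspond to kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2	/* hypothetical two-CPU system */

/* Toy per-CPU state: true if that CPU's predictor holds attacker-trained entries. */
static bool predictor_poisoned[MODEL_NR_CPUS];

/* Toy stand-in for harden_branch_predictor(): it only affects the CPU it runs on. */
static void model_harden_branch_predictor(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	predictor_poisoned[cpu] = false;
}

/*
 * Toy user abort on a kernel address. The abort is taken on fault_cpu; if
 * the task can be preempted before hardening, it may resume on run_cpu.
 * Returns true if the CPU that took the abort is left poisoned.
 */
static bool model_user_abort(int fault_cpu, int run_cpu)
{
	assert(fault_cpu >= 0 && fault_cpu < MODEL_NR_CPUS);
	predictor_poisoned[fault_cpu] = true;	/* attacker trained this CPU */
	model_harden_branch_predictor(run_cpu);	/* flush wherever we now run */
	return predictor_poisoned[fault_cpu];
}
```

model_user_abort(0, 0) returns false because the flush hits the trained CPU, while model_user_abort(0, 1) returns true: the task was migrated, so the wrong CPU was hardened.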

With CONFIG_PREEMPT=y, CONFIG_DEBUG_PREEMPT=y and CONFIG_ARM_LPAE=y,
the following warning is triggered:
```log
[    1.089103] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
[    1.093367] caller is __do_user_fault+0x20/0x6c
[    1.094355] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.14.3 #7
[    1.094585] Hardware name: Generic DT based system
[    1.094706] Call trace:
[    1.095211]  unwind_backtrace from show_stack+0x10/0x14
[    1.095329]  show_stack from dump_stack_lvl+0x50/0x5c
[    1.095352]  dump_stack_lvl from check_preemption_disabled+0x104/0x108
[    1.095448]  check_preemption_disabled from __do_user_fault+0x20/0x6c
[    1.095459]  __do_user_fault from do_page_fault+0x334/0x3dc
[    1.095505]  do_page_fault from do_DataAbort+0x30/0xa8
[    1.095528]  do_DataAbort from __dabt_usr+0x54/0x60
[    1.095570] Exception stack(0xf0825fb0 to 0xf0825ff8)
```

Fix both scenarios by always jumping to bad_area for kernel addresses
before local_irq_enable() is called, similar to what x86 does.

Fixes: b9a50f74905a ("ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs")
Fixes: f5fe12b1eaee ("ARM: spectre-v2: harden user aborts in kernel space")

Closes: https://lore.kernel.org/20251126090505.3057219-1-wozizhi@huaweicloud.com
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Co-developed-by: Liyuan Pang <pangliyuan1@huawei.com>
Signed-off-by: Liyuan Pang <pangliyuan1@huawei.com>
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
V1->V2: https://lore.kernel.org/20251126101952.174467-1-xieyuanbin1@huawei.com
  - Fix the bug in arm/mm, instead of vfs
  - Update the commit message
  - Also fix https://lore.kernel.org/20251016121622.8957-1-xieyuanbin1@huawei.com

For this patch, the only thing I'm unsure about is skipping the
`if (fsr & FSR_LNX_PF)` check; I'm not sure whether that has side effects.
This patch also skips local_irq_enable() when addr >= TASK_SIZE, but I
think that is OK: __do_kernel_fault() can be called with interrupts
disabled, and both do_bad_area() and do_sect_fault() already do this.
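The intended ordering can be sketched as a heavily simplified model of do_page_fault() with this patch applied, assuming the default arm32 3G/1G split (TASK_SIZE = PAGE_OFFSET - 16M = 0xBF000000). The enum, struct and `model_*` names are hypothetical, and the real function also checks faulthandler_disabled(), the mm, FSR bits and more; the model only shows that a kernel address reaches bad_area before local_irq_enable() and never reaches the VMA lookup that can sleep on mmap_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: arm32 with the default 3G/1G split, TASK_SIZE = PAGE_OFFSET - 16M. */
#define MODEL_TASK_SIZE 0xBF000000UL

enum model_path {
	PATH_USER_SIGNAL,	/* bad_area -> __do_user_fault(): SIGSEGV */
	PATH_KERNEL_FAULT,	/* bad_area -> no_context -> __do_kernel_fault() */
	PATH_VMA_LOOKUP,	/* lock_mm_and_find_vma(): may sleep on mmap_lock */
};

struct model_result {
	enum model_path path;
	bool irqs_enabled;	/* did we reach local_irq_enable()? */
};

/* Heavily simplified model of do_page_fault() with this patch applied. */
static struct model_result model_do_page_fault(unsigned long addr,
					       bool user_mode,
					       bool parent_irqs_on)
{
	struct model_result r = { .path = PATH_VMA_LOOKUP, .irqs_enabled = false };

	/* New check: kernel addresses never enable irqs or take mmap_lock. */
	if (addr >= MODEL_TASK_SIZE) {
		r.path = user_mode ? PATH_USER_SIGNAL : PATH_KERNEL_FAULT;
		return r;
	}

	/* Enable interrupts if they were enabled in the parent context. */
	r.irqs_enabled = parent_irqs_on;
	return r;
}

/* Usage: the two scenarios from the commit message plus a normal fault. */
static void model_self_check(void)
{
	/* BUG2: user space reads the kernel .text address from the test case. */
	struct model_result r = model_do_page_fault(0xc0331fd4UL, true, true);
	assert(r.path == PATH_USER_SIGNAL && !r.irqs_enabled);

	/* BUG1: kernel-mode fault on an arbitrary kernel address (illustrative). */
	r = model_do_page_fault(0xe0000000UL, false, true);
	assert(r.path == PATH_KERNEL_FAULT && !r.irqs_enabled);

	/* Ordinary user address: the existing path is unchanged. */
	r = model_do_page_fault(0x00010000UL, true, true);
	assert(r.path == PATH_VMA_LOOKUP && r.irqs_enabled);
}
```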

Test cases for reproduction:
Kernel source: the latest linux-next branch, built from the default arm32
multi_v7_defconfig with CONFIG_PREEMPT=y, CONFIG_DEBUG_PREEMPT=y,
CONFIG_ARM_LPAE=y, CONFIG_KFENCE=y, CONFIG_DEBUG_ATOMIC_SLEEP=y and
CONFIG_ARM_PAN=n.

BUG1:
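```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Keep mapping and unmapping memory so that mmap_lock is frequently write-locked. */
static void *thread(void *arg)
{
	(void)arg;
	while (1) {
		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);

		assert(p != MAP_FAILED);
		__asm__ volatile ("":"+r"(p)::"memory");

		munmap(p, 4096);
	}
	return NULL;
}

int main(void)
{
	pthread_t th;
	int ret;
	char path[4096] = "/tmp";

	/* Build a 4095-byte path: "/tmp" + 2044 * "/x" + "/xx". */
	for (size_t i = 0; i < 2044; ++i) {
		strcat(path, "/x");
		ret = mkdir(path, 0755);
		assert(ret == 0 || errno == EEXIST);
	}
	strcat(path, "/xx");

	assert(strlen(path) == 4095);

	/* Keep pthread_create() outside assert() so it still runs with NDEBUG. */
	ret = pthread_create(&th, NULL, thread, NULL);
	assert(ret == 0);

	/* Repeatedly open the path; the RCU path walk uses load_unaligned_zeropad(). */
	while (1) {
		FILE *fp = fopen(path, "wb+");

		assert(fp);
		fclose(fp);
	}
	return 0;
}
```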

BUG2:
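```c
#include <signal.h>
#include <stdint.h>

/* Spin instead of returning: returning would restart the faulting load. */
static void han(int x)
{
	(void)x;
	while (1);
}

int main(void)
{
	signal(SIGSEGV, han);
	/* 0xc0331fd4 is just an address in the kernel .text section */
	__asm__ volatile (""::"r"(*(int *)(uintptr_t)0xc0331fd4):"memory");
	while (1);
	return 0;
}
```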

 arch/arm/mm/fault.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2bc828a1940c..5c58072d8235 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -270,10 +270,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	vm_flags_t vm_flags = VM_ACCESS_FLAGS;
 
 	if (kprobe_page_fault(regs, fsr))
 		return 0;
 
+	if (unlikely(addr >= TASK_SIZE)) {
+		fault = 0;
+		code = SEGV_MAPERR;
+		goto bad_area;
+	}
 
 	/* Enable interrupts if they were enabled in the parent context. */
 	if (interrupts_enabled(regs))
 		local_irq_enable();
 
-- 
2.51.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal
  2025-11-27 14:01 [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Xie Yuanbin
@ 2025-11-27 14:01 ` Xie Yuanbin
  2025-11-27 14:49   ` Sebastian Andrzej Siewior
  2025-11-27 14:51 ` [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Sebastian Andrzej Siewior
  1 sibling, 1 reply; 11+ messages in thread
From: Xie Yuanbin @ 2025-11-27 14:01 UTC
  To: viro, will, nico, rmk+kernel, linux, david.laight, rppt, vbabka,
	pfalcato, brauner, lorenzo.stoakes, kuninori.morimoto.gx, tony,
	arnd, bigeasy, akpm, punitagrawal, hch, jack, rjw, marc.zyngier
  Cc: linux-arm-kernel, linux-mm, linux-kernel, linux-fsdevel, wozizhi,
	liaohua4, lilinjie8, xieyuanbin1, pangliyuan1, wangkefeng.wang

From: xieyuanbin1 <xieyuanbin1@huawei.com>

Sending a signal requires acquiring sighand_struct::siglock, which is a
spinlock_t. On PREEMPT_RT, spinlock_t becomes a sleeping spin lock, which
requires interrupts to be enabled. Since the calling context is user
space, interrupts must have been enabled, so it is fine to enable them in
this case.

Signed-off-by: xieyuanbin1 <xieyuanbin1@huawei.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
This patch depends on:
Link: https://lore.kernel.org/20251029155918.503060-6-bigeasy@linutronix.de

The commit message is copied from:
Link: https://lore.kernel.org/20251029155918.503060-3-bigeasy@linutronix.de

Maybe I should add:
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Or something else? Thanks!

 arch/arm/mm/fault.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 5c58072d8235..f8ee1854c854 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -184,10 +184,13 @@ __do_user_fault(unsigned long addr, unsigned int fsr, unsigned int sig,
 	struct task_struct *tsk = current;
 
 	if (addr > TASK_SIZE)
 		harden_branch_predictor();
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
+
 #ifdef CONFIG_DEBUG_USER
 	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
 	    ((user_debug & UDBG_BUS)  && (sig == SIGBUS))) {
 		pr_err("8<--- cut here ---\n");
 		pr_err("%s: unhandled page fault (%d) at 0x%08lx, code 0x%03x\n",
-- 
2.51.0




* Re: [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal
  2025-11-27 14:01 ` [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal Xie Yuanbin
@ 2025-11-27 14:49   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-27 14:49 UTC
  To: Xie Yuanbin
  Cc: viro, will, nico, rmk+kernel, linux, david.laight, rppt, vbabka,
	pfalcato, brauner, lorenzo.stoakes, kuninori.morimoto.gx, tony,
	arnd, akpm, punitagrawal, hch, jack, rjw, marc.zyngier,
	linux-arm-kernel, linux-mm, linux-kernel, linux-fsdevel, wozizhi,
	liaohua4, lilinjie8, pangliyuan1, wangkefeng.wang

On 2025-11-27 22:01:09 [+0800], Xie Yuanbin wrote:
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -184,10 +184,13 @@ __do_user_fault(unsigned long addr, unsigned int fsr, unsigned int sig,
>  	struct task_struct *tsk = current;
>  
>  	if (addr > TASK_SIZE)
>  		harden_branch_predictor();
>  
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		local_irq_enable();

This shouldn't be limited to CONFIG_PREEMPT_RT. There is nothing wrong
with enabling it unconditionally.

>  #ifdef CONFIG_DEBUG_USER
>  	if (((user_debug & UDBG_SEGV) && (sig == SIGSEGV)) ||
>  	    ((user_debug & UDBG_BUS)  && (sig == SIGBUS))) {
>  		pr_err("8<--- cut here ---\n");
>  		pr_err("%s: unhandled page fault (%d) at 0x%08lx, code 0x%03x\n",

Sebastian



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-27 14:01 [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Xie Yuanbin
  2025-11-27 14:01 ` [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal Xie Yuanbin
@ 2025-11-27 14:51 ` Sebastian Andrzej Siewior
  2025-11-28  2:27   ` Xie Yuanbin
  1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-27 14:51 UTC
  To: Xie Yuanbin
  Cc: viro, will, nico, rmk+kernel, linux, david.laight, rppt, vbabka,
	pfalcato, brauner, lorenzo.stoakes, kuninori.morimoto.gx, tony,
	arnd, akpm, punitagrawal, hch, jack, rjw, marc.zyngier,
	linux-arm-kernel, linux-mm, linux-kernel, linux-fsdevel, wozizhi,
	liaohua4, lilinjie8, pangliyuan1, wangkefeng.wang

On 2025-11-27 22:01:08 [+0800], Xie Yuanbin wrote:
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -270,10 +270,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	vm_flags_t vm_flags = VM_ACCESS_FLAGS;
>  
>  	if (kprobe_page_fault(regs, fsr))
>  		return 0;
>  
> +	if (unlikely(addr >= TASK_SIZE)) {
> +		fault = 0;
> +		code = SEGV_MAPERR;
> +		goto bad_area;
> +	}
>  
>  	/* Enable interrupts if they were enabled in the parent context. */
>  	if (interrupts_enabled(regs))
>  		local_irq_enable();

What is wrong with the patch I sent?

Sebastian



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-27 14:51 ` [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Sebastian Andrzej Siewior
@ 2025-11-28  2:27   ` Xie Yuanbin
  2025-11-28 12:03     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Xie Yuanbin @ 2025-11-28  2:27 UTC
  To: bigeasy
  Cc: akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, linux, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rmk+kernel, rppt, tony, vbabka, viro, wangkefeng.wang, will,
	wozizhi, xieyuanbin1

On Thu, 27 Nov 2025 15:51:27 +0100, Sebastian Andrzej Siewior wrote:
> What is wrong with the patch I sent?

Hi, Sebastian Andrzej Siewior!

There is nothing wrong with your patches, but when you submitted
them, this bug had not yet been reported:
Link: https://lore.kernel.org/20251126090505.3057219-1-wozizhi@huaweicloud.com

Your patches fixed the missing mitigation, but the aforementioned bug
still exists. I think there might be a better solution that can fix both
bugs at the same time.

We had some discussions about this bug:
Link: https://lore.kernel.org/CAHk-=wh1Wfwt9OFB4AfBbjyeu4JVZuSWQ4A8OoT3W6x9btddfw@mail.gmail.com
Link: https://lore.kernel.org/20251126192640.GD3538@ZenIV
Link: https://lore.kernel.org/aSeNtFxD1WRjFaiR@shell.armlinux.org.uk

According to the discussion, it might be better to handle the kernel
address fault directly, as x86 does, instead of looking up a VMA.
Link: https://elixir.bootlin.com/linux/v6.18-rc7/source/arch/x86/mm/fault.c#L1473
```c
	if (unlikely(fault_in_kernel_space(address)))
		do_kern_addr_fault(regs, error_code, address);
	else
		do_user_addr_fault(regs, error_code, address);
```

It seems your patches haven't been merged into the linux-next branch yet.
This patch is based on linux-next, so it doesn't include your
modifications. This patch might conflict with your patch:
Link: https://lore.kernel.org/20251110145555.2555055-2-bigeasy@linutronix.de
so I'd like to discuss it with you.

Thanks!

Xie Yuanbin



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28  2:27   ` Xie Yuanbin
@ 2025-11-28 12:03     ` Sebastian Andrzej Siewior
  2025-11-28 17:01       ` Russell King (Oracle)
  2025-11-29  2:33       ` Xie Yuanbin
  0 siblings, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-28 12:03 UTC
  To: Xie Yuanbin
  Cc: akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, linux, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rmk+kernel, rppt, tony, vbabka, viro, wangkefeng.wang, will,
	wozizhi

On 2025-11-28 10:27:56 [+0800], Xie Yuanbin wrote:
> According to the discussion, it might be better to handle the kernel
> address fault directly, as x86 does, instead of looking up a VMA.

A fault on a kernel address shouldn't have a VMA.

> Link: https://elixir.bootlin.com/linux/v6.18-rc7/source/arch/x86/mm/fault.c#L1473
> ```c
> 	if (unlikely(fault_in_kernel_space(address)))
> 		do_kern_addr_fault(regs, error_code, address);
> 	else
> 		do_user_addr_fault(regs, error_code, address);
> ```
> 
> It seems your patches haven't been merged into the linux-next branch yet.

I hope Russell will add them once he gets to it. They got reviewed, I
added them to the patch system.

> This patch is based on linux-next, so it doesn't include your
> modifications. This patch might conflict with your patch:
> Link: https://lore.kernel.org/20251110145555.2555055-2-bigeasy@linutronix.de
> so I'd like to discuss it with you.

what about this:

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index ad58c1e22a5f9..b6b3cd893c808 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -282,10 +282,10 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	}
 
 	/*
-	 * If we're in an interrupt or have no user
-	 * context, we must not take the fault..
+	 * If we're in an interrupt or have no user context, we must not take
+	 * the fault. Kernel addresses are handled in do_translation_fault().
 	 */
-	if (faulthandler_disabled() || !mm)
+	if (faulthandler_disabled() || !mm || addr >= TASK_SIZE)
 		goto no_context;
 
 	if (user_mode(regs))

We shouldn't be getting here. Above TASK_SIZE there are just fixmap
mappings, which don't fault, and the VMALLOC area, which should be handled
by do_translation_fault(). So the only case left here should be the
exception table.

This should also not clash with the previous patches. Would that work
for everyone?

> Thanks!
> 
> Xie Yuanbin

Sebastian



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28 12:03     ` Sebastian Andrzej Siewior
@ 2025-11-28 17:01       ` Russell King (Oracle)
  2025-11-28 17:22         ` Sebastian Andrzej Siewior
  2025-11-29  2:33       ` Xie Yuanbin
  1 sibling, 1 reply; 11+ messages in thread
From: Russell King (Oracle) @ 2025-11-28 17:01 UTC
  To: Sebastian Andrzej Siewior
  Cc: Xie Yuanbin, akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rppt, tony, vbabka, viro, wangkefeng.wang, will, wozizhi

On Fri, Nov 28, 2025 at 01:03:59PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-11-28 10:27:56 [+0800], Xie Yuanbin wrote:
> > According to the discussion, it might be better to handle the kernel
> > address fault directly, as x86 does, instead of looking up a VMA.
> 
> A fault on a kernel address shouldn't have a VMA.
> 
> > Link: https://elixir.bootlin.com/linux/v6.18-rc7/source/arch/x86/mm/fault.c#L1473
> > ```c
> > 	if (unlikely(fault_in_kernel_space(address)))
> > 		do_kern_addr_fault(regs, error_code, address);
> > 	else
> > 		do_user_addr_fault(regs, error_code, address);
> > ```
> > 
> > It seems your patches haven't been merged into the linux-next branch yet.
> 
> I hope Russell will add them once he gets to it. They got reviewed, I
> added them to the patch system.

I'm not sure which patches you're talking about, but discussion is
still ongoing, so it would be greatly premature to merge anything.

https://lore.kernel.org/r/aSmUnZZATTn3JD7m@willie-the-truck

There are now many threads each with their own discussion, which
makes it more difficult to work out which is the implementation that
should be merged. Clearly, not everyone knows about the other
discussion threads.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28 17:01       ` Russell King (Oracle)
@ 2025-11-28 17:22         ` Sebastian Andrzej Siewior
  2025-11-28 17:34           ` Russell King (Oracle)
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-28 17:22 UTC
  To: Russell King (Oracle)
  Cc: Xie Yuanbin, akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rppt, tony, vbabka, viro, wangkefeng.wang, will, wozizhi

On 2025-11-28 17:01:18 [+0000], Russell King (Oracle) wrote:
> > I hope Russell will add them once he gets to it. They got reviewed, I
> > added them to the patch system.
> 
> I'm not sure which patches you're talking about, but discussion is
> still ongoing, so it would be greatly premature to merge anything.

This thread
	https://lore.kernel.org/all/20251110145555.2555055-1-bigeasy@linutronix.de/

and the patches are 9459/1 to 9463/1 in your patch system. They address
other issues, not this one.

> https://lore.kernel.org/r/aSmUnZZATTn3JD7m@willie-the-truck
> 
> There are now many threads each with their own discussion, which
> makes it more difficult to work out which is the implementation that
> should be merged. Clearly, not everyone knows about the other
> discussion threads.

So Will suggested changing the handler to handle this case. The
other patch avoids handling addr > TASK_SIZE.
Any preferences from your side?

Sebastian



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28 17:22         ` Sebastian Andrzej Siewior
@ 2025-11-28 17:34           ` Russell King (Oracle)
  2025-11-30 11:20             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Russell King (Oracle) @ 2025-11-28 17:34 UTC
  To: Sebastian Andrzej Siewior
  Cc: Xie Yuanbin, akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rppt, tony, vbabka, viro, wangkefeng.wang, will, wozizhi

On Fri, Nov 28, 2025 at 06:22:42PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-11-28 17:01:18 [+0000], Russell King (Oracle) wrote:
> > > I hope Russell will add them once he gets to it. They got reviewed, I
> > > added them to the patch system.
> > 
> > I'm not sure which patches you're talking about, but discussion is
> > still ongoing, so it would be greatly premature to merge anything.
> 
> This thread
> 	https://lore.kernel.org/all/20251110145555.2555055-1-bigeasy@linutronix.de/
> 
> and the patches are 9459/1 to 9463/1 in your patch system. They address
> other issues, not this one.

Oh, the branch predictor issue. Yea, I'm not keen on changing that
because I'm not sure if it's correct (the knowledge for this has
long since evaporated.) There have been multiple attempts at fixing
this in the past, and I've previously pointed out problems with
them when I _did_ have the knowledge. Have you looked back in the
archives to see whether any of that feedback I've given in the past
is relevant?

> > https://lore.kernel.org/r/aSmUnZZATTn3JD7m@willie-the-truck
> > 
> > There are now many threads each with their own discussion, which
> > makes it more difficult to work out which is the implementation that
> > should be merged. Clearly, not everyone knows about the other
> > discussion threads.
> 
> So Will suggested changing the handler to handle this case. The
> other patch avoids handling addr > TASK_SIZE.
> Any preferences from your side?

... and now we have a new proposal from Linus. I'm not intending to
do anything on this new problem until the discussion calms down and
we stop getting new solutions.




* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28 12:03     ` Sebastian Andrzej Siewior
  2025-11-28 17:01       ` Russell King (Oracle)
@ 2025-11-29  2:33       ` Xie Yuanbin
  1 sibling, 0 replies; 11+ messages in thread
From: Xie Yuanbin @ 2025-11-29  2:33 UTC
  To: bigeasy
  Cc: akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, linux, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rmk+kernel, rppt, tony, vbabka, viro, wangkefeng.wang, will,
	wozizhi, xieyuanbin1

On Fri, 28 Nov 2025 13:03:59 +0100, Sebastian Andrzej Siewior wrote:
> what about this:
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index ad58c1e22a5f9..b6b3cd893c808 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -282,10 +282,10 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  	}
>
>  	/*
> -	 * If we're in an interrupt or have no user
> -	 * context, we must not take the fault..
> +	 * If we're in an interrupt or have no user context, we must not take
> +	 * the fault. Kernel addresses are handled in do_translation_fault().
> 	 */
> -	if (faulthandler_disabled() || !mm)
> +	if (faulthandler_disabled() || !mm || addr >= TASK_SIZE)
>  		goto no_context;
>
>  	if (user_mode(regs))
>
> We shouldn't be getting here. Above TASK_SIZE there are just fixmap
> mappings, which don't fault, and the VMALLOC area, which should be handled
> by do_translation_fault(). So the only case left here should be the
> exception table.
>
> This should also not clash with the previous patches. Would that work
> for everyone?

In user mode, the fault should go to __do_user_fault(), but no_context
goes to __do_kernel_fault(). So I don't think this works.
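A minimal model of this difference, assuming the default 3G/1G split; the names are hypothetical and everything except the TASK_SIZE check is omitted. The two variants only disagree for a user-mode access to a kernel address: the no_context variant sends it to __do_kernel_fault(), while this patch's bad_area route delivers SIGSEGV through __do_user_fault().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: arm32 with the default 3G/1G split. */
#define MODEL_TASK_SIZE 0xBF000000UL

enum model_handler {
	H_USER_FAULT,	/* __do_user_fault(): deliver SIGSEGV to the task */
	H_KERNEL_FAULT,	/* __do_kernel_fault(): exception-table fixup or oops */
	H_CONTINUE,	/* fall through to the normal VMA handling */
};

/* Proposed variant: addr >= TASK_SIZE joins the existing "goto no_context". */
static enum model_handler model_via_no_context(unsigned long addr, bool user_mode)
{
	(void)user_mode;	/* in this model, no_context ignores the CPU mode */
	return addr >= MODEL_TASK_SIZE ? H_KERNEL_FAULT : H_CONTINUE;
}

/* This patch: addr >= TASK_SIZE jumps to bad_area, which tests user_mode(). */
static enum model_handler model_via_bad_area(unsigned long addr, bool user_mode)
{
	if (addr >= MODEL_TASK_SIZE)
		return user_mode ? H_USER_FAULT : H_KERNEL_FAULT;
	return H_CONTINUE;
}

/* Usage: the variants only disagree for user-mode accesses to kernel addresses. */
static void model_compare(void)
{
	const unsigned long kaddr = 0xc0331fd4UL;	/* kernel .text address from the BUG2 test */

	assert(model_via_no_context(kaddr, true) == H_KERNEL_FAULT);
	assert(model_via_bad_area(kaddr, true) == H_USER_FAULT);
	assert(model_via_no_context(kaddr, false) == model_via_bad_area(kaddr, false));
}
```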

> Sebastian

Xie Yuanbin



* Re: [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address
  2025-11-28 17:34           ` Russell King (Oracle)
@ 2025-11-30 11:20             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-30 11:20 UTC
  To: Russell King (Oracle)
  Cc: Xie Yuanbin, akpm, arnd, brauner, david.laight, hch, jack,
	kuninori.morimoto.gx, liaohua4, lilinjie8, linux-arm-kernel,
	linux-fsdevel, linux-kernel, linux-mm, lorenzo.stoakes,
	marc.zyngier, nico, pangliyuan1, pfalcato, punitagrawal, rjw,
	rppt, tony, vbabka, viro, wangkefeng.wang, will, wozizhi

On 2025-11-28 17:34:31 [+0000], Russell King (Oracle) wrote:
> On Fri, Nov 28, 2025 at 06:22:42PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-11-28 17:01:18 [+0000], Russell King (Oracle) wrote:
> > > > I hope Russell will add them once he gets to it. They got reviewed, I
> > > > added them to the patch system.
> > > 
> > > I'm not sure which patches you're talking about, but discussion is
> > > still ongoing, so it would be greatly premature to merge anything.
> > 
> > This thread
> > 	https://lore.kernel.org/all/20251110145555.2555055-1-bigeasy@linutronix.de/
> > 
> > and the patches are 9459/1 to 9463/1 in your patch system. They address
> > other issues, not this one.
> 
> Oh, the branch predictor issue. Yea, I'm not keen on changing that
> because I'm not sure if it's correct (the knowledge for this has
> long since evaporated.) There have been multiple attempts at fixing
> this in the past, and I've previously pointed out problems with
> them when I _did_ have the knowledge. Have you looked back in the
> archives to see whether any of that feedback I've given in the past
> is relevant?

I dug up the emails from 2019 and 2021, and you complained that I enabled
interrupts too early. I have now moved the branch predictor hardening so
that it happens before interrupts are enabled. Based on that, it should
not give rise to any complaints.

> > So Will suggested changing the handler to handle this case. The
> > other patch avoids handling addr > TASK_SIZE.
> > Any preferences from your side?
> 
> ... and now we have a new proposal from Linus. I'm not intending to
> do anything on this new problem until the discussion calms down and
> we stop getting new solutions.

Okay. If we could please sort out the first part, then it might be easier
to move on here once the dust has settled.

Sebastian



end of thread [~2025-11-30 11:20 UTC]

Thread overview: 11+ messages
2025-11-27 14:01 [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Xie Yuanbin
2025-11-27 14:01 ` [RFC PATCH v2 2/2] ARM/mm/fault: Enable interrupts before sending signal Xie Yuanbin
2025-11-27 14:49   ` Sebastian Andrzej Siewior
2025-11-27 14:51 ` [RFC PATCH v2 1/2] ARM/mm/fault: always goto bad_area when handling with page faults of kernel address Sebastian Andrzej Siewior
2025-11-28  2:27   ` Xie Yuanbin
2025-11-28 12:03     ` Sebastian Andrzej Siewior
2025-11-28 17:01       ` Russell King (Oracle)
2025-11-28 17:22         ` Sebastian Andrzej Siewior
2025-11-28 17:34           ` Russell King (Oracle)
2025-11-30 11:20             ` Sebastian Andrzej Siewior
2025-11-29  2:33       ` Xie Yuanbin
