From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Xie Yuanbin <xieyuanbin1@huawei.com>
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
brauner@kernel.org, catalin.marinas@arm.com, hch@lst.de,
jack@suse.com, linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, pangliyuan1@huawei.com,
wangkefeng.wang@huawei.com, will@kernel.org,
wozizhi@huaweicloud.com, yangerkun@huawei.com
Subject: Re: [Bug report] hash_name() may cross page boundary and trigger sleep in RCU context
Date: Fri, 5 Dec 2025 12:08:14 +0000
Message-ID: <aTLLLuup7TeAqFVL@shell.armlinux.org.uk>
In-Reply-To: <20251203014800.4988-1-xieyuanbin1@huawei.com>
On Wed, Dec 03, 2025 at 09:48:00AM +0800, Xie Yuanbin wrote:
> On Tue, 2 Dec 2025 14:07:25 -0800, Linus Torvalds wrote:
> > On Tue, 2 Dec 2025 at 04:43, Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> >>
> >> What I'm thinking is to address both of these by handling kernel space
> >> page faults (which will be permission or PTE-not-present) separately
> >> (not even build tested):
> >
> > That patch looks sane to me.
> >
> > But I also didn't build test it, just scanned it visually ;)
>
> That patch removes harden_branch_predictor() from __do_user_fault(), and
> moves it to do_page_fault()->do_kernel_address_page_fault().
> This resolves the previously mentioned kernel warning issue. However,
> __do_user_fault() is not only called by do_page_fault(), it is
> also called by do_bad_area(), do_sect_fault() and do_translation_fault().
>
> So I think that some harden_branch_predictor() calls are missing on other
> paths. According to my tests, when CONFIG_ARM_LPAE=n,
> harden_branch_predictor() will never be called anymore, even if a user
> program tries to access a kernel address.
>
> Or perhaps I've misunderstood something, could you please point it out?
> Thank you very much.

Right, let's split these issues into separate patches. Please test this
patch, which should address only the hash_name() fault issue, and
provides the basis for fixing the branch predictor issue.

At the moment, do_kernel_address_page_fault() looks very much like
do_bad_area(), with the addition of enabling IRQs if they were enabled
in the parent context. The following patch to address the branch
predictor hardening will show why it's different.

In my opinion, this approach makes the handling for kernel address
page faults (non-present pages and page permission faults) much easier
to understand.

Note that this will call __do_user_fault() with interrupts disabled.
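
For readers following along, the routing described above can be modelled
in plain user-space C. This is only an illustrative sketch, not kernel
code: MODEL_TASK_SIZE, enum fault_route, struct fault_decision and
model_route_fault() are hypothetical names, and the TASK_SIZE value
assumes a 3G/1G split with TASK_SIZE 16MiB below PAGE_OFFSET.

```c
#include <stdbool.h>

/* Assumption for illustration only: 3G/1G split, PAGE_OFFSET - 16MiB. */
#define MODEL_TASK_SIZE 0xbf000000UL

enum fault_route {
	ROUTE_MMAP_LOCK_PATH,	/* user address: existing VMA/mmap lock path */
	ROUTE_USER_SIGSEGV,	/* kernel address, user mode: __do_user_fault() */
	ROUTE_KERNEL_FIXUP,	/* kernel address, kernel mode: __do_kernel_fault() */
};

struct fault_decision {
	enum fault_route route;
	bool irqs_enabled_by_handler;
};

/*
 * Model of the decision made at the top of do_page_fault() with the
 * patch applied. Only the routing is modelled, not the fault handling.
 */
static struct fault_decision
model_route_fault(unsigned long addr, bool from_user_mode, bool parent_irqs_on)
{
	struct fault_decision d;

	if (addr < MODEL_TASK_SIZE) {
		/* Unchanged path: IRQs follow the parent context. */
		d.route = ROUTE_MMAP_LOCK_PATH;
		d.irqs_enabled_by_handler = parent_irqs_on;
	} else if (from_user_mode) {
		/* SIGSEGV is sent without re-enabling interrupts. */
		d.route = ROUTE_USER_SIGSEGV;
		d.irqs_enabled_by_handler = false;
	} else {
		/* Never touches the mmap lock, so it cannot sleep there. */
		d.route = ROUTE_KERNEL_FIXUP;
		d.irqs_enabled_by_handler = parent_irqs_on;
	}

	return d;
}
```

In this model, a user-mode fault on 0xffff2000 is routed to
ROUTE_USER_SIGSEGV with interrupts left disabled, while a kernel-mode
fault on a kernel address goes to ROUTE_KERNEL_FIXUP.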

Build tested, and remotely boot tested on Cortex-A5 hardware but
without kfence enabled. Also tested usermode access to kernel space
which fails with SEGV:

- read from 0xc0000000 (section permission fault, do_sect_fault)
- read from 0xffff2000 (page translation fault, do_page_fault)
- read from 0xffff0000 (vectors page - read possible as expected)
- write to 0xffff0000 (page permission fault, do_page_fault)

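
A user-mode check of this kind could be written along the following
lines. This is a minimal sketch and not the test program used for the
results above: probe_read_byte() and its helpers are hypothetical names,
and it only covers the read cases. It catches SIGSEGV with
sigsetjmp()/siglongjmp() so that one process can probe several addresses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGSEGV handler: unwind back into probe_read_byte(). */
static void probe_on_segv(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);
}

/* Returns 1 if a one-byte read from p succeeded, 0 if it raised SIGSEGV. */
static int probe_read_byte(const volatile unsigned char *p)
{
	struct sigaction sa, old;
	int rc, ok;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_on_segv;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGSEGV, &sa, &old);
	assert(rc == 0);
	(void)rc;

	/* Save the signal mask so it is restored after siglongjmp(). */
	if (sigsetjmp(probe_env, 1) == 0) {
		(void)*p;	/* volatile access forces the load */
		ok = 1;
	} else {
		ok = 0;
	}

	/* Put the previous SIGSEGV disposition back. */
	sigaction(SIGSEGV, &old, NULL);
	return ok;
}
```

On a 32-bit ARM system, calling probe_read_byte() with the addresses
listed above cast to a pointer would exercise the read cases; the write
case would need a matching helper that stores through the pointer.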
8<===
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Subject: [PATCH] ARM: fix hash_name() fault

Zizhi Wo reports:

"During the execution of hash_name()->load_unaligned_zeropad(), a
 potential memory access beyond the PAGE boundary may occur. For
 example, when the filename length is near the PAGE_SIZE boundary.
 This triggers a page fault, which leads to a call to
 do_page_fault()->mmap_read_trylock(). If we can't acquire the lock,
 we have to fall back to the mmap_read_lock() path, which calls
 might_sleep(). This breaks RCU semantics because path lookup occurs
 under an RCU read-side critical section."

This is seen with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_KFENCE=y.

Kernel addresses (with the exception of the vectors/kuser helper
page) do not have VMAs associated with them. If the vectors/kuser
helper page faults, then there are two possibilities:

1. if the fault happened while in kernel mode, then we're basically
   dead, because the CPU won't be able to vector through this page
   to handle the fault.
2. if the fault happened while in user mode, that means the page was
   protected from user access, and we want to fault anyway.

Thus, we can handle kernel addresses from any context entirely
separately without going anywhere near the mmap lock. This gives us
an entirely non-sleeping path for all kernel mode kernel address
faults.

Reported-by: Zizhi Wo <wozizhi@huaweicloud.com>
Reported-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Link: https://lore.kernel.org/r/20251126090505.3057219-1-wozizhi@huaweicloud.com
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm/mm/fault.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 46169fe42c61..2bbec38ced97 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -260,6 +260,35 @@ static inline bool ttbr0_usermode_access_allowed(struct pt_regs *regs)
 }
 #endif
 
+static int __kprobes
+do_kernel_address_page_fault(struct mm_struct *mm, unsigned long addr,
+			     unsigned int fsr, struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/*
+		 * Fault from user mode for a kernel space address. User mode
+		 * should not be faulting in kernel space, which includes the
+		 * vector/khelper page. Send a SIGSEGV.
+		 */
+		__do_user_fault(addr, fsr, SIGSEGV, SEGV_MAPERR, regs);
+	} else {
+		/*
+		 * Fault from kernel mode. Enable interrupts if they were
+		 * enabled in the parent context. Section (upper page table)
+		 * translation faults are handled via do_translation_fault(),
+		 * so we will only get here for a non-present kernel space
+		 * PTE or PTE permission fault. This may happen in exceptional
+		 * circumstances and needs the fixup tables to be walked.
+		 */
+		if (interrupts_enabled(regs))
+			local_irq_enable();
+
+		__do_kernel_fault(mm, addr, fsr, regs);
+	}
+
+	return 0;
+}
+
 static int __kprobes
 do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
@@ -273,6 +302,12 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	if (kprobe_page_fault(regs, fsr))
 		return 0;
 
+	/*
+	 * Handle kernel address faults separately, which avoids touching
+	 * the mmap lock from contexts that are not able to sleep.
+	 */
+	if (addr >= TASK_SIZE)
+		return do_kernel_address_page_fault(mm, addr, fsr, regs);
 
 	/* Enable interrupts if they were enabled in the parent context. */
 	if (interrupts_enabled(regs))
--
2.47.3
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!