* [PATCH v11 00/15] unwind_deferred: Implement sframe handling
@ 2025-10-22 14:43 Jens Remus
2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
` (17 more replies)
0 siblings, 18 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
This is the implementation of parsing the SFrame section in an ELF file.
It's a continuation of Josh's and Steve's last work that can be found
here:
https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
Currently the only way to get a user space stack trace from a stack
walk (and not just by copying a large amount of the user stack into the
kernel ring buffer) is to use frame pointers. This has a few issues. The
biggest one is that compiling frame pointers into every application and
library has been shown to cause performance overhead.
Another issue is that the format of the frames may not always be
consistent between different compilers, and some architectures (s390)
have no defined format to do a reliable stack walk. The only way to
perform user space profiling on these architectures is to copy the user
stack into the kernel buffer.
SFrame[1] is now supported by GNU Binutils and will soon also be supported
by LLVM. SFrame acts more like ORC, and lives in the ELF executable
file as its own section. Like ORC it has two tables. The first table is
sorted by instruction pointer (IP). Looking up the current IP in the first
table leads to an entry in the second table, which tells where the return
address of the current function is located. That return address can then
be looked up in the first table to find the return address of the calling
function, and so on. This performs a user space stack walk.
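The two-table walk described above can be sketched in plain user space C.
This is only an illustration under simplifying assumptions, not the code in
this series: the toy_* types and functions are hypothetical, the CFA is
always derived from SP, and the "stack" is a small array of 64-bit words
instead of user memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for SFrame FREs and FDEs. */
struct toy_fre {
	uint32_t start_off;	/* offset from function start where this row applies */
	int32_t cfa_off;	/* CFA = SP + cfa_off */
	int32_t ra_off;		/* return address is stored at CFA + ra_off */
};

struct toy_fde {
	uint64_t func_start;
	uint32_t func_size;
	const struct toy_fre *fres;
	size_t num_fres;
};

/* Simulated user stack: 64-bit words starting at address base. */
struct toy_stack {
	uint64_t base;
	const uint64_t *words;
	size_t num_words;
};

/* First table: binary search for the function that contains ip. */
static const struct toy_fde *toy_find_fde(const struct toy_fde *fdes,
					  size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < fdes[mid].func_start)
			hi = mid;
		else if (ip - fdes[mid].func_start >= fdes[mid].func_size)
			lo = mid + 1;
		else
			return &fdes[mid];
	}
	return NULL;
}

/* Second table: last row whose start offset is <= ip - func_start. */
static const struct toy_fre *toy_find_fre(const struct toy_fde *fde,
					  uint64_t ip)
{
	const struct toy_fre *found = NULL;
	size_t i;

	for (i = 0; i < fde->num_fres; i++) {
		if (fde->fres[i].start_off > ip - fde->func_start)
			break;
		found = &fde->fres[i];
	}
	return found;
}

/* One unwind step: returns 0 and updates ip/sp, or -1 if the walk ends. */
static int toy_unwind_step(const struct toy_fde *fdes, size_t n,
			   const struct toy_stack *stk,
			   uint64_t *ip, uint64_t *sp)
{
	const struct toy_fde *fde = toy_find_fde(fdes, n, *ip);
	const struct toy_fre *fre = fde ? toy_find_fre(fde, *ip) : NULL;
	uint64_t cfa, ra_addr;

	if (!fre)
		return -1;

	cfa = *sp + (int64_t)fre->cfa_off;
	ra_addr = cfa + (int64_t)fre->ra_off;
	if (ra_addr < stk->base || (ra_addr - stk->base) / 8 >= stk->num_words)
		return -1;

	*ip = stk->words[(ra_addr - stk->base) / 8];
	*sp = cfa;
	return 0;
}
```

Calling toy_unwind_step() repeatedly until it fails yields one IP per frame.
The real format is more involved: for example, the CFA can also be based on
FP, and the entries have to be read from user memory.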
Because the SFrame section lives in the ELF file, it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a
stack trace in interrupt or NMI context, the walk cannot be done at the
time it is requested. Instead it must be deferred until it is safe to
fault in user space. One place this is known to be safe is when the task
is about to return to user space.
This series makes the deferred unwind code implement SFrames.
[1] https://sourceware.org/binutils/wiki/sframe
Changes since v10:
- Rebase on v6.17-rc1 with Peter's unwind user fixes and x86 support
series [2] and Steve's support for the deferred unwinding infrastructure
series in perf [3] and perf tool [4] on top.
- Support for SFrame V2 PC-relative FDE function start address. (Jens)
- Support for SFrame V2 representing RA undefined as indication for
outermost frames. (Jens)
[2]: [PATCH 00/12] Various fixes and x86 support,
https://lore.kernel.org/all/20250924075948.579302904@infradead.org/
[3]: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure,
https://lore.kernel.org/all/20251007214008.080852573@kernel.org/
[4]: [PATCH v16 0/4] perf tool: Support the deferred unwinding infrastructure,
https://lore.kernel.org/all/20250908175319.841517121@kernel.org/
Patches 1 and 2 are suggested fixups to patches from Peter's unwind user
fixes and x86 support series. They keep the factoring out of the word
size from the frame's CFA, FP, and RA offsets local to unwind user fp, as
unwind user sframe uses absolute offsets.
Patches 3, 6, and 14 have been updated to exclusively support the recent
PC-relative SFrame FDE function start address encoding. With Binutils 2.45
the SFrame V2 FDE function start address field value is an offset from the
field (i.e. PC-relative) instead of from the .sframe section start. This
is indicated by the new SFrame header flag SFRAME_F_FDE_FUNC_START_PCREL.
Old SFrame V2 sections get rejected with the dynamic debug message
"bad/unsupported sframe header".
Patches 9 and 10 add support to unwind user and unwind user sframe for
a recent change of the SFrame V2 format to represent an undefined
return address as an SFrame FRE without any offsets, which is used as
indication for outermost frames. Note that currently only a development
build of Binutils mainline generates SFrame information including this
new indication for outermost frames. SFrame information without the new
indication is still supported. Without these patches unwind user sframe
would identify such new SFrame FREs without any offsets as corrupted and
remove the .sframe section, causing any further stack tracing using
sframe to fail.
Regards,
Jens
Jens Remus (4):
fixup! unwind: Implement compat fp unwind
fixup! unwind_user/x86: Enable frame pointer unwinding on x86
unwind_user: Stop when reaching an outermost frame
unwind_user/sframe: Add support for outermost frame indication
Josh Poimboeuf (11):
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Store sframe section data in per-mm maple tree
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Wire up unwind_user to sframe
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe: Add prctl() interface for registering .sframe
sections
MAINTAINERS | 1 +
arch/Kconfig | 23 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mmu.h | 2 +-
arch/x86/include/asm/uaccess.h | 39 +-
arch/x86/include/asm/unwind_user.h | 11 +-
fs/binfmt_elf.c | 49 ++-
include/linux/mm_types.h | 3 +
include/linux/sframe.h | 60 +++
include/linux/unwind_user_types.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 6 +-
kernel/fork.c | 10 +
kernel/sys.c | 9 +
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 615 +++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 72 ++++
kernel/unwind/sframe_debug.h | 68 ++++
kernel/unwind/user.c | 56 ++-
mm/init-mm.c | 2 +
20 files changed, 1004 insertions(+), 32 deletions(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
create mode 100644 kernel/unwind/sframe_debug.h
--
2.48.1
* [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
` (16 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
This fixup resolves the following issue for unwind user sframe, which
was introduced by Peter Zijlstra's patch "[PATCH 11/12] unwind:
Implement compat fp unwind" [1]:
Peter factored out the word size (i.e. 4 for 32-bit compat or
8 for 64-bit) from the frame CFA, FP, and RA offsets. This is an
issue for unwind user sframe for two reasons:
1. SFrame provides absolute offsets, which would need to be scaled
down unnecessarily, only to be scaled up again prior to use.
2. Factoring out the word size from those offsets requires that all
architectures/ABIs guarantee that these offsets are always aligned
to the word size.
Limit the down-/upscaling by word size to unwind user (compat) fp.
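As a rough user space illustration of the point above (the toy_* names are
hypothetical and not part of the patch), the frame pointer unwinder can
produce absolute byte offsets from the word size once, so that common code
adds offsets without any scaling:

```c
/* Hypothetical stand-in for the offsets kept in struct unwind_user_frame. */
struct toy_frame {
	int cfa_off;
	int ra_off;
	int fp_off;
};

/* x86-style frame pointer layout, expressed as absolute byte offsets. */
static struct toy_frame toy_fp_frame(unsigned int ws)
{
	struct toy_frame f = {
		.cfa_off = 2 * (int)ws,
		.ra_off = -1 * (int)ws,
		.fp_off = -2 * (int)ws,
	};

	return f;
}

/* Common code: no multiplication by the word size is needed here. */
static unsigned long toy_cfa(unsigned long fp, const struct toy_frame *f)
{
	return fp + f->cfa_off;
}
```

With ws = 8 this gives a CFA offset of 16 bytes, and with ws = 4 (compat) a
CFA offset of 8 bytes. An sframe-provided offset that is not a multiple of
the word size can be stored in the same fields unchanged.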
[1]: https://lore.kernel.org/lkml/20250924080119.613695709@infradead.org/
kernel/unwind/user.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index bc1bf1e83d65..696004ee956a 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -8,19 +8,15 @@
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
-static const struct unwind_user_frame fp_frame = {
- ARCH_INIT_USER_FP_FRAME
-};
-
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
static inline int
-get_user_word(unsigned long *word, unsigned long base, int off, int size)
+get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
{
- unsigned long __user *addr = (void __user *)base + (off * size);
+ unsigned long __user *addr = (void __user *)base + off;
#ifdef CONFIG_COMPAT
- if (size == sizeof(int)) {
+ if (ws == sizeof(int)) {
unsigned int data;
int ret = get_user(data, (unsigned int __user *)addr);
*word = data;
@@ -32,6 +28,9 @@ get_user_word(unsigned long *word, unsigned long base, int off, int size)
static int unwind_user_next_fp(struct unwind_user_state *state)
{
+ const struct unwind_user_frame fp_frame = {
+ ARCH_INIT_USER_FP_FRAME(state->ws)
+ };
const struct unwind_user_frame *frame = &fp_frame;
unsigned long cfa, fp, ra;
@@ -44,7 +43,7 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
}
/* Get the Canonical Frame Address (CFA) */
- cfa += state->ws * frame->cfa_off;
+ cfa += frame->cfa_off;
/* stack going in wrong direction? */
if (cfa <= state->sp)
--
2.48.1
* [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
` (15 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
This fixup adjusts patch "[PATCH 12/12] unwind_user/x86: Enable frame
pointer unwinding on x86" [1] to my preceding fixup, which limits the
down-/upscaling by word size to unwind user (compat) fp.
[1]: https://lore.kernel.org/lkml/20250924080119.613695709@infradead.org/
arch/x86/include/asm/unwind_user.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index e649b8fea2aa..5e0755ea3086 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -2,10 +2,10 @@
#ifndef _ASM_X86_UNWIND_USER_H
#define _ASM_X86_UNWIND_USER_H
-#define ARCH_INIT_USER_FP_FRAME \
- .cfa_off = 2, \
- .ra_off = -1, \
- .fp_off = -2, \
+#define ARCH_INIT_USER_FP_FRAME(ws) \
+ .cfa_off = 2*(ws), \
+ .ra_off = -1*(ws), \
+ .fp_off = -2*(ws), \
.use_fp = true,
#endif /* _ASM_X86_UNWIND_USER_H */
--
2.48.1
* [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-11-18 17:04 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
` (14 subsequent siblings)
17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for unwinding user space stacks with sframe, add basic
sframe compile infrastructure and support for reading the .sframe
section header.
sframe_add_section() reads the header and unconditionally returns an
error, so it's not very useful yet. A subsequent patch will improve
that.
Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
[ Jens Remus: Add support for PC-relative FDE function start address. ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v11:
- Support for SFrame V2 PC-relative FDE function start address. (Jens)
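A minimal sketch of the two function start address encodings, assuming the
20-byte packed FDE layout from kernel/unwind/sframe.h in this patch with
start_addr as its first field. The helper name and parameters are
hypothetical and only serve as illustration:

```c
#include <stdint.h>

/* Size of the packed SFrame V2 FDE (s32 + 3 * u32 + u8 + u8 + u16). */
#define TOY_FDE_SIZE 20u

/*
 * Compute a function's start address from its FDE start_addr field.
 * pcrel != 0: the value is an offset from the field itself.
 * pcrel == 0: the value is an offset from the .sframe section start.
 */
static uint64_t toy_fde_func_start(uint64_t sframe_start, uint64_t fdes_start,
				   uint32_t fde_index, int32_t start_addr,
				   int pcrel)
{
	/* start_addr is the first FDE member, so the field address is the FDE address. */
	uint64_t field_addr = fdes_start + (uint64_t)fde_index * TOY_FDE_SIZE;
	uint64_t base = pcrel ? field_addr : sframe_start;

	return base + (int64_t)start_addr;
}
```

With the PC-relative form the base address differs for every FDE, which is
why the FDE index enters the calculation.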
MAINTAINERS | 1 +
arch/Kconfig | 3 +
include/linux/sframe.h | 40 ++++++++++++
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 137 +++++++++++++++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 72 ++++++++++++++++++++++
6 files changed, 255 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
diff --git a/MAINTAINERS b/MAINTAINERS
index fe168477caa4..251dcb49e112 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26305,6 +26305,7 @@ USERSPACE STACK UNWINDING
M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
+F: include/linux/sframe.h
F: include/linux/unwind*.h
F: kernel/unwind/
diff --git a/arch/Kconfig b/arch/Kconfig
index d1b4ffd6e085..69fcabf53088 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -451,6 +451,9 @@ config HAVE_UNWIND_USER_FP
bool
select UNWIND_USER
+config HAVE_UNWIND_USER_SFRAME
+ bool
+
config HAVE_PERF_REGS
bool
help
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..0584f661f698
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#include <linux/mm_types.h>
+#include <linux/unwind_user_types.h>
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+struct sframe_section {
+ unsigned long sframe_start;
+ unsigned long sframe_end;
+ unsigned long text_start;
+ unsigned long text_end;
+
+ unsigned long fdes_start;
+ unsigned long fres_start;
+ unsigned long fres_end;
+ unsigned int num_fdes;
+
+ signed char ra_off;
+ signed char fp_off;
+};
+
+extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end);
+extern int sframe_remove_section(unsigned long sframe_addr);
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ return -ENOSYS;
+}
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index eae37bea54fd..146038165865 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME) += sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..b28ec77bc9a8
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Userspace sframe access functions
+ */
+
+#define pr_fmt(fmt) "sframe: " fmt
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/string_helpers.h>
+#include <linux/sframe.h>
+#include <linux/unwind_user_types.h>
+
+#include "sframe.h"
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static void free_section(struct sframe_section *sec)
+{
+ kfree(sec);
+}
+
+static int sframe_read_header(struct sframe_section *sec)
+{
+ unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
+ struct sframe_header shdr;
+ unsigned int num_fdes;
+
+ if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
+ dbg("header usercopy failed\n");
+ return -EFAULT;
+ }
+
+ if (shdr.preamble.magic != SFRAME_MAGIC ||
+ shdr.preamble.version != SFRAME_VERSION_2 ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
+ shdr.auxhdr_len) {
+ dbg("bad/unsupported sframe header\n");
+ return -EINVAL;
+ }
+
+ if (!shdr.num_fdes || !shdr.num_fres) {
+ dbg("no fde/fre entries\n");
+ return -EINVAL;
+ }
+
+ header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
+ if (header_end >= sec->sframe_end) {
+ dbg("header doesn't fit in section\n");
+ return -EINVAL;
+ }
+
+ num_fdes = shdr.num_fdes;
+ fdes_start = header_end + shdr.fdes_off;
+ fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde));
+
+ fres_start = header_end + shdr.fres_off;
+ fres_end = fres_start + shdr.fre_len;
+
+ if (fres_start < fdes_end || fres_end > sec->sframe_end) {
+ dbg("inconsistent fde/fre offsets\n");
+ return -EINVAL;
+ }
+
+ sec->num_fdes = num_fdes;
+ sec->fdes_start = fdes_start;
+ sec->fres_start = fres_start;
+ sec->fres_end = fres_end;
+
+ sec->ra_off = shdr.cfa_fixed_ra_offset;
+ sec->fp_off = shdr.cfa_fixed_fp_offset;
+
+ return 0;
+}
+
+int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ struct maple_tree *sframe_mt = &current->mm->sframe_mt;
+ struct vm_area_struct *sframe_vma, *text_vma;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ int ret;
+
+ if (!sframe_start || !sframe_end || !text_start || !text_end) {
+ dbg("zero-length sframe/text address\n");
+ return -EINVAL;
+ }
+
+ scoped_guard(mmap_read_lock, mm) {
+ sframe_vma = vma_lookup(mm, sframe_start);
+ if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
+ dbg("bad sframe address (0x%lx - 0x%lx)\n",
+ sframe_start, sframe_end);
+ return -EINVAL;
+ }
+
+ text_vma = vma_lookup(mm, text_start);
+ if (!text_vma ||
+ !(text_vma->vm_flags & VM_EXEC) ||
+ text_end > text_vma->vm_end) {
+ dbg("bad text address (0x%lx - 0x%lx)\n",
+ text_start, text_end);
+ return -EINVAL;
+ }
+ }
+
+ sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+ if (!sec)
+ return -ENOMEM;
+
+ sec->sframe_start = sframe_start;
+ sec->sframe_end = sframe_end;
+ sec->text_start = text_start;
+ sec->text_end = text_end;
+
+ ret = sframe_read_header(sec);
+ if (ret)
+ goto err_free;
+
+ /* TODO nowhere to store it yet - just free it and return an error */
+ ret = -ENOSYS;
+
+err_free:
+ free_section(sec);
+ return ret;
+}
+
+int sframe_remove_section(unsigned long sframe_start)
+{
+ return -ENOSYS;
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..559a74322666
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * From https://www.sourceware.org/binutils/docs/sframe-spec.html
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include <linux/types.h>
+
+#define SFRAME_VERSION_1 1
+#define SFRAME_VERSION_2 2
+#define SFRAME_MAGIC 0xdee2
+
+#define SFRAME_F_FDE_SORTED 0x1
+#define SFRAME_F_FRAME_POINTER 0x2
+#define SFRAME_F_FDE_FUNC_START_PCREL 0x4
+
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3
+
+#define SFRAME_FDE_TYPE_PCINC 0
+#define SFRAME_FDE_TYPE_PCMASK 1
+
+struct sframe_preamble {
+ u16 magic;
+ u8 version;
+ u8 flags;
+} __packed;
+
+struct sframe_header {
+ struct sframe_preamble preamble;
+ u8 abi_arch;
+ s8 cfa_fixed_fp_offset;
+ s8 cfa_fixed_ra_offset;
+ u8 auxhdr_len;
+ u32 num_fdes;
+ u32 num_fres;
+ u32 fre_len;
+ u32 fdes_off;
+ u32 fres_off;
+} __packed;
+
+#define SFRAME_HEADER_SIZE(header) \
+ ((sizeof(struct sframe_header) + header.auxhdr_len))
+
+#define SFRAME_AARCH64_PAUTH_KEY_A 0
+#define SFRAME_AARCH64_PAUTH_KEY_B 1
+
+struct sframe_fde {
+ s32 start_addr;
+ u32 func_size;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 rep_size;
+ u16 padding;
+} __packed;
+
+#define SFRAME_FUNC_FRE_TYPE(data) (data & 0xf)
+#define SFRAME_FUNC_FDE_TYPE(data) ((data >> 4) & 0x1)
+#define SFRAME_FUNC_PAUTH_KEY(data) ((data >> 5) & 0x1)
+
+#define SFRAME_BASE_REG_FP 0
+#define SFRAME_BASE_REG_SP 1
+
+#define SFRAME_FRE_CFA_BASE_REG_ID(data) (data & 0x1)
+#define SFRAME_FRE_OFFSET_COUNT(data) ((data >> 1) & 0xf)
+#define SFRAME_FRE_OFFSET_SIZE(data) ((data >> 5) & 0x3)
+#define SFRAME_FRE_MANGLED_RA_P(data) ((data >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
--
2.48.1
* [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (2 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
` (13 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Associate an sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range. A single
sframe section can be associated with multiple text ranges.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/mmu.h | 2 +-
include/linux/mm_types.h | 3 +++
include/linux/sframe.h | 13 +++++++++
kernel/fork.c | 10 +++++++
kernel/unwind/sframe.c | 55 +++++++++++++++++++++++++++++++++++---
mm/init-mm.c | 2 ++
6 files changed, 81 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0fe9c569d171..227a32899a59 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -87,7 +87,7 @@ typedef struct {
.context = { \
.ctx_id = 1, \
.lock = __MUTEX_INITIALIZER(mm.context.lock), \
- }
+ },
void leave_mm(void);
#define leave_mm leave_mm
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 08bc2442db93..31fbd6663047 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1210,6 +1210,9 @@ struct mm_struct {
#ifdef CONFIG_MM_ID
mm_id_t mm_id;
#endif /* CONFIG_MM_ID */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ struct maple_tree sframe_mt;
+#endif
} __randomize_layout;
/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 0584f661f698..73bf6f0b30c2 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -22,18 +22,31 @@ struct sframe_section {
signed char fp_off;
};
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+extern void sframe_free_mm(struct mm_struct *mm);
+
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+static inline bool current_has_sframe(void)
+{
+ struct mm_struct *mm = current->mm;
+
+ return mm && !mtree_empty(&mm->sframe_mt);
+}
+
#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+#define INIT_MM_SFRAME
+static inline void sframe_free_mm(struct mm_struct *mm) {}
static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/fork.c b/kernel/fork.c
index d827cc6c5362..9eb9b9a5d022 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -106,6 +106,7 @@
#include <linux/pidfs.h>
#include <linux/tick.h>
#include <linux/unwind_deferred.h>
+#include <linux/sframe.h>
#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -690,6 +691,7 @@ void __mmdrop(struct mm_struct *mm)
mm_destroy_cid(mm);
percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
futex_hash_free(mm);
+ sframe_free_mm(mm);
free_mm(mm);
}
@@ -1028,6 +1030,13 @@ static void mmap_init_lock(struct mm_struct *mm)
#endif
}
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ mt_init(&mm->sframe_mt);
+#endif
+}
+
static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
struct user_namespace *user_ns)
{
@@ -1056,6 +1065,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
+ mm_init_sframe(mm);
hugetlb_count_init(mm);
if (current->mm) {
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index b28ec77bc9a8..149ce70e4229 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -123,15 +123,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
if (ret)
goto err_free;
- /* TODO nowhere to store it yet - just free it and return an error */
- ret = -ENOSYS;
+ ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
+ if (ret) {
+ dbg("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
+ goto err_free;
+ }
+
+ return 0;
err_free:
free_section(sec);
return ret;
}
+static int __sframe_remove_section(struct mm_struct *mm,
+ struct sframe_section *sec)
+{
+ if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+ dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ return -EINVAL;
+ }
+
+ free_section(sec);
+
+ return 0;
+}
+
int sframe_remove_section(unsigned long sframe_start)
{
- return -ENOSYS;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ unsigned long index = 0;
+ bool found = false;
+ int ret = 0;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+ if (sec->sframe_start == sframe_start) {
+ found = true;
+ ret |= __sframe_remove_section(mm, sec);
+ }
+ }
+
+ if (!found || ret)
+ return -EINVAL;
+
+ return 0;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+ struct sframe_section *sec;
+ unsigned long index = 0;
+
+ if (!mm)
+ return;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+ free_section(sec);
+
+ mtree_destroy(&mm->sframe_mt);
}
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 4600e7605cab..b32fcf167cc2 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
#include <linux/atomic.h>
#include <linux/user_namespace.h>
#include <linux/iommu.h>
+#include <linux/sframe.h>
#include <asm/mmu.h>
#ifndef INIT_MM_CONTEXT
@@ -46,6 +47,7 @@ struct mm_struct init_mm = {
.user_ns = &init_user_ns,
.cpu_bitmap = CPU_BITS_NONE,
INIT_MM_CONTEXT(init_mm)
+ INIT_MM_SFRAME
};
void setup_initial_init_mm(void *start_code, void *end_code,
--
2.48.1
* [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (3 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
` (12 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add an x86 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
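For illustration, a minimal userspace sketch of the chunking strategy
shared by both loop macros, assuming ordinary memory in place of user
memory. copy_chunk_loop() and model_copy_from_user() are hypothetical
names, and memcpy() stands in for unsafe_get_user(), so the fault label
is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of unsafe_copy_from_user_loop(): move
 * 'type'-sized chunks for as long as at least one full chunk remains.
 * memcpy() stands in for unsafe_get_user(), so no fault label is needed.
 */
#define copy_chunk_loop(dst, src, len, type, accesses)		\
	while ((len) >= sizeof(type)) {				\
		memcpy((dst), (src), sizeof(type));		\
		(dst) += sizeof(type);				\
		(src) += sizeof(type);				\
		(len) -= sizeof(type);				\
		(accesses)++;					\
	}

/* Copy len bytes and return the number of individual accesses made. */
static unsigned int model_copy_from_user(void *to, const void *from, size_t len)
{
	unsigned char *dst = to;
	const unsigned char *src = from;
	unsigned int accesses = 0;

	assert(to && from);

	/* Widest type first, so each narrower loop only handles the tail. */
	copy_chunk_loop(dst, src, len, uint64_t, accesses);
	copy_chunk_loop(dst, src, len, uint32_t, accesses);
	copy_chunk_loop(dst, src, len, uint16_t, accesses);
	copy_chunk_loop(dst, src, len, uint8_t, accesses);

	return accesses;
}
```

With a 15-byte buffer the model performs four accesses (8 + 4 + 2 + 1
bytes), which is the point of trying the widest type first.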
arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3a7755c1a441..3caf02d0503e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -599,7 +599,7 @@ _label: \
* We want the unsafe accessors to always be inlined and use
* the error labels - thus the macro games.
*/
-#define unsafe_copy_loop(dst, src, len, type, label) \
+#define unsafe_copy_to_user_loop(dst, src, len, type, label) \
while (len >= sizeof(type)) { \
unsafe_put_user(*(type *)(src),(type __user *)(dst),label); \
dst += sizeof(type); \
@@ -607,15 +607,34 @@ _label: \
len -= sizeof(type); \
}
-#define unsafe_copy_to_user(_dst,_src,_len,label) \
-do { \
- char __user *__ucu_dst = (_dst); \
- const char *__ucu_src = (_src); \
- size_t __ucu_len = (_len); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
+#define unsafe_copy_to_user(_dst, _src, _len, label) \
+do { \
+ void __user *__dst = (_dst); \
+ const void *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
+ while (len >= sizeof(type)) { \
+ unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+#define unsafe_copy_from_user(_dst, _src, _len, label) \
+do { \
+ void *__dst = (_dst); \
+ void __user *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
} while (0)
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
--
2.48.1
* [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (4 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-23 16:04 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
` (11 subsequent siblings)
17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for using sframe to unwind user space stacks, add an
sframe_find() interface for finding the sframe information associated
with a given text address.
For performance, use user_read_access_begin() and the corresponding
unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
regions would break noinstr validation, so there aren't any debug
messages yet. Those will be added in a subsequent commit.
Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
[ Jens Remus: Add support for PC-relative FDE function start address. ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
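For illustration, a minimal userspace sketch of the FDE lookup with
PC-relative function start addresses, assuming the table lives in
ordinary memory. struct model_fde, model_fde_start() and
model_find_fde() are hypothetical names; the sketch omits the uaccess
handling and the ordering sanity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the kernel's struct sframe_fde. */
struct model_fde {
	int32_t  start_addr;	/* function start, relative to this entry */
	uint32_t func_size;	/* function length in bytes */
};

/* Resolve the PC-relative start address of one FDE. */
static uintptr_t model_fde_start(const struct model_fde *fde)
{
	return (uintptr_t)fde + (intptr_t)fde->start_addr;
}

/*
 * Return the FDE covering ip, or NULL if ip lies before the first
 * function or in a gap between two functions. The table must be sorted
 * by resolved start address.
 */
static const struct model_fde *model_find_fde(const struct model_fde *fdes,
					      size_t num, uintptr_t ip)
{
	const struct model_fde *found = NULL;
	size_t low = 0, high = num;	/* half-open search range [low, high) */

	assert(fdes || !num);

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (ip >= model_fde_start(&fdes[mid])) {
			found = &fdes[mid];	/* candidate, look for a later one */
			low = mid + 1;
		} else {
			high = mid;
		}
	}

	/* Make sure ip is inside the function, not in a gap after it. */
	if (!found || ip - model_fde_start(found) >= found->func_size)
		return NULL;

	return found;
}
```

The sketch uses a half-open index range instead of the low/high
pointers used in the patch; that is only a choice of this model and the
result is the same: the last entry whose start address does not exceed
ip.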
Notes (jremus):
Changes in v11:
- Support for SFrame V2 PC-relative FDE function start address. (Jens)
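For illustration, a minimal userspace sketch of how the FRE for a given
function offset is selected, assuming the entries have already been
decoded into an array. struct model_fre and model_find_fre() are
hypothetical names; the real entries are variable-sized and are decoded
one at a time while scanning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded stand-in for struct sframe_fre. */
struct model_fre {
	uint32_t ip_off;	/* first function offset this entry applies to */
	int32_t  cfa_off;	/* CFA offset from the base register */
};

/*
 * Return the last FRE whose ip_off does not exceed the requested offset,
 * or NULL if there is none or the entries are not strictly increasing.
 */
static const struct model_fre *model_find_fre(const struct model_fre *fres,
					      size_t num, uint32_t ip_off)
{
	const struct model_fre *prev = NULL;
	size_t i;

	assert(fres || !num);

	for (i = 0; i < num; i++) {
		const struct model_fre *fre = &fres[i];

		if (prev && fre->ip_off <= prev->ip_off)
			return NULL;	/* malformed: not strictly increasing */

		if (fre->ip_off > ip_off)
			break;		/* first entry past the target */

		prev = fre;
	}

	return prev;
}
```

The scan is linear because, with variable-sized entries, the address of
entry N is only known once entries 0..N-1 have been read.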
include/linux/sframe.h | 5 +
kernel/unwind/sframe.c | 318 ++++++++++++++++++++++++++++++++++-
kernel/unwind/sframe_debug.h | 35 ++++
3 files changed, 354 insertions(+), 4 deletions(-)
create mode 100644 kernel/unwind/sframe_debug.h
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 73bf6f0b30c2..9a72209696f9 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -3,11 +3,14 @@
#define _LINUX_SFRAME_H
#include <linux/mm_types.h>
+#include <linux/srcu.h>
#include <linux/unwind_user_types.h>
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
+ struct rcu_head rcu;
+
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
@@ -28,6 +31,7 @@ extern void sframe_free_mm(struct mm_struct *mm);
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
static inline bool current_has_sframe(void)
{
@@ -46,6 +50,7 @@ static inline int sframe_add_section(unsigned long sframe_start, unsigned long s
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -ENOSYS; }
static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 149ce70e4229..5536374e2a22 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -15,9 +15,310 @@
#include <linux/unwind_user_types.h>
#include "sframe.h"
+#include "sframe_debug.h"
-#define dbg(fmt, ...) \
- pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+struct sframe_fre {
+ unsigned int size;
+ u32 ip_off;
+ s32 cfa_off;
+ s32 ra_off;
+ s32 fp_off;
+ u8 info;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
+{
+ if (fre_type > 2)
+ return 0;
+ return 1 << fre_type;
+}
+
+static __always_inline unsigned char offset_size_enum_to_size(unsigned char off_size)
+{
+ if (off_size > 2)
+ return 0;
+ return 1 << off_size;
+}
+
+static __always_inline int __read_fde(struct sframe_section *sec,
+ unsigned int fde_num,
+ struct sframe_fde *fde,
+ unsigned long *fde_start_base)
+{
+ unsigned long fde_addr, ip;
+
+ fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
+ unsafe_copy_from_user(fde, (void __user *)fde_addr,
+ sizeof(struct sframe_fde), Efault);
+
+ ip = fde_addr + fde->start_addr;
+ if (ip < sec->text_start || ip > sec->text_end)
+ return -EINVAL;
+
+ *fde_start_base = fde_addr;
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fde(struct sframe_section *sec,
+ unsigned long ip,
+ struct sframe_fde *fde,
+ unsigned long *fde_start_base)
+{
+ unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
+ struct sframe_fde __user *first, *low, *high, *found = NULL;
+ int ret;
+
+ first = (void __user *)sec->fdes_start;
+ low = first;
+ high = first + sec->num_fdes - 1;
+
+ while (low <= high) {
+ struct sframe_fde __user *mid;
+ s32 func_off;
+ unsigned long func_addr;
+
+ mid = low + ((high - low) / 2);
+
+ unsafe_get_user(func_off, (s32 __user *)mid, Efault);
+ func_addr = (unsigned long)mid + func_off;
+
+ if (ip >= func_addr) {
+ if (func_addr < func_addr_low)
+ return -EFAULT;
+
+ func_addr_low = func_addr;
+
+ found = mid;
+ low = mid + 1;
+ } else {
+ if (func_addr > func_addr_high)
+ return -EFAULT;
+
+ func_addr_high = func_addr;
+
+ high = mid - 1;
+ }
+ }
+
+ if (!found)
+ return -EINVAL;
+
+ ret = __read_fde(sec, found - first, fde, fde_start_base);
+ if (ret)
+ return ret;
+
+ /* make sure it's not in a gap */
+ if (ip < *fde_start_base + fde->start_addr ||
+ ip >= *fde_start_base + fde->start_addr + fde->func_size)
+ return -EINVAL;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+#define ____UNSAFE_GET_USER_INC(to, from, type, label) \
+({ \
+ type __to; \
+ unsafe_get_user(__to, (type __user *)from, label); \
+ from += sizeof(__to); \
+ to = __to; \
+})
+
+#define __UNSAFE_GET_USER_INC(to, from, size, label, u_or_s) \
+({ \
+ switch (size) { \
+ case 1: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##8, label); \
+ break; \
+ case 2: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##16, label); \
+ break; \
+ case 4: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##32, label); \
+ break; \
+ default: \
+ return -EFAULT; \
+ } \
+})
+
+#define UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, u)
+
+#define UNSAFE_GET_USER_SIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, s)
+
+#define UNSAFE_GET_USER_INC(to, from, size, label) \
+ _Generic(to, \
+ u8: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u16: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u32: UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ s8: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s16: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s32: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+
+static __always_inline int __read_fre(struct sframe_section *sec,
+ struct sframe_fde *fde,
+ unsigned long fre_addr,
+ struct sframe_fre *fre)
+{
+ unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+ unsigned char fre_type = SFRAME_FUNC_FRE_TYPE(fde->info);
+ unsigned char offset_count, offset_size;
+ s32 cfa_off, ra_off, fp_off;
+ unsigned long cur = fre_addr;
+ unsigned char addr_size;
+ u32 ip_off;
+ u8 info;
+
+ addr_size = fre_type_to_size(fre_type);
+ if (!addr_size)
+ return -EFAULT;
+
+ if (fre_addr + addr_size + 1 > sec->fres_end)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
+ if (fde_type == SFRAME_FDE_TYPE_PCINC && ip_off > fde->func_size)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(info, cur, 1, Efault);
+ offset_count = SFRAME_FRE_OFFSET_COUNT(info);
+ offset_size = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(info));
+ if (!offset_count || !offset_size)
+ return -EFAULT;
+
+ if (cur + (offset_count * offset_size) > sec->fres_end)
+ return -EFAULT;
+
+ fre->size = addr_size + 1 + (offset_count * offset_size);
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, offset_size, Efault);
+ offset_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off) {
+ if (!offset_count--)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ra_off, cur, offset_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && offset_count) {
+ offset_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, offset_size, Efault);
+ }
+
+ if (offset_count)
+ return -EFAULT;
+
+ fre->ip_off = ip_off;
+ fre->cfa_off = cfa_off;
+ fre->ra_off = ra_off;
+ fre->fp_off = fp_off;
+ fre->info = info;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fre(struct sframe_section *sec,
+ struct sframe_fde *fde,
+ unsigned long fde_start_base,
+ unsigned long ip,
+ struct unwind_user_frame *frame)
+{
+ unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+ struct sframe_fre *fre, *prev_fre = NULL;
+ struct sframe_fre fres[2];
+ unsigned long fre_addr;
+ bool which = false;
+ unsigned int i;
+ u32 ip_off;
+
+ ip_off = ip - (fde_start_base + fde->start_addr);
+
+ if (fde_type == SFRAME_FDE_TYPE_PCMASK)
+ ip_off %= fde->rep_size;
+
+ fre_addr = sec->fres_start + fde->fres_off;
+
+ for (i = 0; i < fde->fres_num; i++) {
+ int ret;
+
+ /*
+ * Alternate between the two fres[] entries for 'fre' and
+ * 'prev_fre'.
+ */
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ if (ret)
+ return ret;
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off)
+ return -EFAULT;
+
+ if (fre->ip_off > ip_off)
+ break;
+
+ prev_fre = fre;
+ }
+
+ if (!prev_fre)
+ return -EINVAL;
+ fre = prev_fre;
+
+ frame->cfa_off = fre->cfa_off;
+ frame->ra_off = fre->ra_off;
+ frame->fp_off = fre->fp_off;
+ frame->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+
+ return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ struct sframe_fde fde;
+ unsigned long fde_start_base;
+ int ret;
+
+ if (!mm)
+ return -EINVAL;
+
+ guard(srcu)(&sframe_srcu);
+
+ sec = mtree_load(&mm->sframe_mt, ip);
+ if (!sec)
+ return -EINVAL;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+
+ ret = __find_fde(sec, ip, &fde, &fde_start_base);
+ if (ret)
+ goto end;
+
+ ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
+end:
+ user_read_access_end();
+ return ret;
+}
static void free_section(struct sframe_section *sec)
{
@@ -120,8 +421,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_end = text_end;
ret = sframe_read_header(sec);
- if (ret)
+ if (ret) {
+ dbg_print_header(sec);
goto err_free;
+ }
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
@@ -137,6 +440,13 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
return ret;
}
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+ struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+ free_section(sec);
+}
+
static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
@@ -145,7 +455,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
return -EINVAL;
}
- free_section(sec);
+ call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
return 0;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
new file mode 100644
index 000000000000..055c8c8fae24
--- /dev/null
+++ b/kernel/unwind/sframe_debug.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SFRAME_DEBUG_H
+#define _SFRAME_DEBUG_H
+
+#include <linux/sframe.h>
+#include "sframe.h"
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static __always_inline void dbg_print_header(struct sframe_section *sec)
+{
+ unsigned long fdes_end;
+
+ fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
+
+ dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+#else /* !CONFIG_DYNAMIC_DEBUG */
+
+#define dbg(args...) no_printk(args)
+
+static inline void dbg_print_header(struct sframe_section *sec) {}
+
+#endif /* !CONFIG_DYNAMIC_DEBUG */
+
+#endif /* _SFRAME_DEBUG_H */
--
2.48.1
* [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (5 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
` (10 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
Cc: linux-mm@kvack.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
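For illustration, a minimal userspace sketch of the detection step,
assuming a reduced program header layout. struct model_phdr, struct
model_sframe_range and model_collect_sframe() are hypothetical names;
the constants carry the same values as PF_X, PT_LOAD and PT_GNU_SFRAME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PF_X		0x1u		/* same value as PF_X */
#define MODEL_PT_LOAD		1u		/* same value as PT_LOAD */
#define MODEL_PT_GNU_SFRAME	0x6474e554u	/* PT_LOOS + 0x474e554 */

/* Hypothetical, reduced stand-in for struct elf_phdr. */
struct model_phdr {
	uint32_t p_type;
	uint32_t p_flags;
	uint64_t p_vaddr;
	uint64_t p_memsz;
};

/* One (sframe, text) pairing, as it would go to sframe_add_section(). */
struct model_sframe_range {
	uint64_t sframe_start, sframe_end;
	uint64_t text_start, text_end;
};

/*
 * Pair the PT_GNU_SFRAME segment (the last one wins) with every
 * executable segment. Returns the number of pairings written to out[],
 * at most max.
 */
static size_t model_collect_sframe(const struct model_phdr *phdrs, size_t num,
				   uint64_t load_bias,
				   struct model_sframe_range *out, size_t max)
{
	const struct model_phdr *sframe = NULL;
	size_t i, n = 0;

	assert(phdrs && out);

	/* First pass: remember the sframe segment, if any. */
	for (i = 0; i < num; i++)
		if (phdrs[i].p_type == MODEL_PT_GNU_SFRAME)
			sframe = &phdrs[i];

	if (!sframe)
		return 0;

	/* Second pass: one pairing per executable segment. */
	for (i = 0; i < num && n < max; i++) {
		if (!(phdrs[i].p_flags & MODEL_PF_X))
			continue;

		out[n].sframe_start = load_bias + sframe->p_vaddr;
		out[n].sframe_end = out[n].sframe_start + sframe->p_memsz;
		out[n].text_start = load_bias + phdrs[i].p_vaddr;
		out[n].text_end = out[n].text_start + phdrs[i].p_memsz;
		n++;
	}

	return n;
}
```

Two passes are needed because the sframe segment may appear after the
executable segment in the program header table.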
fs/binfmt_elf.c | 49 +++++++++++++++++++++++++++++++++++++---
include/uapi/linux/elf.h | 1 +
2 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 264fba0d44bd..1fd7623cf9a5 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -47,6 +47,7 @@
#include <linux/dax.h>
#include <linux/uaccess.h>
#include <linux/rseq.h>
+#include <linux/sframe.h>
#include <asm/param.h>
#include <asm/page.h>
@@ -622,6 +623,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
}
+static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
+ unsigned long base_addr)
+{
+ unsigned long sframe_start, sframe_end, text_start, text_end;
+
+ sframe_start = base_addr + sframe->p_vaddr;
+ sframe_end = sframe_start + sframe->p_memsz;
+
+ text_start = base_addr + text->p_vaddr;
+ text_end = text_start + text->p_memsz;
+
+ /* Ignore return value, sframe section isn't critical */
+ sframe_add_section(sframe_start, sframe_end, text_start, text_end);
+}
+
/* This is much more generalized than the library routine read function,
so we keep this separate. Technically the library read function
is only provided so that we can read a.out libraries that have
@@ -632,7 +648,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
unsigned long no_base, struct elf_phdr *interp_elf_phdata,
struct arch_elf_state *arch_state)
{
- struct elf_phdr *eppnt;
+ struct elf_phdr *eppnt, *sframe_phdr = NULL;
unsigned long load_addr = 0;
int load_addr_set = 0;
unsigned long error = ~0UL;
@@ -658,7 +674,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
eppnt = interp_elf_phdata;
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
- if (eppnt->p_type == PT_LOAD) {
+ switch (eppnt->p_type) {
+ case PT_LOAD: {
int elf_type = MAP_PRIVATE;
int elf_prot = make_prot(eppnt->p_flags, arch_state,
true, true);
@@ -697,6 +714,20 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
error = -ENOMEM;
goto out;
}
+ break;
+ }
+ case PT_GNU_SFRAME:
+ sframe_phdr = eppnt;
+ break;
+ }
+ }
+
+ if (sframe_phdr) {
+ eppnt = interp_elf_phdata;
+ for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
+ if (eppnt->p_flags & PF_X) {
+ elf_add_sframe(eppnt, sframe_phdr, load_addr);
+ }
}
}
@@ -821,7 +852,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
int first_pt_load = 1;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
- struct elf_phdr *elf_property_phdata = NULL;
+ struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
unsigned long elf_brk;
bool brk_moved = false;
int retval, i;
@@ -930,6 +961,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
executable_stack = EXSTACK_DISABLE_X;
break;
+ case PT_GNU_SFRAME:
+ sframe_phdr = elf_ppnt;
+ break;
+
case PT_LOPROC ... PT_HIPROC:
retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
bprm->file, false,
@@ -1227,6 +1262,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
elf_brk = k;
}
+ if (sframe_phdr) {
+ for (i = 0, elf_ppnt = elf_phdata;
+ i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if ((elf_ppnt->p_flags & PF_X))
+ elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
+ }
+ }
+
e_entry = elf_ex->e_entry + load_bias;
phdr_addr += load_bias;
elf_brk += load_bias;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 819ded2d39de..92c16c94fca8 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -41,6 +41,7 @@ typedef __u16 Elf64_Versym;
#define PT_GNU_STACK (PT_LOOS + 0x474e551)
#define PT_GNU_RELRO (PT_LOOS + 0x474e552)
#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)
/* ARM MTE memory tag segment type */
--
2.48.1
* [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (6 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-24 13:44 ` Peter Zijlstra
2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
` (9 subsequent siblings)
17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Now that the sframe infrastructure is fully in place, make it work by
hooking it up to the unwind_user interface.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
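For illustration, a minimal userspace sketch of the method selection,
assuming two step callbacks in place of the real unwinders. All model_*
names are hypothetical, and continuing after a failed frame-pointer
step is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

enum model_type {
	MODEL_TYPE_NONE   = 0,
	MODEL_TYPE_SFRAME = 1 << 0,	/* lowest bit: tried first */
	MODEL_TYPE_FP     = 1 << 1,
};

/* Hypothetical callback: returns 0, -ENOENT (no data) or another error. */
typedef int (*model_step_fn)(void);

/* Hypothetical stubs for exercising model_next(). */
static int model_step_ok(void)     { return 0; }
static int model_step_nodata(void) { return -ENOENT; }
static int model_step_bad(void)    { return -EFAULT; }

/*
 * Try each available method in ascending bit order. Returns the type
 * that produced a frame, or MODEL_TYPE_NONE if the walk must stop.
 */
static enum model_type model_next(unsigned int available,
				  model_step_fn sframe_step,
				  model_step_fn fp_step)
{
	unsigned int mask = available;
	int ret;

	assert(sframe_step && fp_step);

	while (mask) {
		unsigned int type = mask & -mask;	/* lowest set bit */

		mask &= ~type;

		switch (type) {
		case MODEL_TYPE_SFRAME:
			ret = sframe_step();
			if (!ret)
				return MODEL_TYPE_SFRAME;
			if (ret == -ENOENT)
				continue;	/* no sframe data: try next */
			return MODEL_TYPE_NONE;	/* hard error: stop the walk */
		case MODEL_TYPE_FP:
			if (!fp_step())
				return MODEL_TYPE_FP;
			continue;
		default:
			return MODEL_TYPE_NONE;
		}
	}

	return MODEL_TYPE_NONE;
}
```

Placing sframe at bit 0 is what gives it priority: frame pointers are
only consulted when no sframe data covers the current ip.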
arch/Kconfig | 1 +
include/linux/unwind_user_types.h | 4 ++-
kernel/unwind/user.c | 41 +++++++++++++++++++++++++++----
3 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 69fcabf53088..277b87af949f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -453,6 +453,7 @@ config HAVE_UNWIND_USER_FP
config HAVE_UNWIND_USER_SFRAME
bool
+ select UNWIND_USER
config HAVE_PERF_REGS
bool
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 938f7e623332..ee0ce855e045 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -9,7 +9,8 @@
* available.
*/
enum unwind_user_type_bits {
- UNWIND_USER_TYPE_FP_BIT = 0,
+ UNWIND_USER_TYPE_SFRAME_BIT = 0,
+ UNWIND_USER_TYPE_FP_BIT = 1,
NR_UNWIND_USER_TYPE_BITS,
};
@@ -17,6 +18,7 @@ enum unwind_user_type_bits {
enum unwind_user_type {
/* Type "none" for the start of stack walk iteration. */
UNWIND_USER_TYPE_NONE = 0,
+ UNWIND_USER_TYPE_SFRAME = BIT(UNWIND_USER_TYPE_SFRAME_BIT),
UNWIND_USER_TYPE_FP = BIT(UNWIND_USER_TYPE_FP_BIT),
};
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 696004ee956a..f6c543cb255b 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -7,6 +7,7 @@
#include <linux/sched/task_stack.h>
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
+#include <linux/sframe.h>
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
@@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
return get_user(*word, addr);
}
-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+ const struct unwind_user_frame *frame,
+ struct pt_regs *regs)
{
- const struct unwind_user_frame fp_frame = {
- ARCH_INIT_USER_FP_FRAME(state->ws)
- };
- const struct unwind_user_frame *frame = &fp_frame;
unsigned long cfa, fp, ra;
if (frame->use_fp) {
@@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
return 0;
}
+static int unwind_user_next_sframe(struct unwind_user_state *state)
+{
+ struct unwind_user_frame _frame, *frame;
+
+ /* sframe expects the frame to be local storage */
+ frame = &_frame;
+ if (sframe_find(state->ip, frame))
+ return -ENOENT;
+ return unwind_user_next_common(state, frame, task_pt_regs(current));
+}
+
+static int unwind_user_next_fp(struct unwind_user_state *state)
+{
+ const struct unwind_user_frame fp_frame = {
+ ARCH_INIT_USER_FP_FRAME(state->ws)
+ };
+
+ return unwind_user_next_common(state, &fp_frame, task_pt_regs(current));
+}
+
static int unwind_user_next(struct unwind_user_state *state)
{
unsigned long iter_mask = state->available_types;
@@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
state->current_type = type;
switch (type) {
+ case UNWIND_USER_TYPE_SFRAME:
+ switch (unwind_user_next_sframe(state)) {
+ case 0:
+ return 0;
+ case -ENOENT:
+ continue; /* Try next method. */
+ default:
+ state->done = true;
+ }
+ break;
case UNWIND_USER_TYPE_FP:
if (!unwind_user_next_fp(state))
return 0;
@@ -108,6 +137,8 @@ static int unwind_user_start(struct unwind_user_state *state)
return -EINVAL;
}
+ if (current_has_sframe())
+ state->available_types |= UNWIND_USER_TYPE_SFRAME;
if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
state->available_types |= UNWIND_USER_TYPE_FP;
--
2.48.1
* [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (7 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
` (8 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
Add an indication for an outermost frame to the unwind user frame
structure and stop unwinding when reaching an outermost frame.
This will be used by unwind user sframe, because SFrame can mark a return
address as undefined to indicate an outermost frame.
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v11:
- New patch. (Jens)
arch/x86/include/asm/unwind_user.h | 3 ++-
include/linux/unwind_user_types.h | 1 +
kernel/unwind/user.c | 6 ++++++
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 5e0755ea3086..08684bbeff0c 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -6,6 +6,7 @@
.cfa_off = 2*(ws), \
.ra_off = -1*(ws), \
.fp_off = -2*(ws), \
- .use_fp = true,
+ .use_fp = true, \
+ .outermost = false,
#endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index ee0ce855e045..e07fee69d315 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -32,6 +32,7 @@ struct unwind_user_frame {
s32 ra_off;
s32 fp_off;
bool use_fp;
+ bool outermost;
};
struct unwind_user_state {
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index f6c543cb255b..c8034a447c16 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -33,6 +33,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
{
unsigned long cfa, fp, ra;
+ /* Stop unwinding when reaching an outermost frame. */
+ if (frame->outermost) {
+ state->done = true;
+ return 0;
+ }
+
if (frame->use_fp) {
if (state->fp < state->sp)
return -EINVAL;
--
2.48.1
* [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (8 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
` (7 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
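SFrame may represent an undefined return address (RA) as an SFrame FRE
without any offsets, as an indication of an outermost frame. Accept such
an FRE instead of rejecting it with -EFAULT, and mark the resulting frame
as outermost.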
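A minimal user-space sketch of the decoding step follows. It assumes the
SFrame V2 FRE info byte layout (bit 0: CFA base register, bits 1-4: offset
count, bits 5-6: offset size encoding); please verify that against the SFrame
specification. The TOY_* macros, struct toy_fre and toy_decode_info() are
hypothetical names, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed FRE info byte layout; check against the SFrame specification. */
#define TOY_FRE_OFFSET_COUNT(info)	(((info) >> 1) & 0xf)
#define TOY_FRE_OFFSET_SIZE(info)	(((info) >> 5) & 0x3)

struct toy_fre {
	unsigned int offset_count;
	unsigned int offset_size;	/* in bytes */
	bool ra_undefined;
};

/* Map the size encoding to a byte count; 0 means an invalid encoding. */
static unsigned int toy_offset_size(unsigned int enc)
{
	switch (enc) {
	case 0: return 1;
	case 1: return 2;
	case 2: return 4;
	default: return 0;
	}
}

/* Returns 0 on success, -1 if the info byte is malformed. */
static int toy_decode_info(uint8_t info, struct toy_fre *fre)
{
	unsigned int size;

	assert(fre != NULL);

	/* A bad size encoding is still treated as corruption. */
	size = toy_offset_size(TOY_FRE_OFFSET_SIZE(info));
	if (!size)
		return -1;

	fre->offset_count = TOY_FRE_OFFSET_COUNT(info);
	fre->offset_size = size;

	/* Zero offsets is valid: it marks an undefined return address. */
	fre->ra_undefined = (fre->offset_count == 0);
	return 0;
}
```

The design choice mirrored here is that only the offset size is validated up
front, while a zero offset count becomes a flag on the decoded entry.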
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v11:
- New patch. (Jens)
kernel/unwind/sframe.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 5536374e2a22..bc3e2eb00325 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -24,6 +24,7 @@ struct sframe_fre {
s32 ra_off;
s32 fp_off;
u8 info;
+ bool ra_undefined;
};
DEFINE_STATIC_SRCU(sframe_srcu);
@@ -173,6 +174,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
unsigned char offset_count, offset_size;
s32 cfa_off, ra_off, fp_off;
unsigned long cur = fre_addr;
+ bool ra_undefined = false;
unsigned char addr_size;
u32 ip_off;
u8 info;
@@ -191,7 +193,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
UNSAFE_GET_USER_INC(info, cur, 1, Efault);
offset_count = SFRAME_FRE_OFFSET_COUNT(info);
offset_size = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(info));
- if (!offset_count || !offset_size)
+ if (!offset_size)
return -EFAULT;
if (cur + (offset_count * offset_size) > sec->fres_end)
@@ -199,6 +201,14 @@ static __always_inline int __read_fre(struct sframe_section *sec,
fre->size = addr_size + 1 + (offset_count * offset_size);
+ if (!offset_count) {
+ cfa_off = 0;
+ ra_off = 0;
+ fp_off = 0;
+ ra_undefined = true;
+ goto done;
+ }
+
UNSAFE_GET_USER_INC(cfa_off, cur, offset_size, Efault);
offset_count--;
@@ -219,11 +229,13 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (offset_count)
return -EFAULT;
+done:
fre->ip_off = ip_off;
fre->cfa_off = cfa_off;
fre->ra_off = ra_off;
fre->fp_off = fp_off;
fre->info = info;
+ fre->ra_undefined = ra_undefined;
return 0;
@@ -285,6 +297,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
frame->ra_off = fre->ra_off;
frame->fp_off = fre->fp_off;
frame->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+ frame->outermost = fre->ra_undefined;
return 0;
}
--
2.48.1
* [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (9 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
` (6 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The x86 sframe 2.0 implementation works fairly well, starting with
binutils 2.41 (though some bugs are getting fixed in later versions).
Enable it.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8f94c58d4de8..c3518f145f0d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -300,6 +300,7 @@ config X86
select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_UNWIND_USER_FP if X86_64
+ select HAVE_UNWIND_USER_SFRAME if X86_64
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
select VDSO_GETRANDOM if X86_64
--
2.48.1
* [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (10 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
` (5 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
To avoid continued attempted use of a bad .sframe section, remove it
on demand when the first sign of corruption is detected.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
kernel/unwind/sframe.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index bc3e2eb00325..77ef1f0bb9c5 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -330,6 +330,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
end:
user_read_access_end();
+
+ if (ret == -EFAULT)
+ WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+
return ret;
}
--
2.48.1
* [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (11 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
` (4 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When debugging sframe issues, the error messages aren't all that helpful
without knowing what file a corresponding .sframe section belongs to.
Prefix debug output strings with the file name.
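The macro layering can be tried out in user space with the sketch below. It is
an illustration only: toy_dbg() and toy_dbg_sec() are hypothetical, they format
into a buffer with snprintf() instead of calling pr_debug(), and the task name
and pid are hard-coded placeholders. Like the kernel macros, it relies on the
GNU `##__VA_ARGS__` extension and on a variable named `sec` being in scope:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct toy_section {
	const char *filename;
};

/* Base macro: prefix every message with a placeholder task name and pid. */
#define toy_dbg(buf, len, fmt, ...) \
	snprintf(buf, len, "%s (%d): " fmt, "mytask", 42, ##__VA_ARGS__)

/* Section-aware macro: expects a pointer named 'sec' at the call site. */
#define toy_dbg_sec(buf, len, fmt, ...) \
	toy_dbg(buf, len, "%s: " fmt, sec->filename, ##__VA_ARGS__)

/* Format one example message for the given section into buf. */
static int toy_report(const struct toy_section *sec, char *buf, size_t len,
		      unsigned long text_start)
{
	assert(sec != NULL && sec->filename != NULL);

	return toy_dbg_sec(buf, len, "mtree_erase failed: text=%lx\n",
			   text_start);
}
```

Because the file name is injected by the macro, call sites only change from
one macro name to the other and keep their format strings as they are.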
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
include/linux/sframe.h | 4 +++-
kernel/unwind/sframe.c | 23 ++++++++++--------
kernel/unwind/sframe_debug.h | 45 +++++++++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 9a72209696f9..b79c5ec09229 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -10,7 +10,9 @@
struct sframe_section {
struct rcu_head rcu;
-
+#ifdef CONFIG_DYNAMIC_DEBUG
+ const char *filename;
+#endif
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 77ef1f0bb9c5..82eaf3c5d6b0 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -331,14 +331,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
end:
user_read_access_end();
- if (ret == -EFAULT)
+ if (ret == -EFAULT) {
+ dbg_sec("removing bad .sframe section\n");
WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+ }
return ret;
}
static void free_section(struct sframe_section *sec)
{
+ dbg_free(sec);
kfree(sec);
}
@@ -349,7 +352,7 @@ static int sframe_read_header(struct sframe_section *sec)
unsigned int num_fdes;
if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
- dbg("header usercopy failed\n");
+ dbg_sec("header usercopy failed\n");
return -EFAULT;
}
@@ -358,18 +361,18 @@ static int sframe_read_header(struct sframe_section *sec)
!(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
!(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
shdr.auxhdr_len) {
- dbg("bad/unsupported sframe header\n");
+ dbg_sec("bad/unsupported sframe header\n");
return -EINVAL;
}
if (!shdr.num_fdes || !shdr.num_fres) {
- dbg("no fde/fre entries\n");
+ dbg_sec("no fde/fre entries\n");
return -EINVAL;
}
header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
if (header_end >= sec->sframe_end) {
- dbg("header doesn't fit in section\n");
+ dbg_sec("header doesn't fit in section\n");
return -EINVAL;
}
@@ -381,7 +384,7 @@ static int sframe_read_header(struct sframe_section *sec)
fres_end = fres_start + shdr.fre_len;
if (fres_start < fdes_end || fres_end > sec->sframe_end) {
- dbg("inconsistent fde/fre offsets\n");
+ dbg_sec("inconsistent fde/fre offsets\n");
return -EINVAL;
}
@@ -437,6 +440,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_start = text_start;
sec->text_end = text_end;
+ dbg_init(sec);
+
ret = sframe_read_header(sec);
if (ret) {
dbg_print_header(sec);
@@ -445,8 +450,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
- dbg("mtree_insert_range failed: text=%lx-%lx\n",
- sec->text_start, sec->text_end);
+ dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
goto err_free;
}
@@ -468,7 +473,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
- dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
return -EINVAL;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
index 055c8c8fae24..7794bf0bd78c 100644
--- a/kernel/unwind/sframe_debug.h
+++ b/kernel/unwind/sframe_debug.h
@@ -10,26 +10,59 @@
#define dbg(fmt, ...) \
pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+#define dbg_sec(fmt, ...) \
+ dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)
+
static __always_inline void dbg_print_header(struct sframe_section *sec)
{
unsigned long fdes_end;
fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
- dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
- "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
- "ra_off:%d fp_off:%d\n",
- sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
- sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
- sec->ra_off, sec->fp_off);
+ dbg_sec("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+static inline void dbg_init(struct sframe_section *sec)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ guard(mmap_read_lock)(mm);
+ vma = vma_lookup(mm, sec->sframe_start);
+ if (!vma)
+ sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
+ else if (vma->vm_file)
+ sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
+ else if (vma->vm_ops && vma->vm_ops->name)
+ sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
+ else if (arch_vma_name(vma))
+ sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
+ else if (!vma->vm_mm)
+ sec->filename = kstrdup("(vdso)", GFP_KERNEL);
+ else
+ sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
+}
+
+static inline void dbg_free(struct sframe_section *sec)
+{
+ kfree(sec->filename);
}
#else /* !CONFIG_DYNAMIC_DEBUG */
#define dbg(args...) no_printk(args)
+#define dbg_sec(args...) no_printk(args)
static inline void dbg_print_header(struct sframe_section *sec) {}
+static inline void dbg_init(struct sframe_section *sec) {}
+static inline void dbg_free(struct sframe_section *sec) {}
+
#endif /* !CONFIG_DYNAMIC_DEBUG */
#endif /* _SFRAME_DEBUG_H */
--
2.48.1
* [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (12 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
` (3 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add a debug feature to validate all .sframe sections when first loading
the file rather than on demand.
[ Jens Remus: Add support for PC-relative FDE function start address. ]
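The ordering checks performed by the validation can be sketched in user space
as follows. This is a simplification under stated assumptions: struct toy_fde
and toy_validate() are hypothetical, the function start addresses are taken as
already resolved, and the FRE IP offsets are given as a plain array rather than
being read from user memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pre-decoded view of one function descriptor entry. */
struct toy_fde {
	uint64_t start_ip;		/* resolved function start address */
	const uint32_t *fre_ip_off;	/* IP offsets of this function's FREs */
	size_t num_fres;
};

/* Returns 0 if all entries are strictly sorted, -1 otherwise. */
static int toy_validate(const struct toy_fde *fdes, size_t num_fdes)
{
	uint64_t prev_ip = 0;
	size_t i, j;

	assert(fdes != NULL || num_fdes == 0);

	for (i = 0; i < num_fdes; i++) {
		const struct toy_fde *fde = &fdes[i];

		/* FDEs must be sorted by strictly increasing start address. */
		if (fde->start_ip <= prev_ip)
			return -1;
		prev_ip = fde->start_ip;

		/* Within one FDE, FRE offsets must also strictly increase. */
		for (j = 1; j < fde->num_fres; j++) {
			if (fde->fre_ip_off[j] <= fde->fre_ip_off[j - 1])
				return -1;
		}
	}

	return 0;
}
```

As in the patch, the first comparison is made against an initial previous
address of 0, so a function starting at address 0 would be rejected.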
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v11:
- Support for SFrame V2 PC-relative FDE function start address. (Jens)
arch/Kconfig | 19 +++++++++
kernel/unwind/sframe.c | 97 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 116 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 277b87af949f..918ebe3c5a85 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -455,6 +455,25 @@ config HAVE_UNWIND_USER_SFRAME
bool
select UNWIND_USER
+config SFRAME_VALIDATION
+ bool "Enable .sframe section debugging"
+ depends on HAVE_UNWIND_USER_SFRAME
+ depends on DYNAMIC_DEBUG
+ help
+ When adding an .sframe section for a task, validate the entire
+ section immediately rather than on demand.
+
+ This is a debug feature which is helpful for rooting out .sframe
+ section issues. If the .sframe section is corrupt, it will fail to
+ load immediately, with more information provided in dynamic printks.
+
+ This has a significant page cache footprint due to its reading of the
+ entire .sframe section for every loaded executable and shared
+ library. Also, it's done for all processes, even those which don't
+ get stack traced by the kernel. Not recommended for general use.
+
+ If unsure, say N.
+
config HAVE_PERF_REGS
bool
help
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 82eaf3c5d6b0..3d7ac4eaa8b7 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -339,6 +339,99 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
return ret;
}
+#ifdef CONFIG_SFRAME_VALIDATION
+
+static int safe_read_fde(struct sframe_section *sec,
+ unsigned int fde_num, struct sframe_fde *fde,
+ unsigned long *fde_start_base)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fde(sec, fde_num, fde, fde_start_base);
+ user_read_access_end();
+ return ret;
+}
+
+static int safe_read_fre(struct sframe_section *sec,
+ struct sframe_fde *fde, unsigned long fre_addr,
+ struct sframe_fre *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ user_read_access_end();
+ return ret;
+}
+
+static int sframe_validate_section(struct sframe_section *sec)
+{
+ unsigned long prev_ip = 0;
+ unsigned int i;
+
+ for (i = 0; i < sec->num_fdes; i++) {
+ struct sframe_fre *fre, *prev_fre = NULL;
+ unsigned long ip, fde_start_base, fre_addr;
+ struct sframe_fde fde;
+ struct sframe_fre fres[2];
+ bool which = false;
+ unsigned int j;
+ int ret;
+
+ ret = safe_read_fde(sec, i, &fde, &fde_start_base);
+ if (ret)
+ return ret;
+
+ ip = fde_start_base + fde.start_addr;
+ if (ip <= prev_ip) {
+ dbg_sec("fde %u not sorted\n", i);
+ return -EFAULT;
+ }
+ prev_ip = ip;
+
+ fre_addr = sec->fres_start + fde.fres_off;
+ for (j = 0; j < fde.fres_num; j++) {
+ int ret;
+
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = safe_read_fre(sec, &fde, fre_addr, fre);
+ if (ret) {
+ dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
+ dbg_sec("FDE: start_addr:0x%x func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
+ fde.start_addr, fde.func_size,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.rep_size);
+ return ret;
+ }
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
+ dbg_sec("fde %u: fre %u not sorted\n", i, j);
+ return -EFAULT;
+ }
+
+ prev_fre = fre;
+ }
+ }
+
+ return 0;
+}
+
+#else /* !CONFIG_SFRAME_VALIDATION */
+
+static int sframe_validate_section(struct sframe_section *sec) { return 0; }
+
+#endif /* !CONFIG_SFRAME_VALIDATION */
+
+
static void free_section(struct sframe_section *sec)
{
dbg_free(sec);
@@ -448,6 +541,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
goto err_free;
}
+ ret = sframe_validate_section(sec);
+ if (ret)
+ goto err_free;
+
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
--
2.48.1
* [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (13 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
` (2 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The kernel doesn't have direct visibility into the ELF contents of shared
libraries. Add some prctl() interfaces which allow glibc to tell the
kernel where to find .sframe sections.
[
This adds an interface for prctl() for testing loading of sframes for
libraries. But this interface should really be a system call. This patch
is for testing purposes only and should not be applied to mainline.
]
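For testing, a user-space caller might wrap the two options roughly as shown
below. This is a sketch under assumptions: the option values 79 and 80 and the
argument order (sframe start, sframe end, text start, text end) are taken from
this patch and are not part of any released uapi header; struct
toy_sframe_range, toy_sframe_add() and toy_sframe_remove() are hypothetical
helpers, and the range sanity check is an addition of the sketch, not
something the kernel interface requires:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Values from this test-only patch; they may change or never be merged. */
#ifndef PR_ADD_SFRAME
#define PR_ADD_SFRAME		79
#endif
#ifndef PR_REMOVE_SFRAME
#define PR_REMOVE_SFRAME	80
#endif

/* Hypothetical description of one .sframe section and the text it covers. */
struct toy_sframe_range {
	unsigned long sframe_start;
	unsigned long sframe_end;
	unsigned long text_start;
	unsigned long text_end;
};

/* Register a section; returns 0 on success or a negative errno value. */
static int toy_sframe_add(const struct toy_sframe_range *r)
{
	assert(r != NULL);

	/* Local sanity check so obviously bad ranges never reach the kernel. */
	if (r->sframe_start >= r->sframe_end || r->text_start >= r->text_end)
		return -EINVAL;

	if (prctl(PR_ADD_SFRAME, r->sframe_start, r->sframe_end,
		  r->text_start, r->text_end))
		return -errno;
	return 0;
}

/* Unregister a section by its start address; unused arguments must be 0. */
static int toy_sframe_remove(unsigned long sframe_start)
{
	if (prctl(PR_REMOVE_SFRAME, sframe_start, 0UL, 0UL, 0UL))
		return -errno;
	return 0;
}
```

On a kernel without this patch the prctl() calls are expected to fail, so a
caller should treat any error as "sframe registration not available".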
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
include/uapi/linux/prctl.h | 6 +++++-
kernel/sys.c | 9 +++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index ed3aed264aeb..b807baa8a53b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -358,7 +358,7 @@ struct prctl_mm_map {
* configuration. All bits may be locked via this call, including
* undefined bits.
*/
-#define PR_LOCK_SHADOW_STACK_STATUS 76
+#define PR_LOCK_SHADOW_STACK_STATUS 76
/*
* Controls the mode of timer_create() for CRIU restore operations.
@@ -376,4 +376,8 @@ struct prctl_mm_map {
# define PR_FUTEX_HASH_SET_SLOTS 1
# define PR_FUTEX_HASH_GET_SLOTS 2
+/* SFRAME management */
+#define PR_ADD_SFRAME 79
+#define PR_REMOVE_SFRAME 80
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 1e28b40053ce..e6ce79a3a7aa 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -65,6 +65,7 @@
#include <linux/rcupdate.h>
#include <linux/uidgid.h>
#include <linux/cred.h>
+#include <linux/sframe.h>
#include <linux/nospec.h>
@@ -2805,6 +2806,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_FUTEX_HASH:
error = futex_hash_prctl(arg2, arg3, arg4);
break;
+ case PR_ADD_SFRAME:
+ error = sframe_add_section(arg2, arg3, arg4, arg5);
+ break;
+ case PR_REMOVE_SFRAME:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = sframe_remove_section(arg2);
+ break;
default:
trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
error = -EINVAL;
--
2.48.1
* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (14 preceding siblings ...)
2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
@ 2025-10-22 20:39 ` Andrew Morton
2025-10-22 21:58 ` Steven Rostedt
2025-10-23 8:09 ` Fangrui Song
2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
17 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2025-10-22 20:39 UTC (permalink / raw)
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Florian Weimer, Kees Cook, Carlos O'Donell,
Sam James, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
On Wed, 22 Oct 2025 16:43:11 +0200 Jens Remus <jremus@linux.ibm.com> wrote:
> This is the implementation of parsing the SFrame section in an ELF file.
Presently x86_64-only, it seems. Can we expect to see this implemented
for other architectures?
Would a selftest for this be appropriate? To give testers some way of
exercising the code and to make life better for people who are enabling
this on other architectures.
In what tree do you anticipate this project being carried?
* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
@ 2025-10-22 21:58 ` Steven Rostedt
0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-22 21:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Jens Remus, linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Florian Weimer, Kees Cook, Carlos O'Donell,
Sam James, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
On Wed, 22 Oct 2025 13:39:32 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 22 Oct 2025 16:43:11 +0200 Jens Remus <jremus@linux.ibm.com> wrote:
>
> > This is the implementation of parsing the SFrame section in an ELF file.
>
> Presently x86_64-only, it seems. Can we expect to see this implemented
> for other architectures?
Yes, and Jens is here to port it to the s390 :-)
Currently Peter Zijlstra and I are updating the deferred unwinder. Jens is
working on getting sframes to work with it. His interest is getting it for
s390 whereas ours is for x86.
>
> Would a selftest for this be appropriate? To give testers some way of
> exercising the code and to make life better for people who are enabling
> this on other architectures.
Yes we should definitely have selftests. But we are far from getting there.
One requirement is that the toolchain used to build the test must support
adding sframes.
>
> In what tree do you anticipate this project being carried?
>
It will likely go through either tip or my tree.
-- Steve
* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (15 preceding siblings ...)
2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
@ 2025-10-23 8:09 ` Fangrui Song
2025-10-23 14:23 ` Steven Rostedt
2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
17 siblings, 1 reply; 28+ messages in thread
From: Fangrui Song @ 2025-10-23 8:09 UTC (permalink / raw)
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On 2025-10-22, Jens Remus wrote:
>This is the implementation of parsing the SFrame section in an ELF file.
>It's a continuation of Josh's and Steve's last work that can be found
>here:
>
> https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
> https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
>
>Currently the only way to get a user space stack trace from a stack
>walk (and not just copying large amount of user stack into the kernel
>ring buffer) is to use frame pointers. This has a few issues. The biggest
>one is that compiling frame pointers into every application and library
>has been shown to cause performance overhead.
>
>Another issue is that the format of the frames may not always be consistent
>between different compilers and some architectures (s390) has no defined
>format to do a reliable stack walk. The only way to perform user space
>profiling on these architectures is to copy the user stack into the kernel
>buffer.
>
>SFrames[1] is now supported in gcc binutils and soon will also be supported
>by LLVM.
Please consider dropping the statement, "soon will also be supported by LLVM."
Speaking as LLVM's MC, lld/ELF, and binary utilities maintainer, I have significant concerns about the v2 format, specifically its apparent disregard for standard ELF and linker conventions
(https://maskray.me/blog/2025-09-28-remarks-on-sframe#linking-and-execution-views)
To the arm64 maintainers: this is a critical time to revisit the unwind
information format, as I have outlined in my blog post:
A sorted address table like .eh_frame_hdr might still be needed, but the
design could be very different for arm64.
I am curious whether anyone has thought about a library that parses .eh_frame and generates SFrame.
If objtool integrates this library, it can generate SFrame for vmlinux and modules without relying on assembler/linker.
Linkers and assemblers require a level of stability, which is currently a concern on the toolchain side.
(https://sourceware.org/pipermail/binutils/2025-October/144974.html
"This "linker will DTRT" assertion glosses over significant
implementation complexity. Each version needs not just a reader but
version-specific *merging* logic in every linker—fundamentally different
from simply reading a format.")
>SFrames acts more like ORC, and lives in the ELF executable
>file as its own section. Like ORC it has two tables where the first table
>is sorted by instruction pointers (IP) and using the current IP and finding
>it's entry in the first table, it will take you to the second table which
>will tell you where the return address of the current function is located
>and then you can use that address to look it up in the first table to find
>the return address of that function, and so on. This performs a user
>space stack walk.
>
>Now because the SFrame section lives in the ELF file it needs to be faulted
>into memory when it is used. This means that walking the user space stack
>requires being in a faultable context. As profilers like perf request a stack
>trace in interrupt or NMI context, it cannot do the walking when it is
>requested. Instead it must be deferred until it is safe to fault in user
>space. One place this is known to be safe is when the task is about to return
>back to user space.
>
>This series makes the deferred unwind code implement SFrames.
>
>[1] https://sourceware.org/binutils/wiki/sframe
>
>Changes since v10:
>- Rebase on v6.17-rc1 with Peter's unwind user fixes and x86 support
> series [2] and Steve's support for the deferred unwinding infrastructure
> series in perf [3] and perf tool [4] on top.
>- Support for SFrame V2 PC-relative FDE function start address. (Jens)
>- Support for SFrame V2 representing RA undefined as indication for
> outermost frames. (Jens)
>
>[2]: [PATCH 00/12] Various fixes and x86 support,
> https://lore.kernel.org/all/20250924075948.579302904@infradead.org/
>[3]: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure,
> https://lore.kernel.org/all/20251007214008.080852573@kernel.org/
>[4]: [PATCH v16 0/4] perf tool: Support the deferred unwinding infrastructure,
> https://lore.kernel.org/all/20250908175319.841517121@kernel.org/
>
>Patches 1 and 2 are suggested fixups to patches from Peter's unwind user
>fixes and x86 support series. They keep the factoring out of the word
>size from the frame's CFA, FP, and RA offsets local to unwind user fp, as
>unwind user sframe does use absolute offsets.
>
>Patches 3, 6, and 14 have been updated to exclusively support the recent
>PC-relative SFrame FDE function start address encoding. With Binutils 2.45
>the SFrame V2 FDE function start address field value is an offset from the
>field (i.e. PC-relative) instead of from the .sframe section start. This
>is indicated by the new SFrame header flag SFRAME_F_FDE_FUNC_START_PCREL.
>Old SFrame V2 sections get rejected with dynamic debug message
>"bad/unsupported sframe header".
>
>Patches 9 and 10 add support to unwind user and unwind user sframe for
>a recent change of the SFrame V2 format to represent an undefined
>return address as an SFrame FRE without any offsets, which is used as
>indication for outermost frames. Note that currently only a development
>build of Binutils mainline generates SFrame information including this
>new indication for outermost frames. SFrame information without the new
>indication is still supported. Without these patches unwind user sframe
>would identify such new SFrame FREs without any offsets as corrupted and
>remove the .sframe section, causing any any further stack tracing using
>sframe to fail.
>
>Regards,
>Jens
>
>
>Jens Remus (4):
> fixup! unwind: Implement compat fp unwind
> fixup! unwind_user/x86: Enable frame pointer unwinding on x86
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
>
>Josh Poimboeuf (11):
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Store sframe section data in per-mm maple tree
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Wire up unwind_user to sframe
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe: Add prctl() interface for registering .sframe
> sections
>
> MAINTAINERS | 1 +
> arch/Kconfig | 23 ++
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/mmu.h | 2 +-
> arch/x86/include/asm/uaccess.h | 39 +-
> arch/x86/include/asm/unwind_user.h | 11 +-
> fs/binfmt_elf.c | 49 ++-
> include/linux/mm_types.h | 3 +
> include/linux/sframe.h | 60 +++
> include/linux/unwind_user_types.h | 5 +-
> include/uapi/linux/elf.h | 1 +
> include/uapi/linux/prctl.h | 6 +-
> kernel/fork.c | 10 +
> kernel/sys.c | 9 +
> kernel/unwind/Makefile | 3 +-
> kernel/unwind/sframe.c | 615 +++++++++++++++++++++++++++++
> kernel/unwind/sframe.h | 72 ++++
> kernel/unwind/sframe_debug.h | 68 ++++
> kernel/unwind/user.c | 56 ++-
> mm/init-mm.c | 2 +
> 20 files changed, 1004 insertions(+), 32 deletions(-)
> create mode 100644 include/linux/sframe.h
> create mode 100644 kernel/unwind/sframe.c
> create mode 100644 kernel/unwind/sframe.h
> create mode 100644 kernel/unwind/sframe_debug.h
>
>--
>2.48.1
>
* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
2025-10-23 8:09 ` Fangrui Song
@ 2025-10-23 14:23 ` Steven Rostedt
0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-23 14:23 UTC (permalink / raw)
To: Fangrui Song
Cc: Jens Remus, linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On Thu, 23 Oct 2025 01:09:02 -0700
Fangrui Song <maskray@sourceware.org> wrote:
> Please consider dropping the statement, "soon will also be supported by LLVM."
> Speaking as LLVM's MC, lld/ELF, and binary utilities maintainer, I have
> significant concerns about the v2 format, specifically its apparent
> disregard for standard ELF and linker conventions
> (https://maskray.me/blog/2025-09-28-remarks-on-sframe#linking-and-execution-views)
Please note, v2 can be dropped entirely. There are no plans to have the Linux
kernel ship with v2. The v2 patches for the Linux kernel are for
testing purposes only (which is what helped find the issues with v2).
The plan is to have v3 be the first version supported by an official
release of the Linux kernel, with the assumption that changes after v3 will
be minimal.
The reason there was such a big difference between v2 and v3 is because v2
was the first version to have a consumer try to use it in a more
production-like environment. This found several corner cases that needed to be
addressed, and that the current layout of v2 was not acceptable.
No linker needs to support v2 as there will be no consumers of it.
-- Steve
* Re: [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents
2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-23 16:04 ` Jens Remus
0 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:04 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik, Steven Rostedt (Google)
Hi Steve, et al.,
as discussed during yesterday's SFrame call, I will be sending two RFC
fixup patches shortly as a POC to demonstrate how this patch and
"[PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option"
could benefit from introducing an internal FDE representation (e.g.
struct sframe_fde_internal) similar to the existing internal FRE
representation (struct sframe_fre).
On 10/22/2025 4:43 PM, Jens Remus wrote:
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> +static __always_inline int __read_fde(struct sframe_section *sec,
> + unsigned int fde_num,
> + struct sframe_fde *fde,
> + unsigned long *fde_start_base)
The goal would be to eliminate the passing through of fde_start_base as
well as the various computations of the effective function start address
(= *fde_start_base + fde->start_addr) throughout this module. The
internal FDE representation could then simply convey the effective
function start address via an "unsigned long func_start_addr" field.
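To illustrate the idea, here is a minimal userspace sketch (not the kernel
code). It assumes the on-disk FDE layout used by this series, where
start_addr is the first field, so the address of the field equals the
address of the FDE. The names raw_fde, fde_internal, fde_to_internal() and
fde_contains() are hypothetical, and the trailing padding member is an
assumption:

```c
#include <stdint.h>

/* Assumed on-disk SFrame V2 FDE; field names follow the patch. */
struct raw_fde {
	int32_t  start_addr;	/* PC-relative: offset from this field */
	uint32_t func_size;
	uint32_t fres_off;
	uint32_t fres_num;
	uint8_t  info;
	uint8_t  rep_size;
	uint16_t padding;	/* assumption, not shown in this mail */
};

/* Hypothetical internal form carrying the effective start address. */
struct fde_internal {
	uintptr_t func_start_addr;
	uint32_t  func_size;
	uint32_t  fres_off;
	uint32_t  fres_num;
	uint8_t   info;
	uint8_t   rep_size;
};

/* Resolve the PC-relative offset once, at read time. */
static void fde_to_internal(uintptr_t fde_addr, const struct raw_fde *raw,
			    struct fde_internal *fde)
{
	/* start_addr sits at offset 0, so fde_addr is also the field address */
	fde->func_start_addr = fde_addr + (intptr_t)raw->start_addr;
	fde->func_size = raw->func_size;
	fde->fres_off = raw->fres_off;
	fde->fres_num = raw->fres_num;
	fde->info = raw->info;
	fde->rep_size = raw->rep_size;
}

/* Range check equivalent to the "not in a gap" test. */
static int fde_contains(const struct fde_internal *fde, uintptr_t ip)
{
	return ip >= fde->func_start_addr &&
	       ip - fde->func_start_addr < fde->func_size;
}
```

With the conversion done once when the FDE is read, callers only need
func_start_addr and never see the raw PC-relative offset.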
> +{
> + unsigned long fde_addr, ip;
> +
> + fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
> + unsafe_copy_from_user(fde, (void __user *)fde_addr,
> + sizeof(struct sframe_fde), Efault);
> +
> + ip = fde_addr + fde->start_addr;
> + if (ip < sec->text_start || ip > sec->text_end)
> + return -EINVAL;
> +
> + *fde_start_base = fde_addr;
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
> +
> +static __always_inline int __find_fde(struct sframe_section *sec,
> + unsigned long ip,
> + struct sframe_fde *fde,
> + unsigned long *fde_start_base)
fde_start_base would get eliminated.
> +{
> + unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
> + struct sframe_fde __user *first, *low, *high, *found = NULL;
> + int ret;
> +
> + first = (void __user *)sec->fdes_start;
> + low = first;
> + high = first + sec->num_fdes - 1;
> +
> + while (low <= high) {
> + struct sframe_fde __user *mid;
> + s32 func_off;
> + unsigned long func_addr;
> +
> + mid = low + ((high - low) / 2);
> +
> + unsafe_get_user(func_off, (s32 __user *)mid, Efault);
> + func_addr = (unsigned long)mid + func_off;
> +
> + if (ip >= func_addr) {
> + if (func_addr < func_addr_low)
> + return -EFAULT;
> +
> + func_addr_low = func_addr;
> +
> + found = mid;
> + low = mid + 1;
> + } else {
> + if (func_addr > func_addr_high)
> + return -EFAULT;
> +
> + func_addr_high = func_addr;
> +
> + high = mid - 1;
> + }
> + }
> +
> + if (!found)
> + return -EINVAL;
> +
> + ret = __read_fde(sec, found - first, fde, fde_start_base);
fde_start_base would get eliminated.
> + if (ret)
> + return ret;
> +
> + /* make sure it's not in a gap */
> + if (ip < *fde_start_base + fde->start_addr ||
> + ip >= *fde_start_base + fde->start_addr + fde->func_size)
Would simplify to:
if (ip < fde->func_start_addr ||
ip >= fde->func_start_addr + fde->func_size)
> + return -EINVAL;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
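For readers following along, the search in __find_fde() above boils down to
finding the last FDE whose function start address is <= ip. A simplified
userspace sketch over a plain sorted array, using the hypothetical name
find_last_le() and leaving out the user-access handling and the sortedness
sanity checks of the real code:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the last entry whose start address is <= ip,
 * or -1 if ip precedes every entry. 'starts' must be sorted ascending.
 */
static long find_last_le(const uintptr_t *starts, size_t n, uintptr_t ip)
{
	long low = 0, high = (long)n - 1, found = -1;

	while (low <= high) {
		long mid = low + (high - low) / 2;

		if (ip >= starts[mid]) {
			found = mid;		/* candidate, look further right */
			low = mid + 1;
		} else {
			high = mid - 1;		/* entry starts after ip */
		}
	}

	return found;
}
```

A return value of -1 corresponds to the !found case. The caller still has to
check that ip lies below start + func_size, i.e. that it is not in a gap.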
> +static __always_inline int __find_fre(struct sframe_section *sec,
> + struct sframe_fde *fde,
> + unsigned long fde_start_base,
fde_start_base would get eliminated.
> + unsigned long ip,
> + struct unwind_user_frame *frame)
> +{
> + unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
> + struct sframe_fre *fre, *prev_fre = NULL;
> + struct sframe_fre fres[2];
> + unsigned long fre_addr;
> + bool which = false;
> + unsigned int i;
> + u32 ip_off;
> +
> + ip_off = ip - (fde_start_base + fde->start_addr);
Would simplify to:
ip_off = ip - fde->func_start_addr;
> +
> + if (fde_type == SFRAME_FDE_TYPE_PCMASK)
> + ip_off %= fde->rep_size;
> +
> + fre_addr = sec->fres_start + fde->fres_off;
> +
> + for (i = 0; i < fde->fres_num; i++) {
> + int ret;
> +
> + /*
> + * Alternate between the two fres[] entries for 'fre' and
> + * 'prev_fre'.
> + */
> + fre = which ? fres : fres + 1;
> + which = !which;
> +
> + ret = __read_fre(sec, fde, fre_addr, fre);
> + if (ret)
> + return ret;
> +
> + fre_addr += fre->size;
> +
> + if (prev_fre && fre->ip_off <= prev_fre->ip_off)
> + return -EFAULT;
> +
> + if (fre->ip_off > ip_off)
> + break;
> +
> + prev_fre = fre;
> + }
> +
> + if (!prev_fre)
> + return -EINVAL;
> + fre = prev_fre;
> +
> + frame->cfa_off = fre->cfa_off;
> + frame->ra_off = fre->ra_off;
> + frame->fp_off = fre->fp_off;
> + frame->use_fp = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
> +
> + return 0;
> +}
> +
> +int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
> +{
> + struct mm_struct *mm = current->mm;
> + struct sframe_section *sec;
> + struct sframe_fde fde;
> + unsigned long fde_start_base;
fde_start_base would get eliminated.
> + int ret;
> +
> + if (!mm)
> + return -EINVAL;
> +
> + guard(srcu)(&sframe_srcu);
> +
> + sec = mtree_load(&mm->sframe_mt, ip);
> + if (!sec)
> + return -EINVAL;
> +
> + if (!user_read_access_begin((void __user *)sec->sframe_start,
> + sec->sframe_end - sec->sframe_start))
> + return -EFAULT;
> +
> + ret = __find_fde(sec, ip, &fde, &fde_start_base);
fde_start_base would get eliminated.
> + if (ret)
> + goto end;
> +
> + ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
fde_start_base would get eliminated.
> +end:
> + user_read_access_end();
> + return ret;
> +}
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
+49-7031-16-1128 Office
jremus@de.ibm.com
IBM
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
` (16 preceding siblings ...)
2025-10-23 8:09 ` Fangrui Song
@ 2025-10-23 16:05 ` Jens Remus
2025-10-23 16:05 ` [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option Jens Remus
17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
This RFC fixup is a POC to demonstrate how the SFrame reading code
could benefit from introducing an internal FDE representation (struct
sframe_fde_internal) similar to the used internal FRE representation
(struct sframe_fre). The goal is to eliminate the passing through of
fde_start_base in many places as well as the various computations of the
effective function start address (= *fde_start_base + fde->start_addr)
throughout this module. The internal FDE representation simply conveys
the effective function start address via the "unsigned long
func_start_addr" field.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
kernel/unwind/sframe.c | 52 ++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 3d7ac4eaa8b7..f88fc2c92c58 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -17,6 +17,15 @@
#include "sframe.h"
#include "sframe_debug.h"
+struct sframe_fde_internal {
+ unsigned long func_start_addr;
+ u32 func_size;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 rep_size;
+};
+
struct sframe_fre {
unsigned int size;
u32 ip_off;
@@ -45,20 +54,26 @@ static __always_inline unsigned char offset_size_enum_to_size(unsigned char off_
static __always_inline int __read_fde(struct sframe_section *sec,
unsigned int fde_num,
- struct sframe_fde *fde,
- unsigned long *fde_start_base)
+ struct sframe_fde_internal *fde)
{
- unsigned long fde_addr, ip;
+ unsigned long fde_addr, func_addr;
+ struct sframe_fde _fde;
fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
- unsafe_copy_from_user(fde, (void __user *)fde_addr,
+ unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
sizeof(struct sframe_fde), Efault);
- ip = fde_addr + fde->start_addr;
- if (ip < sec->text_start || ip > sec->text_end)
+ func_addr = fde_addr + _fde.start_addr;
+ if (func_addr < sec->text_start || func_addr > sec->text_end)
return -EINVAL;
- *fde_start_base = fde_addr;
+ fde->func_start_addr = func_addr;
+ fde->func_size = _fde.func_size;
+ fde->fres_off = _fde.fres_off;
+ fde->fres_num = _fde.fres_num;
+ fde->info = _fde.info;
+ fde->rep_size = _fde.rep_size;
+
return 0;
Efault:
@@ -67,8 +82,7 @@ static __always_inline int __read_fde(struct sframe_section *sec,
static __always_inline int __find_fde(struct sframe_section *sec,
unsigned long ip,
- struct sframe_fde *fde,
- unsigned long *fde_start_base)
+ struct sframe_fde_internal *fde)
{
unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
struct sframe_fde __user *first, *low, *high, *found = NULL;
@@ -109,13 +123,13 @@ static __always_inline int __find_fde(struct sframe_section *sec,
if (!found)
return -EINVAL;
- ret = __read_fde(sec, found - first, fde, fde_start_base);
+ ret = __read_fde(sec, found - first, fde);
if (ret)
return ret;
/* make sure it's not in a gap */
- if (ip < *fde_start_base + fde->start_addr ||
- ip >= *fde_start_base + fde->start_addr + fde->func_size)
+ if (ip < fde->func_start_addr ||
+ ip >= fde->func_start_addr + fde->func_size)
return -EINVAL;
return 0;
@@ -165,7 +179,7 @@ static __always_inline int __find_fde(struct sframe_section *sec,
s32: UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
static __always_inline int __read_fre(struct sframe_section *sec,
- struct sframe_fde *fde,
+ struct sframe_fde_internal *fde,
unsigned long fre_addr,
struct sframe_fre *fre)
{
@@ -244,8 +258,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
}
static __always_inline int __find_fre(struct sframe_section *sec,
- struct sframe_fde *fde,
- unsigned long fde_start_base,
+ struct sframe_fde_internal *fde,
unsigned long ip,
struct unwind_user_frame *frame)
{
@@ -257,7 +270,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
unsigned int i;
u32 ip_off;
- ip_off = ip - (fde_start_base + fde->start_addr);
+ ip_off = ip - fde->func_start_addr;
if (fde_type == SFRAME_FDE_TYPE_PCMASK)
ip_off %= fde->rep_size;
@@ -306,8 +319,7 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
{
struct mm_struct *mm = current->mm;
struct sframe_section *sec;
- struct sframe_fde fde;
- unsigned long fde_start_base;
+ struct sframe_fde_internal fde;
int ret;
if (!mm)
@@ -323,11 +335,11 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
sec->sframe_end - sec->sframe_start))
return -EFAULT;
- ret = __find_fde(sec, ip, &fde, &fde_start_base);
+ ret = __find_fde(sec, ip, &fde);
if (ret)
goto end;
- ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
+ ret = __find_fre(sec, &fde, ip, frame);
end:
user_read_access_end();
--
2.48.1
* [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option
2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-23 16:05 ` Jens Remus
0 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
This RFC fixup is a POC to demonstrate how the SFrame validation code
would adjust if introducing an internal FDE representation (struct
sframe_fde_internal) similar to the used internal FRE representation
(struct sframe_fre) in the SFrame reading code. The goal is to
eliminate the passing through of fde_start_base in many places as well
as the various computations of the effective function start address
(= *fde_start_base + fde->start_addr) throughout this module. The
internal FDE representation simply conveys the effective function start
address via the "unsigned long func_start_addr" field.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
kernel/unwind/sframe.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index f88fc2c92c58..f2977c010117 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -354,21 +354,21 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
#ifdef CONFIG_SFRAME_VALIDATION
static int safe_read_fde(struct sframe_section *sec,
- unsigned int fde_num, struct sframe_fde *fde,
- unsigned long *fde_start_base)
+ unsigned int fde_num, struct sframe_fde_internal *fde)
{
int ret;
if (!user_read_access_begin((void __user *)sec->sframe_start,
sec->sframe_end - sec->sframe_start))
return -EFAULT;
- ret = __read_fde(sec, fde_num, fde, fde_start_base);
+ ret = __read_fde(sec, fde_num, fde);
user_read_access_end();
return ret;
}
static int safe_read_fre(struct sframe_section *sec,
- struct sframe_fde *fde, unsigned long fre_addr,
+ struct sframe_fde_internal *fde,
+ unsigned long fre_addr,
struct sframe_fre *fre)
{
int ret;
@@ -388,18 +388,18 @@ static int sframe_validate_section(struct sframe_section *sec)
for (i = 0; i < sec->num_fdes; i++) {
struct sframe_fre *fre, *prev_fre = NULL;
- unsigned long ip, fde_start_base, fre_addr;
- struct sframe_fde fde;
+ unsigned long ip, fre_addr;
+ struct sframe_fde_internal fde;
struct sframe_fre fres[2];
bool which = false;
unsigned int j;
int ret;
- ret = safe_read_fde(sec, i, &fde, &fde_start_base);
+ ret = safe_read_fde(sec, i, &fde);
if (ret)
return ret;
- ip = fde_start_base + fde.start_addr;
+ ip = fde.func_start_addr;
if (ip <= prev_ip) {
dbg_sec("fde %u not sorted\n", i);
return -EFAULT;
@@ -416,8 +416,8 @@ static int sframe_validate_section(struct sframe_section *sec)
ret = safe_read_fre(sec, &fde, fre_addr, fre);
if (ret) {
dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
- dbg_sec("FDE: start_addr:0x%x func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
- fde.start_addr, fde.func_size,
+ dbg_sec("FDE: func_start_addr:0x%lx func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
+ fde.func_start_addr, fde.func_size,
fde.fres_off, fde.fres_num,
fde.info, fde.rep_size);
return ret;
--
2.48.1
* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2025-10-24 13:44 ` Peter Zijlstra
2025-10-24 14:29 ` Jens Remus
0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2025-10-24 13:44 UTC (permalink / raw)
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
Steven Rostedt (Google)
On Wed, Oct 22, 2025 at 04:43:19PM +0200, Jens Remus wrote:
> @@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
> return get_user(*word, addr);
> }
>
> -static int unwind_user_next_fp(struct unwind_user_state *state)
> +static int unwind_user_next_common(struct unwind_user_state *state,
> + const struct unwind_user_frame *frame,
> + struct pt_regs *regs)
> {
What is pt_regs for? AFAICT it isn't actually used in any of the
following patches.
> - const struct unwind_user_frame fp_frame = {
> - ARCH_INIT_USER_FP_FRAME(state->ws)
> - };
> - const struct unwind_user_frame *frame = &fp_frame;
> unsigned long cfa, fp, ra;
>
> if (frame->use_fp) {
> @@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
> return 0;
> }
>
> +static int unwind_user_next_sframe(struct unwind_user_state *state)
> +{
> + struct unwind_user_frame _frame, *frame;
> +
> + /* sframe expects the frame to be local storage */
> + frame = &_frame;
> + if (sframe_find(state->ip, frame))
> + return -ENOENT;
> + return unwind_user_next_common(state, frame, task_pt_regs(current));
> +}
Would it not be simpler to write:
static int unwind_user_next_sframe(struct unwind_user_state *state)
{
struct unwind_user_frame frame;
/* sframe expects the frame to be local storage */
if (sframe_find(state->ip, &frame))
return -ENOENT;
return unwind_user_next_common(state, &frame, task_pt_regs(current));
}
hmm?
> +static int unwind_user_next_fp(struct unwind_user_state *state)
> +{
> + const struct unwind_user_frame fp_frame = {
> + ARCH_INIT_USER_FP_FRAME(state->ws)
> + };
> +
> + return unwind_user_next_common(state, &fp_frame, task_pt_regs(current));
> +}
> +
> static int unwind_user_next(struct unwind_user_state *state)
> {
> unsigned long iter_mask = state->available_types;
> @@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
>
> state->current_type = type;
> switch (type) {
> + case UNWIND_USER_TYPE_SFRAME:
> + switch (unwind_user_next_sframe(state)) {
> + case 0:
> + return 0;
> + case -ENOENT:
> + continue; /* Try next method. */
> + default:
> + state->done = true;
> + }
> + break;
Should it remove SFRAME from state->available_types at this point?
* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
2025-10-24 13:44 ` Peter Zijlstra
@ 2025-10-24 14:29 ` Jens Remus
2025-10-24 19:00 ` Steven Rostedt
0 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-24 14:29 UTC (permalink / raw)
To: Peter Zijlstra, Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Borislav Petkov,
Dave Hansen, David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
Steven Rostedt (Google)
On 10/24/2025 3:44 PM, Peter Zijlstra wrote:
> On Wed, Oct 22, 2025 at 04:43:19PM +0200, Jens Remus wrote:
>
>> @@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
>> return get_user(*word, addr);
>> }
>>
>> -static int unwind_user_next_fp(struct unwind_user_state *state)
>> +static int unwind_user_next_common(struct unwind_user_state *state,
>> + const struct unwind_user_frame *frame,
>> + struct pt_regs *regs)
>> {
>
> What is pt_regs for? AFAICT it isn't actually used in any of the
> following patches.
Good catch! No idea. It started to appear in v9 of the series:
[PATCH v8 06/12] unwind_user/sframe: Wire up unwind_user to sframe
https://lore.kernel.org/all/20250708021159.386608979@kernel.org/
[PATCH v9 06/11] unwind_user/sframe: Wire up unwind_user to sframe
https://lore.kernel.org/all/20250717012936.619600891@kernel.org/
My s390 support for unwind user sframe will make use of it, but in that
case it should rather be introduced there.
@Steven: Any idea why you added pt_regs? Your v9 even had this other
instance of unused pt_regs:
+static struct unwind_user_frame *get_fp_frame(struct pt_regs *regs)
+{
+ return &fp_frame;
+}
>> @@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
>> return 0;
>> }
>>
>> +static int unwind_user_next_sframe(struct unwind_user_state *state)
>> +{
>> + struct unwind_user_frame _frame, *frame;
>> +
>> + /* sframe expects the frame to be local storage */
>> + frame = &_frame;
>> + if (sframe_find(state->ip, frame))
>> + return -ENOENT;
>> + return unwind_user_next_common(state, frame, task_pt_regs(current));
>> +}
>
> Would it not be simpler to write:
>
> static int unwind_user_next_sframe(struct unwind_user_state *state)
> {
> struct unwind_user_frame frame;
>
> /* sframe expects the frame to be local storage */
> if (sframe_find(state->ip, &frame))
> return -ENOENT;
> return unwind_user_next_common(state, &frame, task_pt_regs(current));
> }
>
> hmm?
I agree. Must have been a leftover from changes from v8 to v9.
>> @@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
>>
>> state->current_type = type;
>> switch (type) {
>> + case UNWIND_USER_TYPE_SFRAME:
>> + switch (unwind_user_next_sframe(state)) {
>> + case 0:
>> + return 0;
>> + case -ENOENT:
>> + continue; /* Try next method. */
>> + default:
>> + state->done = true;
>> + }
>> + break;
>
> Should it remove SFRAME from state->available_types at this point?
In the -ENOENT case? If the reason is that there was either no SFrame
section or no SFrame information (SFrame FRE) for the IP, then SFRAME
could potentially be successful with the next IP in the call chain,
provided the other unwind methods correctly unwind both SP and FP.
@Steven: What is your opinion on this?
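To illustrate the point about -ENOENT, here is a minimal user-space sketch,
not the kernel code. The names struct state, TYPE_SFRAME, TYPE_FP,
next_sframe(), next_fp() and unwind_next() are hypothetical stand-ins, and
"only even IPs have SFrame data" is an arbitrary assumption to model missing
coverage. It shows that leaving SFRAME in available_types lets a later frame
use it again after a fallback:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the unwind method bits. */
enum { TYPE_SFRAME = 1u << 0, TYPE_FP = 1u << 1 };

struct state {
	unsigned long ip;
	unsigned int available_types;
	unsigned int current_type;
	bool done;
};

/* Assumption for the demo: only even "IPs" are covered by SFrame data. */
static int next_sframe(struct state *s)
{
	if (s->ip % 2)
		return -ENOENT;	/* no FRE for this IP */
	s->ip += 1;		/* pretend we unwound one frame */
	return 0;
}

/* The frame pointer method always succeeds in this sketch. */
static int next_fp(struct state *s)
{
	s->ip += 1;
	return 0;
}

int unwind_next(struct state *s)
{
	/* Iterate over a copy, so available_types itself is never reduced. */
	unsigned int mask = s->available_types;

	assert(mask);
	while (mask) {
		unsigned int type = mask & -mask;	/* lowest set bit */

		mask &= ~type;
		s->current_type = type;
		if (type == TYPE_SFRAME) {
			int ret = next_sframe(s);

			if (!ret)
				return 0;
			if (ret == -ENOENT)
				continue;	/* try next method */
			break;			/* hard error: stop */
		}
		if (type == TYPE_FP && !next_fp(s))
			return 0;
	}
	s->done = true;
	return -EINVAL;
}
```

Starting at ip 0, three calls would use SFRAME, then FP, then SFRAME again.
If the -ENOENT path cleared the SFRAME bit in available_types, the third
step would be stuck with FP in this model.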
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
+49-7031-16-1128 Office
jremus@de.ibm.com
IBM
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
2025-10-24 14:29 ` Jens Remus
@ 2025-10-24 19:00 ` Steven Rostedt
0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-24 19:00 UTC (permalink / raw)
To: Jens Remus
Cc: Peter Zijlstra, Steven Rostedt, linux-kernel, linux-trace-kernel,
bpf, x86, linux-mm, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On Fri, 24 Oct 2025 16:29:07 +0200
Jens Remus <jremus@linux.ibm.com> wrote:
> @Steven: Any idea why you added pt_regs? Your v9 even had this other
> instance of unused pt_regs:
>
> +static struct unwind_user_frame *get_fp_frame(struct pt_regs *regs)
> +{
> + return &fp_frame;
> +}
According to the history:
https://lore.kernel.org/linux-trace-kernel/20250717012848.927473176@kernel.org/
Which has:
Changes since v8: https://lore.kernel.org/linux-trace-kernel/20250708021115.894007410@kernel.org/
- Rebased on the changes by Mathieu in the kernel/unwind/user.c file
https://lore.kernel.org/all/20250710164301.3094-2-mathieu.desnoyers@efficios.com/
It looks like it came in from Mathieu's updates, which were trying to deal
with compat. But then after noticing that compat wasn't working on my test
boxes, I removed it. The removal failed to notice that regs is now unused.
-- Steve
* Re: [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2025-11-18 17:04 ` Jens Remus
2025-11-18 19:26 ` Steven Rostedt
0 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-11-18 17:04 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf
Cc: Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Borislav Petkov,
Dave Hansen, David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
Steven Rostedt (Google)
Hello Josh and Steven!
On 10/22/2025 4:43 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> In preparation for unwinding user space stacks with sframe, add basic
> sframe compile infrastructure and support for reading the .sframe
> section header.
>
> sframe_add_section() reads the header and unconditionally returns an
> error, so it's not very useful yet. A subsequent patch will improve
> that.
>
> Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
>
> [ Jens Remus: Add support for PC-relative FDE function start address. ]
I took a closer look and wondered whether some parts would be better
moved to subsequent patches.
> diff --git a/include/linux/sframe.h b/include/linux/sframe.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_SFRAME_H
> +#define _LINUX_SFRAME_H
> +
> +#include <linux/mm_types.h>
Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".
> +#include <linux/unwind_user_types.h>
Move to "[PATCH v11 06/15] unwind_user/sframe: Add support for reading
.sframe contents". sframe_find() needs the types.
> +
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> +
> +struct sframe_section {
> + unsigned long sframe_start;
> + unsigned long sframe_end;
> + unsigned long text_start;
> + unsigned long text_end;
> +
> + unsigned long fdes_start;
> + unsigned long fres_start;
> + unsigned long fres_end;
> + unsigned int num_fdes;
> +
> + signed char ra_off;
> + signed char fp_off;
> +};
> +
> +extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> + unsigned long text_start, unsigned long text_end);
> +extern int sframe_remove_section(unsigned long sframe_addr);
> +
> +#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> + unsigned long text_start, unsigned long text_end)
> +{
> + return -ENOSYS;
> +}
> +static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
> +
> +#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +#endif /* _LINUX_SFRAME_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -0,0 +1,137 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Userspace sframe access functions
> + */
> +
> +#define pr_fmt(fmt) "sframe: " fmt
> +
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/srcu.h>
> +#include <linux/uaccess.h>
> +#include <linux/mm.h>
Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".
> +#include <linux/string_helpers.h>
> +#include <linux/sframe.h>
> +#include <linux/unwind_user_types.h>
> +
> +#include "sframe.h"
> +
> +#define dbg(fmt, ...) \
> + pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
> +
> +static void free_section(struct sframe_section *sec)
> +{
> + kfree(sec);
> +}
> +
> +static int sframe_read_header(struct sframe_section *sec)
> +{
> + unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
> + struct sframe_header shdr;
> + unsigned int num_fdes;
> +
> + if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
> + dbg("header usercopy failed\n");
> + return -EFAULT;
> + }
> +
> + if (shdr.preamble.magic != SFRAME_MAGIC ||
> + shdr.preamble.version != SFRAME_VERSION_2 ||
> + !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
> + !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
> + shdr.auxhdr_len) {
> + dbg("bad/unsupported sframe header\n");
> + return -EINVAL;
> + }
> +
> + if (!shdr.num_fdes || !shdr.num_fres) {
> + dbg("no fde/fre entries\n");
> + return -EINVAL;
> + }
> +
> + header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
> + if (header_end >= sec->sframe_end) {
> + dbg("header doesn't fit in section\n");
> + return -EINVAL;
> + }
> +
> + num_fdes = shdr.num_fdes;
> + fdes_start = header_end + shdr.fdes_off;
> + fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde));
> +
> + fres_start = header_end + shdr.fres_off;
> + fres_end = fres_start + shdr.fre_len;
> +
> + if (fres_start < fdes_end || fres_end > sec->sframe_end) {
> + dbg("inconsistent fde/fre offsets\n");
> + return -EINVAL;
> + }
> +
> + sec->num_fdes = num_fdes;
> + sec->fdes_start = fdes_start;
> + sec->fres_start = fres_start;
> + sec->fres_end = fres_end;
> +
> + sec->ra_off = shdr.cfa_fixed_ra_offset;
> + sec->fp_off = shdr.cfa_fixed_fp_offset;
> +
> + return 0;
> +}
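For orientation, the offset consistency checks in the quoted
sframe_read_header() can be sketched in user space as follows. This is only
an illustration under assumptions: struct hdr_info, struct layout,
check_layout() and the fde_size field (a stand-in for
sizeof(struct sframe_fde)) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of the header fields used by the check. */
struct hdr_info {
	uint32_t num_fdes;
	uint32_t fdes_off;	/* FDE table offset, relative to header end */
	uint32_t fres_off;	/* FRE area offset, relative to header end */
	uint32_t fre_len;	/* length of the FRE area in bytes */
	uint32_t fde_size;	/* stand-in for sizeof(struct sframe_fde) */
};

struct layout {
	unsigned long fdes_start;
	unsigned long fres_start;
	unsigned long fres_end;
};

/*
 * Mirror the order of checks in the quoted code: the header must end
 * before the section end, the FRE area must not start before the end of
 * the FDE table, and the FRE area must end within the section.
 */
int check_layout(unsigned long header_end, unsigned long sframe_end,
		 const struct hdr_info *h, struct layout *out)
{
	unsigned long fdes_start, fdes_end, fres_start, fres_end;

	assert(h && out);

	if (header_end >= sframe_end)
		return -EINVAL;	/* header doesn't fit in section */

	fdes_start = header_end + h->fdes_off;
	fdes_end = fdes_start + (unsigned long)h->num_fdes * h->fde_size;

	fres_start = header_end + h->fres_off;
	fres_end = fres_start + h->fre_len;

	if (fres_start < fdes_end || fres_end > sframe_end)
		return -EINVAL;	/* inconsistent fde/fre offsets */

	out->fdes_start = fdes_start;
	out->fres_start = fres_start;
	out->fres_end = fres_end;
	return 0;
}
```

The sketch takes untrusted offsets at face value, like the quoted code, and
does not guard against arithmetic wrap-around.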
> +
> +int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> + unsigned long text_start, unsigned long text_end)
> +{
> + struct maple_tree *sframe_mt = &current->mm->sframe_mt;
Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".
> + struct vm_area_struct *sframe_vma, *text_vma;
> + struct mm_struct *mm = current->mm;
> + struct sframe_section *sec;
> + int ret;
> +
> + if (!sframe_start || !sframe_end || !text_start || !text_end) {
> + dbg("zero-length sframe/text address\n");
> + return -EINVAL;
> + }
> +
> + scoped_guard(mmap_read_lock, mm) {
> + sframe_vma = vma_lookup(mm, sframe_start);
> + if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
> + dbg("bad sframe address (0x%lx - 0x%lx)\n",
> + sframe_start, sframe_end);
> + return -EINVAL;
> + }
> +
> + text_vma = vma_lookup(mm, text_start);
> + if (!text_vma ||
> + !(text_vma->vm_flags & VM_EXEC) ||
> + text_end > text_vma->vm_end) {
> + dbg("bad text address (0x%lx - 0x%lx)\n",
> + text_start, text_end);
> + return -EINVAL;
> + }
> + }
> +
> + sec = kzalloc(sizeof(*sec), GFP_KERNEL);
> + if (!sec)
> + return -ENOMEM;
> +
> + sec->sframe_start = sframe_start;
> + sec->sframe_end = sframe_end;
> + sec->text_start = text_start;
> + sec->text_end = text_end;
> +
> + ret = sframe_read_header(sec);
> + if (ret)
> + goto err_free;
> +
> + /* TODO nowhere to store it yet - just free it and return an error */
An alternative would be to move sframe_add_section() to
"[PATCH v11 04/15] unwind_user/sframe: Store sframe section data in
per-mm maple tree" and reorder the patches as outlined below.
> + ret = -ENOSYS;
> +
> +err_free:
> + free_section(sec);
> + return ret;
> +}
> +
> +int sframe_remove_section(unsigned long sframe_start)
> +{
> + return -ENOSYS;
> +}
I wonder whether the series should be restructured as follows:
unwind_user/sframe: Store .sframe section data in per-mm maple tree
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Wire up unwind_user to sframe
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user: Stop when reaching an outermost frame
unwind_user/sframe: Add support for outermost frame indication
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe: Add prctl() interface for registering .sframe sections
This would involve moving sframe_add_section() and sframe_remove_section()
from "unwind_user/sframe: Add support for reading .sframe headers" to
"unwind_user/sframe: Store .sframe section data in per-mm maple tree" or
into a new second patch, as they depend on the first and are required
by the third.
What are your thoughts? The reordering might be wasted effort.
Thanks and regards,
Jens
* Re: [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
2025-11-18 17:04 ` Jens Remus
@ 2025-11-18 19:26 ` Steven Rostedt
0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-11-18 19:26 UTC (permalink / raw)
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On Tue, 18 Nov 2025 18:04:27 +0100
Jens Remus <jremus@linux.ibm.com> wrote:
> I wonder whether the series should be restructured as follows:
>
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Wire up unwind_user to sframe
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe: Add prctl() interface for registering .sframe sections
>
> This would involve moving sframe_add_section() and sframe_remove_section()
> from "unwind_user/sframe: Add support for reading .sframe headers" to
> "unwind_user/sframe: Store .sframe section data in per-mm maple tree" or
> into a new second patch, as they depend on the first and are required
> by the third.
>
> What are your thoughts? The reordering might be wasted effort.
If you feel it makes it better, sure, go ahead and do it.
-- Steve
end of thread, other threads:[~2025-11-18 19:26 UTC | newest]
Thread overview: 28+ messages
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
2025-11-18 17:04 ` Jens Remus
2025-11-18 19:26 ` Steven Rostedt
2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2025-10-23 16:04 ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
2025-10-24 13:44 ` Peter Zijlstra
2025-10-24 14:29 ` Jens Remus
2025-10-24 19:00 ` Steven Rostedt
2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
2025-10-22 21:58 ` Steven Rostedt
2025-10-23 8:09 ` Fangrui Song
2025-10-23 14:23 ` Steven Rostedt
2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2025-10-23 16:05 ` [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option Jens Remus