linux-mm.kvack.org archive mirror
* [PATCH v11 00/15] unwind_deferred: Implement sframe handling
@ 2025-10-22 14:43 Jens Remus
  2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
                   ` (17 more replies)
  0 siblings, 18 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

This series implements parsing of the SFrame section in an ELF file.
It continues the most recent work by Josh and Steve, which can be found
here:

   https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
   https://lore.kernel.org/all/20250827201548.448472904@kernel.org/

Currently the only way to get a user space stack trace from a stack
walk (rather than copying a large amount of the user stack into the
kernel ring buffer) is to use frame pointers. This has a few issues. The
biggest one is that compiling frame pointers into every application and
library has been shown to cause performance overhead.

Another issue is that the frame layout may not be consistent between
different compilers, and some architectures (such as s390) have no defined
frame format that allows a reliable stack walk. The only way to perform user
space profiling on these architectures is to copy the user stack into the
kernel buffer.

SFrame [1] is now supported by the GNU toolchain (GCC/Binutils) and will
soon also be supported by LLVM. SFrame works much like ORC and lives in the
ELF executable file as its own section. Like ORC, it has two tables. The
first table is sorted by instruction pointer (IP). Looking up the current IP
in the first table leads to an entry in the second table, which tells where
the return address of the current function is located. That return address
is then looked up in the first table to find the return address of the
caller, and so on. This is how a user space stack walk is performed.
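The two-table lookup described above can be sketched in plain C. This is a toy model under stated assumptions, not the SFrame on-disk format or the kernel code: the `toy_fde`/`toy_fre` structs and all function names are hypothetical, the CFA is assumed to be SP-based, and the caller's SP is assumed to equal the CFA (as on x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of SFrame's two tables (not the on-disk format). */
struct toy_fde {		/* first table: one entry per function, sorted by func_start */
	uint64_t func_start;
	uint32_t func_size;
	uint32_t fre_idx;	/* index of this function's first FRE */
	uint32_t fre_num;	/* number of FREs for this function */
};

struct toy_fre {		/* second table: unwind rules per IP range within a function */
	uint32_t start_off;	/* rule applies from func_start + start_off onwards */
	int32_t cfa_off;	/* CFA = SP + cfa_off (toy model: SP-based CFA only) */
	int32_t ra_off;		/* return address is saved at CFA + ra_off */
};

/* Binary search the sorted first table for the function containing ip. */
static const struct toy_fde *find_fde(const struct toy_fde *fdes, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < fdes[mid].func_start)
			hi = mid;
		else if (ip >= fdes[mid].func_start + fdes[mid].func_size)
			lo = mid + 1;
		else
			return &fdes[mid];
	}
	return NULL;
}

/* In the second table, pick the last rule that starts at or before ip. */
static const struct toy_fre *find_fre(const struct toy_fde *fde,
				      const struct toy_fre *fres, uint64_t ip)
{
	const struct toy_fre *best = NULL;
	uint64_t off = ip - fde->func_start;

	for (uint32_t i = 0; i < fde->fre_num; i++) {
		if (fres[fde->fre_idx + i].start_off > off)
			break;
		best = &fres[fde->fre_idx + i];
	}
	return best;
}

/* One unwind step; the simulated stack holds 8-byte words at stack_base + 8*i. */
static int unwind_step(const struct toy_fde *fdes, size_t nfdes,
		       const struct toy_fre *fres,
		       const uint64_t *stack, size_t nwords, uint64_t stack_base,
		       uint64_t *ip, uint64_t *sp)
{
	const struct toy_fde *fde = find_fde(fdes, nfdes, *ip);
	const struct toy_fre *fre;
	uint64_t cfa, ra_addr;

	if (!fde)
		return -1;
	fre = find_fre(fde, fres, *ip);
	if (!fre)
		return -1;

	cfa = *sp + fre->cfa_off;
	ra_addr = cfa + fre->ra_off;
	if (ra_addr < stack_base || (ra_addr - stack_base) / 8 >= nwords)
		return -1;	/* return address slot outside the simulated stack */

	*ip = stack[(ra_addr - stack_base) / 8];	/* caller's IP */
	*sp = cfa;					/* caller's SP is the CFA */
	return 0;
}
```

Calling `unwind_step()` repeatedly walks from the current function to its caller, then to that function's caller, mirroring the "look it up in the first table again" loop in the text.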

Because the SFrame section lives in the ELF file, it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a stack
trace in interrupt or NMI context, the walk cannot be done at the time it is
requested. Instead it must be deferred until it is safe to fault in user
space memory. One place this is known to be safe is when the task is about
to return to user space.
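The deferral pattern can be illustrated with a minimal sketch. All names here (`toy_task`, `request_user_stacktrace()`, `toy_return_to_user()`) are hypothetical and do not correspond to the kernel's unwind_deferred API; the sketch only shows the idea of recording a request in a non-faultable context and performing the walk later, and it assumes that repeated requests before returning to user space can be coalesced into a single walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; not the kernel's unwind_deferred API. */
struct toy_task {
	bool unwind_pending;	/* set from "NMI" context; no faulting work done there */
	int  nr_requests;	/* how many stack trace requests were made */
	int  nr_walks;		/* how many stack walks actually ran */
};

/* Called from interrupt/NMI context: only record the request. */
static void request_user_stacktrace(struct toy_task *t)
{
	t->nr_requests++;
	t->unwind_pending = true;
}

/* Stand-in for the real walk, which may fault in .sframe pages. */
static void walk_user_stack(struct toy_task *t)
{
	t->nr_walks++;
}

/* Called on the return-to-user path, where page faults are permitted. */
static void toy_return_to_user(struct toy_task *t)
{
	if (t->unwind_pending) {
		t->unwind_pending = false;
		walk_user_stack(t);	/* one walk serves all pending requests */
	}
}
```

With two requests arriving before the task returns to user space, only one walk is performed.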

This series adds SFrame support to the deferred unwind code.

[1] https://sourceware.org/binutils/wiki/sframe

Changes since v10:
- Rebase on v6.17-rc1 with Peter's unwind user fixes and x86 support
  series [2] and Steve's support for the deferred unwinding infrastructure
  series in perf [3] and perf tool [4] on top.
- Support for SFrame V2 PC-relative FDE function start address. (Jens)
- Support for SFrame V2 representing RA undefined as indication for
  outermost frames. (Jens)

[2]: [PATCH 00/12] Various fixes and x86 support,
     https://lore.kernel.org/all/20250924075948.579302904@infradead.org/
[3]: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure,
     https://lore.kernel.org/all/20251007214008.080852573@kernel.org/
[4]: [PATCH v16 0/4] perf tool: Support the deferred unwinding infrastructure,
     https://lore.kernel.org/all/20250908175319.841517121@kernel.org/

Patches 1 and 2 are suggested fixups to patches from Peter's unwind user
fixes and x86 support series.  They keep the factoring out of the word
size from the frame's CFA, FP, and RA offsets local to unwind user fp, as
unwind user sframe uses absolute offsets.

Patches 3, 6, and 14 have been updated to exclusively support the recent
PC-relative SFrame FDE function start address encoding.  With Binutils 2.45
the SFrame V2 FDE function start address field value is an offset from the
field (i.e. PC-relative) instead of from the .sframe section start.  This
is indicated by the new SFrame header flag SFRAME_F_FDE_FUNC_START_PCREL.
Old SFrame V2 sections are rejected with the dynamic debug message
"bad/unsupported sframe header".
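A sketch of how a PC-relative function start address might be resolved, assuming the 20-byte packed `struct sframe_fde` layout declared in patch 3 (where `start_addr` is the first field, so the field address equals the FDE address). The helper name and the `TOY_SFRAME_FDE_SIZE` constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* sizeof(struct sframe_fde) for the __packed layout in patch 3: 4+4+4+4+1+1+2 */
#define TOY_SFRAME_FDE_SIZE	20

/*
 * Hypothetical helper: with SFRAME_F_FDE_FUNC_START_PCREL set, the s32
 * start_addr of FDE number i is an offset from the start_addr field itself.
 */
static uint64_t fde_func_start(uint64_t fdes_start, unsigned int i, int32_t start_addr)
{
	uint64_t field_addr = fdes_start + (uint64_t)i * TOY_SFRAME_FDE_SIZE;

	/* Sign-extend, then add modulo 2^64 so negative offsets work. */
	return field_addr + (uint64_t)(int64_t)start_addr;
}
```

Under the older encoding the same value would instead have been added to the .sframe section start, which is why such sections are now rejected rather than misinterpreted.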

Patches 9 and 10 add support to unwind user and unwind user sframe for
a recent change to the SFrame V2 format, which represents an undefined
return address as an SFrame FRE without any offsets.  This is used to
indicate outermost frames.  Note that currently only a development
build of Binutils mainline generates SFrame information that includes this
new indication for outermost frames.  SFrame information without the new
indication is still supported.  Without these patches, unwind user sframe
would consider such new SFrame FREs without any offsets corrupted and
remove the .sframe section, causing any further stack tracing using
sframe to fail.
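The distinction between an outermost-frame FRE and a corrupted one can be sketched using the `SFRAME_FRE_OFFSET_COUNT()` macro from patch 3. The enum and `classify_fre()` are hypothetical, and the upper bound of three offsets (CFA, RA, FP) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Macro as defined in kernel/unwind/sframe.h in patch 3. */
#define SFRAME_FRE_OFFSET_COUNT(data)		((data >> 1) & 0xf)

enum fre_kind { FRE_CORRUPT, FRE_OUTERMOST, FRE_NORMAL };

/* Hypothetical classifier for an FRE info byte. */
static enum fre_kind classify_fre(uint8_t info)
{
	unsigned int n = SFRAME_FRE_OFFSET_COUNT(info);

	if (n == 0)
		return FRE_OUTERMOST;	/* RA undefined: stop unwinding, not an error */
	if (n > 3)
		return FRE_CORRUPT;	/* assumed maximum: CFA, RA, and FP offsets */
	return FRE_NORMAL;
}
```

Before the new format change, a zero offset count would have fallen into the "corrupt" bucket, which is what triggered removal of the .sframe section.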

Regards,
Jens


Jens Remus (4):
  fixup! unwind: Implement compat fp unwind
  fixup! unwind_user/x86: Enable frame pointer unwinding on x86
  unwind_user: Stop when reaching an outermost frame
  unwind_user/sframe: Add support for outermost frame indication

Josh Poimboeuf (11):
  unwind_user/sframe: Add support for reading .sframe headers
  unwind_user/sframe: Store sframe section data in per-mm maple tree
  x86/uaccess: Add unsafe_copy_from_user() implementation
  unwind_user/sframe: Add support for reading .sframe contents
  unwind_user/sframe: Detect .sframe sections in executables
  unwind_user/sframe: Wire up unwind_user to sframe
  unwind_user/sframe/x86: Enable sframe unwinding on x86
  unwind_user/sframe: Remove .sframe section on detected corruption
  unwind_user/sframe: Show file name in debug output
  unwind_user/sframe: Add .sframe validation option
  unwind_user/sframe: Add prctl() interface for registering .sframe
    sections

 MAINTAINERS                        |   1 +
 arch/Kconfig                       |  23 ++
 arch/x86/Kconfig                   |   1 +
 arch/x86/include/asm/mmu.h         |   2 +-
 arch/x86/include/asm/uaccess.h     |  39 +-
 arch/x86/include/asm/unwind_user.h |  11 +-
 fs/binfmt_elf.c                    |  49 ++-
 include/linux/mm_types.h           |   3 +
 include/linux/sframe.h             |  60 +++
 include/linux/unwind_user_types.h  |   5 +-
 include/uapi/linux/elf.h           |   1 +
 include/uapi/linux/prctl.h         |   6 +-
 kernel/fork.c                      |  10 +
 kernel/sys.c                       |   9 +
 kernel/unwind/Makefile             |   3 +-
 kernel/unwind/sframe.c             | 615 +++++++++++++++++++++++++++++
 kernel/unwind/sframe.h             |  72 ++++
 kernel/unwind/sframe_debug.h       |  68 ++++
 kernel/unwind/user.c               |  56 ++-
 mm/init-mm.c                       |   2 +
 20 files changed, 1004 insertions(+), 32 deletions(-)
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h
 create mode 100644 kernel/unwind/sframe_debug.h

-- 
2.48.1




* [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    This fixup resolves the following issue for unwind user sframe, which
    was introduced by Peter Zijlstra's patch "[PATCH 11/12] unwind:
    Implement compat fp unwind" [1]:
    
    Peter factored out the word size (i.e. 4 for 32-bit compat or
    8 for 64-bit) from the frame CFA, FP, and RA offsets.  This is an
    issue for unwind user sframe for two reasons:
    1. SFrame provides absolute offsets, which would need to be
       unnecessarily scaled down only to be scaled up again prior to use.
    2. Factoring out the word size from those offsets requires that all
       architectures/ABIs guarantee that these offsets are always aligned
       to the word size.
    
    Limit the down-/upscaling by word size to unwind user (compat) fp.
    
    [1]: https://lore.kernel.org/lkml/20250924080119.613695709@infradead.org/
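The difference between word-scaled and byte offsets can be shown with a small sketch. The helper names are hypothetical; `addr_bytes()` mirrors the post-fixup address computation in `get_user_word()` (base plus a byte offset), and `representable_in_words()` illustrates point 2 above.

```c
#include <assert.h>

/* Before the fixup: offsets stored in words, scaled by ws on use. */
static long addr_scaled(long base, int off_words, unsigned int ws)
{
	return base + (long)off_words * (long)ws;
}

/* After the fixup: offsets are already byte offsets (as SFrame provides). */
static long addr_bytes(long base, int off_bytes)
{
	return base + off_bytes;
}

/* A byte offset can be stored in words only if it is a multiple of ws. */
static int representable_in_words(int off_bytes, unsigned int ws)
{
	return off_bytes % (int)ws == 0;
}
```

For example, a byte offset of -12 works with a 4-byte word size but cannot be expressed as a whole number of 8-byte words, so a word-scaled representation would silently require an alignment guarantee.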

 kernel/unwind/user.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index bc1bf1e83d65..696004ee956a 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -8,19 +8,15 @@
 #include <linux/unwind_user.h>
 #include <linux/uaccess.h>
 
-static const struct unwind_user_frame fp_frame = {
-	ARCH_INIT_USER_FP_FRAME
-};
-
 #define for_each_user_frame(state) \
 	for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
 
 static inline int
-get_user_word(unsigned long *word, unsigned long base, int off, int size)
+get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
 {
-	unsigned long __user *addr = (void __user *)base + (off * size);
+	unsigned long __user *addr = (void __user *)base + off;
 #ifdef CONFIG_COMPAT
-	if (size == sizeof(int)) {
+	if (ws == sizeof(int)) {
 		unsigned int data;
 		int ret = get_user(data, (unsigned int __user *)addr);
 		*word = data;
@@ -32,6 +28,9 @@ get_user_word(unsigned long *word, unsigned long base, int off, int size)
 
 static int unwind_user_next_fp(struct unwind_user_state *state)
 {
+	const struct unwind_user_frame fp_frame = {
+		ARCH_INIT_USER_FP_FRAME(state->ws)
+	};
 	const struct unwind_user_frame *frame = &fp_frame;
 	unsigned long cfa, fp, ra;
 
@@ -44,7 +43,7 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
 	}
 
 	/* Get the Canonical Frame Address (CFA) */
-	cfa += state->ws * frame->cfa_off;
+	cfa += frame->cfa_off;
 
 	/* stack going in wrong direction? */
 	if (cfa <= state->sp)
-- 
2.48.1




* [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
  2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    This fixup adjusts patch "[PATCH 12/12] unwind_user/x86: Enable frame
    pointer unwinding on x86" [1] to my preceding fixup, which limits the
    down-/upscaling by word size to unwind user (compat) fp.
    
    [1]: https://lore.kernel.org/lkml/20250924080119.613695709@infradead.org/
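The values produced by the adjusted `ARCH_INIT_USER_FP_FRAME(ws)` macro can be checked with a standalone sketch. `struct toy_frame` is a hypothetical stand-in for `struct unwind_user_frame` with only the fields the macro sets; the macro body is copied from the fixup. A signed `ws` is used here so that `-1*(ws)` stays negative without relying on unsigned-to-signed conversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct unwind_user_frame (only the fields used here). */
struct toy_frame {
	int  cfa_off;
	int  ra_off;
	int  fp_off;
	bool use_fp;
};

/* Macro body as in the x86 fixup: byte offsets derived from the word size. */
#define ARCH_INIT_USER_FP_FRAME(ws)			\
	.cfa_off	=  2*(ws),			\
	.ra_off		= -1*(ws),			\
	.fp_off		= -2*(ws),			\
	.use_fp		= true,

static struct toy_frame fp_frame_for(int ws)
{
	struct toy_frame f = { ARCH_INIT_USER_FP_FRAME(ws) };

	return f;
}
```

For 64-bit (ws = 8) this gives CFA = FP + 16, RA at CFA - 8, and the saved FP at CFA - 16, matching the usual `push %rbp; mov %rsp,%rbp` prologue; for 32-bit compat (ws = 4) every offset is halved.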

 arch/x86/include/asm/unwind_user.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index e649b8fea2aa..5e0755ea3086 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -2,10 +2,10 @@
 #ifndef _ASM_X86_UNWIND_USER_H
 #define _ASM_X86_UNWIND_USER_H
 
-#define ARCH_INIT_USER_FP_FRAME				\
-	.cfa_off	=  2,				\
-	.ra_off		= -1,				\
-	.fp_off		= -2,				\
+#define ARCH_INIT_USER_FP_FRAME(ws)			\
+	.cfa_off	=  2*(ws),			\
+	.ra_off		= -1*(ws),			\
+	.fp_off		= -2*(ws),			\
 	.use_fp		= true,
 
 #endif /* _ASM_X86_UNWIND_USER_H */
-- 
2.48.1




* [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
  2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
  2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-11-18 17:04   ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

In preparation for unwinding user space stacks with sframe, add basic
sframe compile infrastructure and support for reading the .sframe
section header.

sframe_add_section() reads the header and unconditionally returns an
error, so it's not very useful yet.  A subsequent patch will improve
that.
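The header checks performed by `sframe_read_header()` can be reproduced in user space for experimentation. This is a sketch only: it copies the struct layout and constants from `kernel/unwind/sframe.h`, replaces `__packed` with the GCC/Clang attribute, operates on an in-memory buffer instead of user memory, and the function name `check_sframe_header()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SFRAME_MAGIC			0xdee2
#define SFRAME_VERSION_2		2
#define SFRAME_F_FDE_SORTED		0x1
#define SFRAME_F_FDE_FUNC_START_PCREL	0x4
#define TOY_SFRAME_FDE_SIZE		20	/* sizeof(packed struct sframe_fde) */

struct sframe_preamble {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
} __attribute__((packed));

struct sframe_header {
	struct sframe_preamble preamble;
	uint8_t  abi_arch;
	int8_t   cfa_fixed_fp_offset;
	int8_t   cfa_fixed_ra_offset;
	uint8_t  auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

/* Userspace sketch of the checks in sframe_read_header(); buf holds the section. */
static int check_sframe_header(const void *buf, size_t len)
{
	struct sframe_header shdr;
	uint64_t header_end, fdes_end, fres_start, fres_end;

	if (len < sizeof(shdr))
		return -1;
	memcpy(&shdr, buf, sizeof(shdr));

	if (shdr.preamble.magic != SFRAME_MAGIC ||
	    shdr.preamble.version != SFRAME_VERSION_2 ||
	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
	    !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
	    shdr.auxhdr_len)
		return -1;			/* bad/unsupported header */
	if (!shdr.num_fdes || !shdr.num_fres)
		return -1;			/* no FDE/FRE entries */

	header_end = sizeof(shdr) + shdr.auxhdr_len;
	if (header_end >= len)
		return -1;			/* header doesn't fit */

	fdes_end   = header_end + shdr.fdes_off + (uint64_t)shdr.num_fdes * TOY_SFRAME_FDE_SIZE;
	fres_start = header_end + shdr.fres_off;
	fres_end   = fres_start + shdr.fre_len;
	if (fres_start < fdes_end || fres_end > len)
		return -1;			/* inconsistent FDE/FRE offsets */
	return 0;
}
```

A section without `SFRAME_F_FDE_FUNC_START_PCREL`, or one whose FRE table overlaps the FDE table, fails these checks.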

Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/

[ Jens Remus: Add support for PC-relative FDE function start address. ]

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Changes in v11:
    - Support for SFrame V2 PC-relative FDE function start address. (Jens)

 MAINTAINERS            |   1 +
 arch/Kconfig           |   3 +
 include/linux/sframe.h |  40 ++++++++++++
 kernel/unwind/Makefile |   3 +-
 kernel/unwind/sframe.c | 137 +++++++++++++++++++++++++++++++++++++++++
 kernel/unwind/sframe.h |  72 ++++++++++++++++++++++
 6 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h

diff --git a/MAINTAINERS b/MAINTAINERS
index fe168477caa4..251dcb49e112 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26305,6 +26305,7 @@ USERSPACE STACK UNWINDING
 M:	Josh Poimboeuf <jpoimboe@kernel.org>
 M:	Steven Rostedt <rostedt@goodmis.org>
 S:	Maintained
+F:	include/linux/sframe.h
 F:	include/linux/unwind*.h
 F:	kernel/unwind/
 
diff --git a/arch/Kconfig b/arch/Kconfig
index d1b4ffd6e085..69fcabf53088 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -451,6 +451,9 @@ config HAVE_UNWIND_USER_FP
 	bool
 	select UNWIND_USER
 
+config HAVE_UNWIND_USER_SFRAME
+	bool
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..0584f661f698
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#include <linux/mm_types.h>
+#include <linux/unwind_user_types.h>
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+struct sframe_section {
+	unsigned long	sframe_start;
+	unsigned long	sframe_end;
+	unsigned long	text_start;
+	unsigned long	text_end;
+
+	unsigned long	fdes_start;
+	unsigned long	fres_start;
+	unsigned long	fres_end;
+	unsigned int	num_fdes;
+
+	signed char	ra_off;
+	signed char	fp_off;
+};
+
+extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+			      unsigned long text_start, unsigned long text_end);
+extern int sframe_remove_section(unsigned long sframe_addr);
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+				     unsigned long text_start, unsigned long text_end)
+{
+	return -ENOSYS;
+}
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index eae37bea54fd..146038165865 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER)	+= user.o deferred.o
+ obj-$(CONFIG_UNWIND_USER)		+= user.o deferred.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME)	+= sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..b28ec77bc9a8
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Userspace sframe access functions
+ */
+
+#define pr_fmt(fmt)	"sframe: " fmt
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/string_helpers.h>
+#include <linux/sframe.h>
+#include <linux/unwind_user_types.h>
+
+#include "sframe.h"
+
+#define dbg(fmt, ...)							\
+	pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static void free_section(struct sframe_section *sec)
+{
+	kfree(sec);
+}
+
+static int sframe_read_header(struct sframe_section *sec)
+{
+	unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
+	struct sframe_header shdr;
+	unsigned int num_fdes;
+
+	if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
+		dbg("header usercopy failed\n");
+		return -EFAULT;
+	}
+
+	if (shdr.preamble.magic != SFRAME_MAGIC ||
+	    shdr.preamble.version != SFRAME_VERSION_2 ||
+	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+	    !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
+	    shdr.auxhdr_len) {
+		dbg("bad/unsupported sframe header\n");
+		return -EINVAL;
+	}
+
+	if (!shdr.num_fdes || !shdr.num_fres) {
+		dbg("no fde/fre entries\n");
+		return -EINVAL;
+	}
+
+	header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
+	if (header_end >= sec->sframe_end) {
+		dbg("header doesn't fit in section\n");
+		return -EINVAL;
+	}
+
+	num_fdes   = shdr.num_fdes;
+	fdes_start = header_end + shdr.fdes_off;
+	fdes_end   = fdes_start + (num_fdes * sizeof(struct sframe_fde));
+
+	fres_start = header_end + shdr.fres_off;
+	fres_end   = fres_start + shdr.fre_len;
+
+	if (fres_start < fdes_end || fres_end > sec->sframe_end) {
+		dbg("inconsistent fde/fre offsets\n");
+		return -EINVAL;
+	}
+
+	sec->num_fdes		= num_fdes;
+	sec->fdes_start		= fdes_start;
+	sec->fres_start		= fres_start;
+	sec->fres_end		= fres_end;
+
+	sec->ra_off		= shdr.cfa_fixed_ra_offset;
+	sec->fp_off		= shdr.cfa_fixed_fp_offset;
+
+	return 0;
+}
+
+int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+		       unsigned long text_start, unsigned long text_end)
+{
+	struct maple_tree *sframe_mt = &current->mm->sframe_mt;
+	struct vm_area_struct *sframe_vma, *text_vma;
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	int ret;
+
+	if (!sframe_start || !sframe_end || !text_start || !text_end) {
+		dbg("zero-length sframe/text address\n");
+		return -EINVAL;
+	}
+
+	scoped_guard(mmap_read_lock, mm) {
+		sframe_vma = vma_lookup(mm, sframe_start);
+		if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
+			dbg("bad sframe address (0x%lx - 0x%lx)\n",
+			    sframe_start, sframe_end);
+			return -EINVAL;
+		}
+
+		text_vma = vma_lookup(mm, text_start);
+		if (!text_vma ||
+		    !(text_vma->vm_flags & VM_EXEC) ||
+		    text_end > text_vma->vm_end) {
+			dbg("bad text address (0x%lx - 0x%lx)\n",
+			    text_start, text_end);
+			return -EINVAL;
+		}
+	}
+
+	sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+	if (!sec)
+		return -ENOMEM;
+
+	sec->sframe_start	= sframe_start;
+	sec->sframe_end		= sframe_end;
+	sec->text_start		= text_start;
+	sec->text_end		= text_end;
+
+	ret = sframe_read_header(sec);
+	if (ret)
+		goto err_free;
+
+	/* TODO nowhere to store it yet - just free it and return an error */
+	ret = -ENOSYS;
+
+err_free:
+	free_section(sec);
+	return ret;
+}
+
+int sframe_remove_section(unsigned long sframe_start)
+{
+	return -ENOSYS;
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..559a74322666
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * From https://www.sourceware.org/binutils/docs/sframe-spec.html
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include <linux/types.h>
+
+#define SFRAME_VERSION_1			1
+#define SFRAME_VERSION_2			2
+#define SFRAME_MAGIC				0xdee2
+
+#define SFRAME_F_FDE_SORTED			0x1
+#define SFRAME_F_FRAME_POINTER			0x2
+#define SFRAME_F_FDE_FUNC_START_PCREL		0x4
+
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG		1
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE	2
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE		3
+
+#define SFRAME_FDE_TYPE_PCINC			0
+#define SFRAME_FDE_TYPE_PCMASK			1
+
+struct sframe_preamble {
+	u16	magic;
+	u8	version;
+	u8	flags;
+} __packed;
+
+struct sframe_header {
+	struct sframe_preamble preamble;
+	u8	abi_arch;
+	s8	cfa_fixed_fp_offset;
+	s8	cfa_fixed_ra_offset;
+	u8	auxhdr_len;
+	u32	num_fdes;
+	u32	num_fres;
+	u32	fre_len;
+	u32	fdes_off;
+	u32	fres_off;
+} __packed;
+
+#define SFRAME_HEADER_SIZE(header) \
+	((sizeof(struct sframe_header) + header.auxhdr_len))
+
+#define SFRAME_AARCH64_PAUTH_KEY_A		0
+#define SFRAME_AARCH64_PAUTH_KEY_B		1
+
+struct sframe_fde {
+	s32	start_addr;
+	u32	func_size;
+	u32	fres_off;
+	u32	fres_num;
+	u8	info;
+	u8	rep_size;
+	u16 padding;
+} __packed;
+
+#define SFRAME_FUNC_FRE_TYPE(data)		(data & 0xf)
+#define SFRAME_FUNC_FDE_TYPE(data)		((data >> 4) & 0x1)
+#define SFRAME_FUNC_PAUTH_KEY(data)		((data >> 5) & 0x1)
+
+#define SFRAME_BASE_REG_FP			0
+#define SFRAME_BASE_REG_SP			1
+
+#define SFRAME_FRE_CFA_BASE_REG_ID(data)	(data & 0x1)
+#define SFRAME_FRE_OFFSET_COUNT(data)		((data >> 1) & 0xf)
+#define SFRAME_FRE_OFFSET_SIZE(data)		((data >> 5) & 0x3)
+#define SFRAME_FRE_MANGLED_RA_P(data)		((data >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
-- 
2.48.1




* [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (2 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

Associate an sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range.  A single
sframe section can be associated with multiple text ranges.
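The per-mm lookup structure can be approximated in user space to show the semantics: non-overlapping text ranges, each mapping to a section, with one section allowed to appear under several ranges. The sketch below uses a small sorted array as a hypothetical stand-in for the kernel maple tree (which is not available outside the kernel); `toy_insert_range()` refuses overlapping ranges, loosely analogous to inserting into an already-occupied range, and `toy_load()` finds the section containing an address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mm maple tree: sorted, inclusive ranges. */
struct toy_range {
	unsigned long first, last;
	const void *sec;
};

struct toy_range_map {
	struct toy_range r[16];
	size_t n;
};

/* Insert [first, last] -> sec, keeping entries sorted; reject overlaps. */
static int toy_insert_range(struct toy_range_map *m, unsigned long first,
			    unsigned long last, const void *sec)
{
	size_t i, pos = 0;

	if (first > last || m->n == sizeof(m->r) / sizeof(m->r[0]))
		return -1;
	for (i = 0; i < m->n; i++) {
		if (first <= m->r[i].last && m->r[i].first <= last)
			return -1;		/* overlaps an existing range */
		if (m->r[i].first < first)
			pos = i + 1;		/* insertion point after this entry */
	}
	for (i = m->n; i > pos; i--)
		m->r[i] = m->r[i - 1];
	m->r[pos] = (struct toy_range){ first, last, sec };
	m->n++;
	return 0;
}

/* Binary search for the section whose text range contains ip. */
static const void *toy_load(const struct toy_range_map *m, unsigned long ip)
{
	size_t lo = 0, hi = m->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < m->r[mid].first)
			hi = mid;
		else if (ip > m->r[mid].last)
			lo = mid + 1;
		else
			return m->r[mid].sec;
	}
	return NULL;
}
```

Indexing by text address is what lets the unwinder go directly from a user IP to the .sframe section describing it.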

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/x86/include/asm/mmu.h |  2 +-
 include/linux/mm_types.h   |  3 +++
 include/linux/sframe.h     | 13 +++++++++
 kernel/fork.c              | 10 +++++++
 kernel/unwind/sframe.c     | 55 +++++++++++++++++++++++++++++++++++---
 mm/init-mm.c               |  2 ++
 6 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0fe9c569d171..227a32899a59 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -87,7 +87,7 @@ typedef struct {
 	.context = {							\
 		.ctx_id = 1,						\
 		.lock = __MUTEX_INITIALIZER(mm.context.lock),		\
-	}
+	},
 
 void leave_mm(void);
 #define leave_mm leave_mm
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 08bc2442db93..31fbd6663047 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1210,6 +1210,9 @@ struct mm_struct {
 #ifdef CONFIG_MM_ID
 		mm_id_t mm_id;
 #endif /* CONFIG_MM_ID */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+		struct maple_tree sframe_mt;
+#endif
 	} __randomize_layout;
 
 	/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 0584f661f698..73bf6f0b30c2 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -22,18 +22,31 @@ struct sframe_section {
 	signed char	fp_off;
 };
 
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+extern void sframe_free_mm(struct mm_struct *mm);
+
 extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 			      unsigned long text_start, unsigned long text_end);
 extern int sframe_remove_section(unsigned long sframe_addr);
 
+static inline bool current_has_sframe(void)
+{
+	struct mm_struct *mm = current->mm;
+
+	return mm && !mtree_empty(&mm->sframe_mt);
+}
+
 #else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
 
+#define INIT_MM_SFRAME
+static inline void sframe_free_mm(struct mm_struct *mm) {}
 static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 				     unsigned long text_start, unsigned long text_end)
 {
 	return -ENOSYS;
 }
 static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline bool current_has_sframe(void) { return false; }
 
 #endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index d827cc6c5362..9eb9b9a5d022 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -106,6 +106,7 @@
 #include <linux/pidfs.h>
 #include <linux/tick.h>
 #include <linux/unwind_deferred.h>
+#include <linux/sframe.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -690,6 +691,7 @@ void __mmdrop(struct mm_struct *mm)
 	mm_destroy_cid(mm);
 	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
 	futex_hash_free(mm);
+	sframe_free_mm(mm);
 
 	free_mm(mm);
 }
@@ -1028,6 +1030,13 @@ static void mmap_init_lock(struct mm_struct *mm)
 #endif
 }
 
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+	mt_init(&mm->sframe_mt);
+#endif
+}
+
 static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	struct user_namespace *user_ns)
 {
@@ -1056,6 +1065,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->pmd_huge_pte = NULL;
 #endif
 	mm_init_uprobes_state(mm);
+	mm_init_sframe(mm);
 	hugetlb_count_init(mm);
 
 	if (current->mm) {
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index b28ec77bc9a8..149ce70e4229 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -123,15 +123,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 	if (ret)
 		goto err_free;
 
-	/* TODO nowhere to store it yet - just free it and return an error */
-	ret = -ENOSYS;
+	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
+	if (ret) {
+		dbg("mtree_insert_range failed: text=%lx-%lx\n",
+		    sec->text_start, sec->text_end);
+		goto err_free;
+	}
+
+	return 0;
 
 err_free:
 	free_section(sec);
 	return ret;
 }
 
+static int __sframe_remove_section(struct mm_struct *mm,
+				   struct sframe_section *sec)
+{
+	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+		return -EINVAL;
+	}
+
+	free_section(sec);
+
+	return 0;
+}
+
 int sframe_remove_section(unsigned long sframe_start)
 {
-	return -ENOSYS;
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	unsigned long index = 0;
+	bool found = false;
+	int ret = 0;
+
+	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+		if (sec->sframe_start == sframe_start) {
+			found = true;
+			ret |= __sframe_remove_section(mm, sec);
+		}
+	}
+
+	if (!found || ret)
+		return -EINVAL;
+
+	return 0;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+	struct sframe_section *sec;
+	unsigned long index = 0;
+
+	if (!mm)
+		return;
+
+	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+		free_section(sec);
+
+	mtree_destroy(&mm->sframe_mt);
 }
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 4600e7605cab..b32fcf167cc2 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/user_namespace.h>
 #include <linux/iommu.h>
+#include <linux/sframe.h>
 #include <asm/mmu.h>
 
 #ifndef INIT_MM_CONTEXT
@@ -46,6 +47,7 @@ struct mm_struct init_mm = {
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
 	INIT_MM_CONTEXT(init_mm)
+	INIT_MM_SFRAME
 };
 
 void setup_initial_init_mm(void *start_code, void *end_code,
-- 
2.48.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (3 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

Add an x86 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().
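Like unsafe_copy_to_user(), the new macro copies in descending access widths: as many 8-byte accesses as fit, then at most one 4-, 2- and 1-byte tail each. Below is a minimal userspace sketch of that pattern. memcpy() stands in for unsafe_get_user(), the access counter is a hypothetical addition for illustration, and the error label the kernel macro jumps to on a fault is not modelled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one 'type'-sized chunk at a time while enough bytes remain.
 * memcpy() stands in for unsafe_get_user(); 'calls' counts accesses.
 */
#define COPY_LOOP(dst, src, len, type, calls)		\
	while ((len) >= sizeof(type)) {			\
		memcpy((dst), (src), sizeof(type));	\
		(dst) += sizeof(type);			\
		(src) += sizeof(type);			\
		(len) -= sizeof(type);			\
		(calls)++;				\
	}

/* Returns the number of fixed-width accesses performed. */
static unsigned int copy_descending(void *to, const void *from, size_t n)
{
	unsigned char *dst = to;
	const unsigned char *src = from;
	size_t len = n;
	unsigned int calls = 0;

	COPY_LOOP(dst, src, len, uint64_t, calls);
	COPY_LOOP(dst, src, len, uint32_t, calls);
	COPY_LOOP(dst, src, len, uint16_t, calls);
	COPY_LOOP(dst, src, len, uint8_t, calls);
	return calls;
}
```

For a 15-byte buffer this issues four accesses (8 + 4 + 2 + 1), which is why the u64/u32/u16/u8 sequence in the macro covers every length.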

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 3a7755c1a441..3caf02d0503e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -599,7 +599,7 @@ _label:									\
  * We want the unsafe accessors to always be inlined and use
  * the error labels - thus the macro games.
  */
-#define unsafe_copy_loop(dst, src, len, type, label)				\
+#define unsafe_copy_to_user_loop(dst, src, len, type, label)			\
 	while (len >= sizeof(type)) {						\
 		unsafe_put_user(*(type *)(src),(type __user *)(dst),label);	\
 		dst += sizeof(type);						\
@@ -607,15 +607,34 @@ _label:									\
 		len -= sizeof(type);						\
 	}
 
-#define unsafe_copy_to_user(_dst,_src,_len,label)			\
-do {									\
-	char __user *__ucu_dst = (_dst);				\
-	const char *__ucu_src = (_src);					\
-	size_t __ucu_len = (_len);					\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
+#define unsafe_copy_to_user(_dst, _src, _len, label)				\
+do {										\
+	void __user *__dst = (_dst);						\
+	const void *__src = (_src);						\
+	size_t __len = (_len);							\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u64, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u32, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u16, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u8,  label);		\
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label)			\
+	while (len >= sizeof(type)) {						\
+		unsafe_get_user(*(type *)(dst), (type __user *)(src), label);	\
+		dst += sizeof(type);						\
+		src += sizeof(type);						\
+		len -= sizeof(type);						\
+	}
+
+#define unsafe_copy_from_user(_dst, _src, _len, label)				\
+do {										\
+	void *__dst = (_dst);							\
+	void __user *__src = (_src);						\
+	size_t __len = (_len);							\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u64, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u32, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u16, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u8,  label);		\
 } while (0)
 
 #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
-- 
2.48.1



* [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (4 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-23 16:04   ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

In preparation for using sframe to unwind user space stacks, add an
sframe_find() interface for finding the sframe information associated
with a given text address.

For performance, use user_read_access_begin() and the corresponding
unsafe_*() accessors.  Note that use of pr_debug() in uaccess-enabled
regions would break noinstr validation, so there aren't any debug
messages yet.  Those will be added in a subsequent commit.
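The FDE lookup done by __find_fde() is a binary search over FDEs sorted by function start address. With the PC-relative encoding added in v11, each start address is stored as an offset from the FDE's own address. A minimal userspace sketch, assuming a hypothetical simplified struct fde_sketch in place of struct sframe_fde and omitting the unsafe_get_user() fault handling and the sortedness checks the kernel code performs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified FDE: the function start is stored as an
 * offset relative to the FDE's own address (PC-relative encoding).
 */
struct fde_sketch {
	int32_t  start_off;
	uint32_t func_size;
};

/* Absolute start of the function described by FDE i. */
static uintptr_t fde_func_start(const struct fde_sketch *fdes,
				uintptr_t fdes_addr, size_t i)
{
	uintptr_t entry = fdes_addr + i * sizeof(fdes[0]);

	return entry + (uintptr_t)(intptr_t)fdes[i].start_off;
}

/*
 * Binary search FDEs sorted by function start.  'fdes_addr' is the
 * address the table is mapped at.  Returns the index of the FDE that
 * covers 'ip', or -1 if none does (including gaps between functions).
 */
static long find_fde_sketch(const struct fde_sketch *fdes, size_t num,
			    uintptr_t fdes_addr, uintptr_t ip)
{
	long low = 0, high = (long)num - 1, found = -1;

	while (low <= high) {
		long mid = low + (high - low) / 2;

		if (ip >= fde_func_start(fdes, fdes_addr, mid)) {
			found = mid;	/* candidate; keep looking right */
			low = mid + 1;
		} else {
			high = mid - 1;
		}
	}
	if (found < 0)
		return -1;

	/* Reject ip values that fall past the end of the found function. */
	if (ip >= fde_func_start(fdes, fdes_addr, found) + fdes[found].func_size)
		return -1;
	return found;
}
```

The search keeps the last FDE whose start is <= ip and then applies the same "not in a gap" check as the patch, since sorted start addresses alone cannot tell whether ip lies inside the function.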

Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/

[ Jens Remus: Add support for PC-relative FDE function start address. ]

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Changes in v11:
    - Support for SFrame V2 PC-relative FDE function start address. (Jens)

 include/linux/sframe.h       |   5 +
 kernel/unwind/sframe.c       | 318 ++++++++++++++++++++++++++++++++++-
 kernel/unwind/sframe_debug.h |  35 ++++
 3 files changed, 354 insertions(+), 4 deletions(-)
 create mode 100644 kernel/unwind/sframe_debug.h

diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 73bf6f0b30c2..9a72209696f9 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -3,11 +3,14 @@
 #define _LINUX_SFRAME_H
 
 #include <linux/mm_types.h>
+#include <linux/srcu.h>
 #include <linux/unwind_user_types.h>
 
 #ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
 
 struct sframe_section {
+	struct rcu_head	rcu;
+
 	unsigned long	sframe_start;
 	unsigned long	sframe_end;
 	unsigned long	text_start;
@@ -28,6 +31,7 @@ extern void sframe_free_mm(struct mm_struct *mm);
 extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 			      unsigned long text_start, unsigned long text_end);
 extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
 
 static inline bool current_has_sframe(void)
 {
@@ -46,6 +50,7 @@ static inline int sframe_add_section(unsigned long sframe_start, unsigned long s
 	return -ENOSYS;
 }
 static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -ENOSYS; }
 static inline bool current_has_sframe(void) { return false; }
 
 #endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 149ce70e4229..5536374e2a22 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -15,9 +15,310 @@
 #include <linux/unwind_user_types.h>
 
 #include "sframe.h"
+#include "sframe_debug.h"
 
-#define dbg(fmt, ...)							\
-	pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+struct sframe_fre {
+	unsigned int	size;
+	u32		ip_off;
+	s32		cfa_off;
+	s32		ra_off;
+	s32		fp_off;
+	u8		info;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
+{
+	if (fre_type > 2)
+		return 0;
+	return 1 << fre_type;
+}
+
+static __always_inline unsigned char offset_size_enum_to_size(unsigned char off_size)
+{
+	if (off_size > 2)
+		return 0;
+	return 1 << off_size;
+}
+
+static __always_inline int __read_fde(struct sframe_section *sec,
+				      unsigned int fde_num,
+				      struct sframe_fde *fde,
+				      unsigned long *fde_start_base)
+{
+	unsigned long fde_addr, ip;
+
+	fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
+	unsafe_copy_from_user(fde, (void __user *)fde_addr,
+			      sizeof(struct sframe_fde), Efault);
+
+	ip = fde_addr + fde->start_addr;
+	if (ip < sec->text_start || ip > sec->text_end)
+		return -EINVAL;
+
+	*fde_start_base = fde_addr;
+	return 0;
+
+Efault:
+	return -EFAULT;
+}
+
+static __always_inline int __find_fde(struct sframe_section *sec,
+				      unsigned long ip,
+				      struct sframe_fde *fde,
+				      unsigned long *fde_start_base)
+{
+	unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
+	struct sframe_fde __user *first, *low, *high, *found = NULL;
+	int ret;
+
+	first = (void __user *)sec->fdes_start;
+	low = first;
+	high = first + sec->num_fdes - 1;
+
+	while (low <= high) {
+		struct sframe_fde __user *mid;
+		s32 func_off;
+		unsigned long func_addr;
+
+		mid = low + ((high - low) / 2);
+
+		unsafe_get_user(func_off, (s32 __user *)mid, Efault);
+		func_addr = (unsigned long)mid + func_off;
+
+		if (ip >= func_addr) {
+			if (func_addr < func_addr_low)
+				return -EFAULT;
+
+			func_addr_low = func_addr;
+
+			found = mid;
+			low = mid + 1;
+		} else {
+			if (func_addr > func_addr_high)
+				return -EFAULT;
+
+			func_addr_high = func_addr;
+
+			high = mid - 1;
+		}
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	ret = __read_fde(sec, found - first, fde, fde_start_base);
+	if (ret)
+		return ret;
+
+	/* make sure it's not in a gap */
+	if (ip < *fde_start_base + fde->start_addr ||
+	    ip >= *fde_start_base + fde->start_addr + fde->func_size)
+		return -EINVAL;
+
+	return 0;
+
+Efault:
+	return -EFAULT;
+}
+
+#define ____UNSAFE_GET_USER_INC(to, from, type, label)			\
+({									\
+	type __to;							\
+	unsafe_get_user(__to, (type __user *)from, label);		\
+	from += sizeof(__to);						\
+	to = __to;							\
+})
+
+#define __UNSAFE_GET_USER_INC(to, from, size, label, u_or_s)		\
+({									\
+	switch (size) {							\
+	case 1:								\
+		____UNSAFE_GET_USER_INC(to, from, u_or_s##8, label);	\
+		break;							\
+	case 2:								\
+		____UNSAFE_GET_USER_INC(to, from, u_or_s##16, label);	\
+		break;							\
+	case 4:								\
+		____UNSAFE_GET_USER_INC(to, from, u_or_s##32, label);	\
+		break;							\
+	default:							\
+		return -EFAULT;						\
+	}								\
+})
+
+#define UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label)		\
+	__UNSAFE_GET_USER_INC(to, from, size, label, u)
+
+#define UNSAFE_GET_USER_SIGNED_INC(to, from, size, label)		\
+	__UNSAFE_GET_USER_INC(to, from, size, label, s)
+
+#define UNSAFE_GET_USER_INC(to, from, size, label)				\
+	_Generic(to,								\
+		 u8:	UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label),	\
+		 u16:	UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label),	\
+		 u32:	UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label),	\
+		 s8:	UNSAFE_GET_USER_SIGNED_INC(to, from, size, label),	\
+		 s16:	UNSAFE_GET_USER_SIGNED_INC(to, from, size, label),	\
+		 s32:	UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+
+static __always_inline int __read_fre(struct sframe_section *sec,
+				      struct sframe_fde *fde,
+				      unsigned long fre_addr,
+				      struct sframe_fre *fre)
+{
+	unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+	unsigned char fre_type = SFRAME_FUNC_FRE_TYPE(fde->info);
+	unsigned char offset_count, offset_size;
+	s32 cfa_off, ra_off, fp_off;
+	unsigned long cur = fre_addr;
+	unsigned char addr_size;
+	u32 ip_off;
+	u8 info;
+
+	addr_size = fre_type_to_size(fre_type);
+	if (!addr_size)
+		return -EFAULT;
+
+	if (fre_addr + addr_size + 1 > sec->fres_end)
+		return -EFAULT;
+
+	UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
+	if (fde_type == SFRAME_FDE_TYPE_PCINC && ip_off > fde->func_size)
+		return -EFAULT;
+
+	UNSAFE_GET_USER_INC(info, cur, 1, Efault);
+	offset_count = SFRAME_FRE_OFFSET_COUNT(info);
+	offset_size  = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(info));
+	if (!offset_count || !offset_size)
+		return -EFAULT;
+
+	if (cur + (offset_count * offset_size) > sec->fres_end)
+		return -EFAULT;
+
+	fre->size = addr_size + 1 + (offset_count * offset_size);
+
+	UNSAFE_GET_USER_INC(cfa_off, cur, offset_size, Efault);
+	offset_count--;
+
+	ra_off = sec->ra_off;
+	if (!ra_off) {
+		if (!offset_count--)
+			return -EFAULT;
+
+		UNSAFE_GET_USER_INC(ra_off, cur, offset_size, Efault);
+	}
+
+	fp_off = sec->fp_off;
+	if (!fp_off && offset_count) {
+		offset_count--;
+		UNSAFE_GET_USER_INC(fp_off, cur, offset_size, Efault);
+	}
+
+	if (offset_count)
+		return -EFAULT;
+
+	fre->ip_off		= ip_off;
+	fre->cfa_off		= cfa_off;
+	fre->ra_off		= ra_off;
+	fre->fp_off		= fp_off;
+	fre->info		= info;
+
+	return 0;
+
+Efault:
+	return -EFAULT;
+}
+
+static __always_inline int __find_fre(struct sframe_section *sec,
+				      struct sframe_fde *fde,
+				      unsigned long fde_start_base,
+				      unsigned long ip,
+				      struct unwind_user_frame *frame)
+{
+	unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
+	struct sframe_fre *fre, *prev_fre = NULL;
+	struct sframe_fre fres[2];
+	unsigned long fre_addr;
+	bool which = false;
+	unsigned int i;
+	u32 ip_off;
+
+	ip_off = ip - (fde_start_base + fde->start_addr);
+
+	if (fde_type == SFRAME_FDE_TYPE_PCMASK)
+		ip_off %= fde->rep_size;
+
+	fre_addr = sec->fres_start + fde->fres_off;
+
+	for (i = 0; i < fde->fres_num; i++) {
+		int ret;
+
+		/*
+		 * Alternate between the two fres[] entries for 'fre' and
+		 * 'prev_fre'.
+		 */
+		fre = which ? fres : fres + 1;
+		which = !which;
+
+		ret = __read_fre(sec, fde, fre_addr, fre);
+		if (ret)
+			return ret;
+
+		fre_addr += fre->size;
+
+		if (prev_fre && fre->ip_off <= prev_fre->ip_off)
+			return -EFAULT;
+
+		if (fre->ip_off > ip_off)
+			break;
+
+		prev_fre = fre;
+	}
+
+	if (!prev_fre)
+		return -EINVAL;
+	fre = prev_fre;
+
+	frame->cfa_off = fre->cfa_off;
+	frame->ra_off  = fre->ra_off;
+	frame->fp_off  = fre->fp_off;
+	frame->use_fp  = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+
+	return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+	struct mm_struct *mm = current->mm;
+	struct sframe_section *sec;
+	struct sframe_fde fde;
+	unsigned long fde_start_base;
+	int ret;
+
+	if (!mm)
+		return -EINVAL;
+
+	guard(srcu)(&sframe_srcu);
+
+	sec = mtree_load(&mm->sframe_mt, ip);
+	if (!sec)
+		return -EINVAL;
+
+	if (!user_read_access_begin((void __user *)sec->sframe_start,
+				    sec->sframe_end - sec->sframe_start))
+		return -EFAULT;
+
+	ret = __find_fde(sec, ip, &fde, &fde_start_base);
+	if (ret)
+		goto end;
+
+	ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
+end:
+	user_read_access_end();
+	return ret;
+}
 
 static void free_section(struct sframe_section *sec)
 {
@@ -120,8 +421,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 	sec->text_end		= text_end;
 
 	ret = sframe_read_header(sec);
-	if (ret)
+	if (ret) {
+		dbg_print_header(sec);
 		goto err_free;
+	}
 
 	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
 	if (ret) {
@@ -137,6 +440,13 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 	return ret;
 }
 
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+	struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+	free_section(sec);
+}
+
 static int __sframe_remove_section(struct mm_struct *mm,
 				   struct sframe_section *sec)
 {
@@ -145,7 +455,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
 		return -EINVAL;
 	}
 
-	free_section(sec);
+	call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
 
 	return 0;
 }
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
new file mode 100644
index 000000000000..055c8c8fae24
--- /dev/null
+++ b/kernel/unwind/sframe_debug.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SFRAME_DEBUG_H
+#define _SFRAME_DEBUG_H
+
+#include <linux/sframe.h>
+#include "sframe.h"
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+
+#define dbg(fmt, ...)							\
+	pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static __always_inline void dbg_print_header(struct sframe_section *sec)
+{
+	unsigned long fdes_end;
+
+	fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
+
+	dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+	    "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+	    "ra_off:%d fp_off:%d\n",
+	    sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+	    sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+	    sec->ra_off, sec->fp_off);
+}
+
+#else /* !CONFIG_DYNAMIC_DEBUG */
+
+#define dbg(args...)			no_printk(args)
+
+static inline void dbg_print_header(struct sframe_section *sec) {}
+
+#endif /* !CONFIG_DYNAMIC_DEBUG */
+
+#endif /* _SFRAME_DEBUG_H */
-- 
2.48.1



* [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (5 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
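The loader changes work in two passes over the program headers: first remember the PT_GNU_SFRAME header, then pair it with every header that has PF_X set, offsetting both by the load bias. A minimal userspace sketch of that logic using glibc's <elf.h>, assuming a hypothetical struct sframe_range as the output record in place of the call to sframe_add_section():

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)	/* value added to uapi elf.h */
#endif

/* Hypothetical output record: one per executable segment. */
struct sframe_range {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
};

/*
 * Find the PT_GNU_SFRAME header, then pair it with each PF_X header.
 * Returns the number of ranges written to 'out' (at most 'max').
 */
static size_t collect_sframe_ranges(const Elf64_Phdr *phdrs, size_t phnum,
				    unsigned long load_bias,
				    struct sframe_range *out, size_t max)
{
	const Elf64_Phdr *sframe = NULL;
	size_t i, n = 0;

	for (i = 0; i < phnum; i++)
		if (phdrs[i].p_type == PT_GNU_SFRAME)
			sframe = &phdrs[i];

	if (!sframe)
		return 0;	/* no .sframe: nothing to register */

	for (i = 0; i < phnum && n < max; i++) {
		if (!(phdrs[i].p_flags & PF_X))
			continue;
		out[n].sframe_start = load_bias + sframe->p_vaddr;
		out[n].sframe_end   = out[n].sframe_start + sframe->p_memsz;
		out[n].text_start   = load_bias + phdrs[i].p_vaddr;
		out[n].text_end     = out[n].text_start + phdrs[i].p_memsz;
		n++;
	}
	return n;
}
```

As in elf_add_sframe(), a missing or unusable .sframe is not an error; the binary simply gets no sframe-based unwinding.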

Cc: linux-mm@kvack.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 fs/binfmt_elf.c          | 49 +++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/elf.h |  1 +
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 264fba0d44bd..1fd7623cf9a5 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -47,6 +47,7 @@
 #include <linux/dax.h>
 #include <linux/uaccess.h>
 #include <linux/rseq.h>
+#include <linux/sframe.h>
 #include <asm/param.h>
 #include <asm/page.h>
 
@@ -622,6 +623,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
 	return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
 }
 
+static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
+			   unsigned long base_addr)
+{
+	unsigned long sframe_start, sframe_end, text_start, text_end;
+
+	sframe_start = base_addr + sframe->p_vaddr;
+	sframe_end   = sframe_start + sframe->p_memsz;
+
+	text_start   = base_addr + text->p_vaddr;
+	text_end     = text_start + text->p_memsz;
+
+	/* Ignore return value, sframe section isn't critical */
+	sframe_add_section(sframe_start, sframe_end, text_start, text_end);
+}
+
 /* This is much more generalized than the library routine read function,
    so we keep this separate.  Technically the library read function
    is only provided so that we can read a.out libraries that have
@@ -632,7 +648,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 		unsigned long no_base, struct elf_phdr *interp_elf_phdata,
 		struct arch_elf_state *arch_state)
 {
-	struct elf_phdr *eppnt;
+	struct elf_phdr *eppnt, *sframe_phdr = NULL;
 	unsigned long load_addr = 0;
 	int load_addr_set = 0;
 	unsigned long error = ~0UL;
@@ -658,7 +674,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 
 	eppnt = interp_elf_phdata;
 	for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
-		if (eppnt->p_type == PT_LOAD) {
+		switch (eppnt->p_type) {
+		case PT_LOAD: {
 			int elf_type = MAP_PRIVATE;
 			int elf_prot = make_prot(eppnt->p_flags, arch_state,
 						 true, true);
@@ -697,6 +714,20 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				error = -ENOMEM;
 				goto out;
 			}
+			break;
+		}
+		case PT_GNU_SFRAME:
+			sframe_phdr = eppnt;
+			break;
+		}
+	}
+
+	if (sframe_phdr) {
+		eppnt = interp_elf_phdata;
+		for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
+			if (eppnt->p_flags & PF_X) {
+				elf_add_sframe(eppnt, sframe_phdr, load_addr);
+			}
 		}
 	}
 
@@ -821,7 +852,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	int first_pt_load = 1;
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
-	struct elf_phdr *elf_property_phdata = NULL;
+	struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
 	unsigned long elf_brk;
 	bool brk_moved = false;
 	int retval, i;
@@ -930,6 +961,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				executable_stack = EXSTACK_DISABLE_X;
 			break;
 
+		case PT_GNU_SFRAME:
+			sframe_phdr = elf_ppnt;
+			break;
+
 		case PT_LOPROC ... PT_HIPROC:
 			retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
 						  bprm->file, false,
@@ -1227,6 +1262,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			elf_brk = k;
 	}
 
+	if (sframe_phdr) {
+		for (i = 0, elf_ppnt = elf_phdata;
+		     i < elf_ex->e_phnum; i++, elf_ppnt++) {
+			if ((elf_ppnt->p_flags & PF_X))
+				elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
+		}
+	}
+
 	e_entry = elf_ex->e_entry + load_bias;
 	phdr_addr += load_bias;
 	elf_brk += load_bias;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 819ded2d39de..92c16c94fca8 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -41,6 +41,7 @@ typedef __u16	Elf64_Versym;
 #define PT_GNU_STACK	(PT_LOOS + 0x474e551)
 #define PT_GNU_RELRO	(PT_LOOS + 0x474e552)
 #define PT_GNU_PROPERTY	(PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME	(PT_LOOS + 0x474e554)
 
 
 /* ARM MTE memory tag segment type */
-- 
2.48.1



* [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (6 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-24 13:44   ` Peter Zijlstra
  2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

Now that the sframe infrastructure is fully in place, make it work by
hooking it up to the unwind_user interface.
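With this change sframe gets bit 0 and is therefore tried before frame pointers. Based on the hunks shown, any sframe_find() failure is mapped to -ENOENT and falls back to the FP method, while other errors from the subsequent frame step end the walk. A minimal userspace sketch of that control flow, assuming hypothetical step callbacks and demo stubs in place of unwind_user_next_sframe() and unwind_user_next_fp():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit order mirrors the patch: SFRAME (bit 0) is tried before FP (bit 1). */
enum { TYPE_SFRAME = 1 << 0, TYPE_FP = 1 << 1 };

/* Hypothetical per-method step: 0 = frame found, -ENOENT = no data for ip. */
typedef int (*step_fn)(unsigned long ip);

struct walker {
	unsigned int available;	/* bitmask of usable methods */
	unsigned int used;	/* method that produced the last frame */
	bool done;
};

static int walker_next(struct walker *w, unsigned long ip,
		       step_fn sframe_step, step_fn fp_step)
{
	if (w->done)
		return -EINVAL;

	if (w->available & TYPE_SFRAME) {
		int ret = sframe_step(ip);

		if (ret == 0) {
			w->used = TYPE_SFRAME;
			return 0;
		}
		if (ret != -ENOENT)
			goto stop;	/* e.g. failed user stack read: end walk */
		/* -ENOENT: no usable sframe entry for ip, fall back to FP */
	}

	if ((w->available & TYPE_FP) && fp_step(ip) == 0) {
		w->used = TYPE_FP;
		return 0;
	}
stop:
	w->done = true;
	return -EINVAL;
}

/* Demo stub: sframe covers ip < 0x1000; 0x2000 mimics a failed stack read. */
static int demo_sframe(unsigned long ip)
{
	if (ip == 0x2000)
		return -EFAULT;
	return ip < 0x1000 ? 0 : -ENOENT;
}

static int demo_fp(unsigned long ip)
{
	(void)ip;
	return 0;	/* pretend the frame-pointer step always succeeds */
}
```

Keeping sframe first means frame pointers are only consulted for code without .sframe coverage, which matches the motivation of the series.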

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/Kconfig                      |  1 +
 include/linux/unwind_user_types.h |  4 ++-
 kernel/unwind/user.c              | 41 +++++++++++++++++++++++++++----
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 69fcabf53088..277b87af949f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -453,6 +453,7 @@ config HAVE_UNWIND_USER_FP
 
 config HAVE_UNWIND_USER_SFRAME
 	bool
+	select UNWIND_USER
 
 config HAVE_PERF_REGS
 	bool
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 938f7e623332..ee0ce855e045 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -9,7 +9,8 @@
  * available.
  */
 enum unwind_user_type_bits {
-	UNWIND_USER_TYPE_FP_BIT =		0,
+	UNWIND_USER_TYPE_SFRAME_BIT =		0,
+	UNWIND_USER_TYPE_FP_BIT =		1,
 
 	NR_UNWIND_USER_TYPE_BITS,
 };
@@ -17,6 +18,7 @@ enum unwind_user_type_bits {
 enum unwind_user_type {
 	/* Type "none" for the start of stack walk iteration. */
 	UNWIND_USER_TYPE_NONE =			0,
+	UNWIND_USER_TYPE_SFRAME =		BIT(UNWIND_USER_TYPE_SFRAME_BIT),
 	UNWIND_USER_TYPE_FP =			BIT(UNWIND_USER_TYPE_FP_BIT),
 };
 
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 696004ee956a..f6c543cb255b 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -7,6 +7,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/unwind_user.h>
 #include <linux/uaccess.h>
+#include <linux/sframe.h>
 
 #define for_each_user_frame(state) \
 	for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
@@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
 	return get_user(*word, addr);
 }
 
-static int unwind_user_next_fp(struct unwind_user_state *state)
+static int unwind_user_next_common(struct unwind_user_state *state,
+				   const struct unwind_user_frame *frame,
+				   struct pt_regs *regs)
 {
-	const struct unwind_user_frame fp_frame = {
-		ARCH_INIT_USER_FP_FRAME(state->ws)
-	};
-	const struct unwind_user_frame *frame = &fp_frame;
 	unsigned long cfa, fp, ra;
 
 	if (frame->use_fp) {
@@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
 	return 0;
 }
 
+static int unwind_user_next_sframe(struct unwind_user_state *state)
+{
+	struct unwind_user_frame _frame, *frame;
+
+	/* sframe expects the frame to be local storage */
+	frame = &_frame;
+	if (sframe_find(state->ip, frame))
+		return -ENOENT;
+	return unwind_user_next_common(state, frame, task_pt_regs(current));
+}
+
+static int unwind_user_next_fp(struct unwind_user_state *state)
+{
+	const struct unwind_user_frame fp_frame = {
+		ARCH_INIT_USER_FP_FRAME(state->ws)
+	};
+
+	return unwind_user_next_common(state, &fp_frame, task_pt_regs(current));
+}
+
 static int unwind_user_next(struct unwind_user_state *state)
 {
 	unsigned long iter_mask = state->available_types;
@@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
 
 		state->current_type = type;
 		switch (type) {
+		case UNWIND_USER_TYPE_SFRAME:
+			switch (unwind_user_next_sframe(state)) {
+			case 0:
+				return 0;
+			case -ENOENT:
+				continue;	/* Try next method. */
+			default:
+				state->done = true;
+			}
+			break;
 		case UNWIND_USER_TYPE_FP:
 			if (!unwind_user_next_fp(state))
 				return 0;
@@ -108,6 +137,8 @@ static int unwind_user_start(struct unwind_user_state *state)
 		return -EINVAL;
 	}
 
+	if (current_has_sframe())
+		state->available_types |= UNWIND_USER_TYPE_SFRAME;
 	if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
 		state->available_types |= UNWIND_USER_TYPE_FP;
 
-- 
2.48.1


* [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (7 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

Add an indication for an outermost frame to the unwind user frame
structure and stop unwinding when reaching an outermost frame.

This will be used by the unwind user sframe support, as SFrame may
represent an undefined return address as an indication of an outermost
frame.
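
A minimal user-space sketch of the intended control flow. The names
demo_frame and demo_unwind are hypothetical and model only the new
outermost flag; the real code sets state->done and returns 0 from
unwind_user_next_common() rather than counting frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct unwind_user_frame (one field only). */
struct demo_frame {
	bool outermost;	/* RA undefined: there is no caller to unwind into */
};

/*
 * Walk a pre-computed list of frame descriptions, one per stack level, and
 * return how many unwind steps succeed. Reaching an outermost frame ends
 * the walk normally instead of being treated as an error.
 */
static size_t demo_unwind(const struct demo_frame *frames, size_t nr)
{
	size_t depth = 0;

	assert(frames != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (frames[i].outermost)
			break;		/* kernel: state->done = true; return 0; */
		depth++;		/* a caller was found: one more entry */
	}
	return depth;
}
```

Any frames described after the outermost one are never consulted.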

Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Changes in v11:
    - New patch. (Jens)

 arch/x86/include/asm/unwind_user.h | 3 ++-
 include/linux/unwind_user_types.h  | 1 +
 kernel/unwind/user.c               | 6 ++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 5e0755ea3086..08684bbeff0c 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -6,6 +6,7 @@
 	.cfa_off	=  2*(ws),			\
 	.ra_off		= -1*(ws),			\
 	.fp_off		= -2*(ws),			\
-	.use_fp		= true,
+	.use_fp		= true,				\
+	.outermost	= false,
 
 #endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index ee0ce855e045..e07fee69d315 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -32,6 +32,7 @@ struct unwind_user_frame {
 	s32 ra_off;
 	s32 fp_off;
 	bool use_fp;
+	bool outermost;
 };
 
 struct unwind_user_state {
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index f6c543cb255b..c8034a447c16 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -33,6 +33,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
 {
 	unsigned long cfa, fp, ra;
 
+	/* Stop unwinding when reaching an outermost frame. */
+	if (frame->outermost) {
+		state->done = true;
+		return 0;
+	}
+
 	if (frame->use_fp) {
 		if (state->fp < state->sp)
 			return -EINVAL;
-- 
2.48.1


* [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (8 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

SFrame may represent an undefined return address (RA), as an indication
of an outermost frame, by an SFrame FRE without any offsets. Mark such
FREs as having an undefined RA and report them as outermost frames.
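
For illustration, a user-space sketch that decodes an FRE info byte and
flags an undefined RA when the offset count is zero. The bit layout is
assumed to follow the SFrame V2 format specification; the DEMO_* macros
and demo_* names are hypothetical stand-ins for the kernel's
SFRAME_FRE_* helpers, and only the fields needed here are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed SFrame V2 FRE info byte layout:
 *   bit  0    CFA base register (0 = FP, 1 = SP)
 *   bits 1-4  number of stack offsets that follow
 *   bits 5-6  size of each offset (0 = 1 byte, 1 = 2 bytes, 2 = 4 bytes)
 *   bit  7    return address is mangled
 */
#define DEMO_FRE_OFFSET_COUNT(info)	(((info) >> 1) & 0xf)
#define DEMO_FRE_OFFSET_SIZE(info)	(((info) >> 5) & 0x3)

struct demo_fre_info {
	unsigned int offset_count;
	unsigned int offset_size;	/* in bytes */
	bool ra_undefined;
};

/* Returns 0 on success, -1 for an unsupported offset size. */
static int demo_decode_fre_info(uint8_t info, struct demo_fre_info *fre)
{
	static const unsigned int sizes[4] = { 1, 2, 4, 0 };

	fre->offset_count = DEMO_FRE_OFFSET_COUNT(info);
	fre->offset_size  = sizes[DEMO_FRE_OFFSET_SIZE(info)];
	if (!fre->offset_size)
		return -1;
	/* No offsets at all, not even a CFA offset: the RA is undefined. */
	fre->ra_undefined = fre->offset_count == 0;
	return 0;
}
```

As in the patch, a zero offset count is no longer rejected; only an
invalid offset size still is.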

Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Changes in v11:
    - New patch. (Jens)

 kernel/unwind/sframe.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 5536374e2a22..bc3e2eb00325 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -24,6 +24,7 @@ struct sframe_fre {
 	s32		ra_off;
 	s32		fp_off;
 	u8		info;
+	bool		ra_undefined;
 };
 
 DEFINE_STATIC_SRCU(sframe_srcu);
@@ -173,6 +174,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
 	unsigned char offset_count, offset_size;
 	s32 cfa_off, ra_off, fp_off;
 	unsigned long cur = fre_addr;
+	bool ra_undefined = false;
 	unsigned char addr_size;
 	u32 ip_off;
 	u8 info;
@@ -191,7 +193,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
 	UNSAFE_GET_USER_INC(info, cur, 1, Efault);
 	offset_count = SFRAME_FRE_OFFSET_COUNT(info);
 	offset_size  = offset_size_enum_to_size(SFRAME_FRE_OFFSET_SIZE(info));
-	if (!offset_count || !offset_size)
+	if (!offset_size)
 		return -EFAULT;
 
 	if (cur + (offset_count * offset_size) > sec->fres_end)
@@ -199,6 +201,14 @@ static __always_inline int __read_fre(struct sframe_section *sec,
 
 	fre->size = addr_size + 1 + (offset_count * offset_size);
 
+	if (!offset_count) {
+		cfa_off		= 0;
+		ra_off		= 0;
+		fp_off		= 0;
+		ra_undefined	= true;
+		goto done;
+	}
+
 	UNSAFE_GET_USER_INC(cfa_off, cur, offset_size, Efault);
 	offset_count--;
 
@@ -219,11 +229,13 @@ static __always_inline int __read_fre(struct sframe_section *sec,
 	if (offset_count)
 		return -EFAULT;
 
+done:
 	fre->ip_off		= ip_off;
 	fre->cfa_off		= cfa_off;
 	fre->ra_off		= ra_off;
 	fre->fp_off		= fp_off;
 	fre->info		= info;
+	fre->ra_undefined	= ra_undefined;
 
 	return 0;
 
@@ -285,6 +297,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
 	frame->ra_off  = fre->ra_off;
 	frame->fp_off  = fre->fp_off;
 	frame->use_fp  = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+	frame->outermost = fre->ra_undefined;
 
 	return 0;
 }
-- 
2.48.1


* [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (9 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

The x86 SFrame V2 implementation works fairly well starting with
binutils 2.41 (although some bugs are only fixed in later versions).
Enable it.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8f94c58d4de8..c3518f145f0d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -300,6 +300,7 @@ config X86
 	select HAVE_UACCESS_VALIDATION		if HAVE_OBJTOOL
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_UNWIND_USER_FP		if X86_64
+	select HAVE_UNWIND_USER_SFRAME		if X86_64
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
 	select VDSO_GETRANDOM			if X86_64
-- 
2.48.1


* [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (10 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

To avoid repeated attempts to use a bad .sframe section, remove it as
soon as the first sign of corruption is detected.
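
A minimal sketch of the remove-on-first-corruption policy, assuming a
single hypothetical section object in place of the per-mm maple tree.
The first -EFAULT unregisters the section, so later lookups fail fast
with a not-found error (the -ENOENT choice is this sketch's own) and the
caller can fall back to another unwind method.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a registered .sframe section. */
struct demo_section {
	bool registered;
	bool corrupt;		/* simulates reading a bad FDE/FRE */
};

/* Stand-in for the per-section lookup done by sframe_find(). */
static int demo_lookup(struct demo_section *sec)
{
	if (!sec->registered)
		return -ENOENT;	/* not found: try another unwinder */
	if (sec->corrupt) {
		/* First sign of corruption: drop the section for good. */
		sec->registered = false;
		return -EFAULT;
	}
	return 0;
}
```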

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 kernel/unwind/sframe.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index bc3e2eb00325..77ef1f0bb9c5 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -330,6 +330,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 	ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
 end:
 	user_read_access_end();
+
+	if (ret == -EFAULT)
+		WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+
 	return ret;
 }
 
-- 
2.48.1


* [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (11 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

When debugging sframe issues, the error messages aren't all that helpful
without knowing what file a corresponding .sframe section belongs to.
Prefix debug output strings with the file name.
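
A sketch of the prefixing macro pattern in user-space C, assuming
snprintf() into a buffer as a stand-in for pr_debug() (the comm/pid
prefix added by dbg() is omitted). The demo_* names are hypothetical. As
with dbg_sec() in the patch, the macro silently depends on a variable
named sec being in scope at the call site.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_named_section {
	const char *filename;
};

static char demo_log[256];

/* Stand-in for pr_debug(): format into a buffer instead of the log. */
#define demo_dbg(fmt, ...) \
	snprintf(demo_log, sizeof(demo_log), fmt, ##__VA_ARGS__)

/* Prepend the section's file name; requires "sec" in scope. */
#define demo_dbg_sec(fmt, ...) \
	demo_dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)

static void demo_report(const struct demo_named_section *sec,
			unsigned long text)
{
	demo_dbg_sec("mtree_erase failed: text=%lx\n", text);
}
```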

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 include/linux/sframe.h       |  4 +++-
 kernel/unwind/sframe.c       | 23 ++++++++++--------
 kernel/unwind/sframe_debug.h | 45 +++++++++++++++++++++++++++++++-----
 3 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 9a72209696f9..b79c5ec09229 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -10,7 +10,9 @@
 
 struct sframe_section {
 	struct rcu_head	rcu;
-
+#ifdef CONFIG_DYNAMIC_DEBUG
+	const char	*filename;
+#endif
 	unsigned long	sframe_start;
 	unsigned long	sframe_end;
 	unsigned long	text_start;
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 77ef1f0bb9c5..82eaf3c5d6b0 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -331,14 +331,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 end:
 	user_read_access_end();
 
-	if (ret == -EFAULT)
+	if (ret == -EFAULT) {
+		dbg_sec("removing bad .sframe section\n");
 		WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+	}
 
 	return ret;
 }
 
 static void free_section(struct sframe_section *sec)
 {
+	dbg_free(sec);
 	kfree(sec);
 }
 
@@ -349,7 +352,7 @@ static int sframe_read_header(struct sframe_section *sec)
 	unsigned int num_fdes;
 
 	if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
-		dbg("header usercopy failed\n");
+		dbg_sec("header usercopy failed\n");
 		return -EFAULT;
 	}
 
@@ -358,18 +361,18 @@ static int sframe_read_header(struct sframe_section *sec)
 	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
 	    !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
 	    shdr.auxhdr_len) {
-		dbg("bad/unsupported sframe header\n");
+		dbg_sec("bad/unsupported sframe header\n");
 		return -EINVAL;
 	}
 
 	if (!shdr.num_fdes || !shdr.num_fres) {
-		dbg("no fde/fre entries\n");
+		dbg_sec("no fde/fre entries\n");
 		return -EINVAL;
 	}
 
 	header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
 	if (header_end >= sec->sframe_end) {
-		dbg("header doesn't fit in section\n");
+		dbg_sec("header doesn't fit in section\n");
 		return -EINVAL;
 	}
 
@@ -381,7 +384,7 @@ static int sframe_read_header(struct sframe_section *sec)
 	fres_end   = fres_start + shdr.fre_len;
 
 	if (fres_start < fdes_end || fres_end > sec->sframe_end) {
-		dbg("inconsistent fde/fre offsets\n");
+		dbg_sec("inconsistent fde/fre offsets\n");
 		return -EINVAL;
 	}
 
@@ -437,6 +440,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 	sec->text_start		= text_start;
 	sec->text_end		= text_end;
 
+	dbg_init(sec);
+
 	ret = sframe_read_header(sec);
 	if (ret) {
 		dbg_print_header(sec);
@@ -445,8 +450,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 
 	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
 	if (ret) {
-		dbg("mtree_insert_range failed: text=%lx-%lx\n",
-		    sec->text_start, sec->text_end);
+		dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
+			sec->text_start, sec->text_end);
 		goto err_free;
 	}
 
@@ -468,7 +473,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
 				   struct sframe_section *sec)
 {
 	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
-		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+		dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
 		return -EINVAL;
 	}
 
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
index 055c8c8fae24..7794bf0bd78c 100644
--- a/kernel/unwind/sframe_debug.h
+++ b/kernel/unwind/sframe_debug.h
@@ -10,26 +10,59 @@
 #define dbg(fmt, ...)							\
 	pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
 
+#define dbg_sec(fmt, ...)						\
+	dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)
+
 static __always_inline void dbg_print_header(struct sframe_section *sec)
 {
 	unsigned long fdes_end;
 
 	fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde));
 
-	dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
-	    "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
-	    "ra_off:%d fp_off:%d\n",
-	    sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
-	    sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
-	    sec->ra_off, sec->fp_off);
+	dbg_sec("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+		"fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+		"ra_off:%d fp_off:%d\n",
+		sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+		sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+		sec->ra_off, sec->fp_off);
+}
+
+static inline void dbg_init(struct sframe_section *sec)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	guard(mmap_read_lock)(mm);
+	vma = vma_lookup(mm, sec->sframe_start);
+	if (!vma)
+		sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
+	else if (vma->vm_file)
+		sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
+	else if (vma->vm_ops && vma->vm_ops->name)
+		sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
+	else if (arch_vma_name(vma))
+		sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
+	else if (!vma->vm_mm)
+		sec->filename = kstrdup("(vdso)", GFP_KERNEL);
+	else
+		sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
+}
+
+static inline void dbg_free(struct sframe_section *sec)
+{
+	kfree(sec->filename);
 }
 
 #else /* !CONFIG_DYNAMIC_DEBUG */
 
 #define dbg(args...)			no_printk(args)
+#define dbg_sec(args...)		no_printk(args)
 
 static inline void dbg_print_header(struct sframe_section *sec) {}
 
+static inline void dbg_init(struct sframe_section *sec) {}
+static inline void dbg_free(struct sframe_section *sec) {}
+
 #endif /* !CONFIG_DYNAMIC_DEBUG */
 
 #endif /* _SFRAME_DEBUG_H */
-- 
2.48.1


* [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (12 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

Add a debug feature to validate all .sframe sections when first loading
the file rather than on demand.

[ Jens Remus: Add support for PC-relative FDE function start address. ]
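
The ordering checks performed by the validation pass can be sketched
over already-decoded arrays. The demo_* types are hypothetical; the real
code reads each FDE and FRE from user memory via safe_read_fde() and
safe_read_fre() and logs the failing entry rather than returning its
index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, already-decoded views of FDEs and their FREs. */
struct demo_fre { unsigned int ip_off; };
struct demo_fde {
	unsigned long start_ip;		/* absolute function start address */
	const struct demo_fre *fres;
	size_t nr_fres;
};

/*
 * FDE start addresses must be strictly increasing, and so must the FRE
 * ip_off values within each FDE. Returns the index of the first bad FDE,
 * or -1 if everything is sorted.
 */
static long demo_validate(const struct demo_fde *fdes, size_t nr_fdes)
{
	unsigned long prev_ip = 0;

	for (size_t i = 0; i < nr_fdes; i++) {
		if (fdes[i].start_ip <= prev_ip)
			return (long)i;		/* "fde %u not sorted" */
		prev_ip = fdes[i].start_ip;

		for (size_t j = 1; j < fdes[i].nr_fres; j++) {
			if (fdes[i].fres[j].ip_off <= fdes[i].fres[j - 1].ip_off)
				return (long)i;	/* "fre %u not sorted" */
		}
	}
	return -1;
}
```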

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Changes in v11:
    - Support for SFrame V2 PC-relative FDE function start address. (Jens)

 arch/Kconfig           | 19 +++++++++
 kernel/unwind/sframe.c | 97 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 277b87af949f..918ebe3c5a85 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -455,6 +455,25 @@ config HAVE_UNWIND_USER_SFRAME
 	bool
 	select UNWIND_USER
 
+config SFRAME_VALIDATION
+	bool "Enable .sframe section debugging"
+	depends on HAVE_UNWIND_USER_SFRAME
+	depends on DYNAMIC_DEBUG
+	help
+	  When adding an .sframe section for a task, validate the entire
+	  section immediately rather than on demand.
+
+	  This is a debug feature which is helpful for rooting out .sframe
+	  section issues.  If the .sframe section is corrupt, it will fail to
+	  load immediately, with more information provided in dynamic printks.
+
+	  This has a significant page cache footprint due to its reading of the
+	  entire .sframe section for every loaded executable and shared
+	  library.  Also, it's done for all processes, even those which don't
+	  get stack traced by the kernel.  Not recommended for general use.
+
+	  If unsure, say N.
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 82eaf3c5d6b0..3d7ac4eaa8b7 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -339,6 +339,99 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 	return ret;
 }
 
+#ifdef CONFIG_SFRAME_VALIDATION
+
+static int safe_read_fde(struct sframe_section *sec,
+			 unsigned int fde_num, struct sframe_fde *fde,
+			 unsigned long *fde_start_base)
+{
+	int ret;
+
+	if (!user_read_access_begin((void __user *)sec->sframe_start,
+				    sec->sframe_end - sec->sframe_start))
+		return -EFAULT;
+	ret = __read_fde(sec, fde_num, fde, fde_start_base);
+	user_read_access_end();
+	return ret;
+}
+
+static int safe_read_fre(struct sframe_section *sec,
+			 struct sframe_fde *fde, unsigned long fre_addr,
+			 struct sframe_fre *fre)
+{
+	int ret;
+
+	if (!user_read_access_begin((void __user *)sec->sframe_start,
+				    sec->sframe_end - sec->sframe_start))
+		return -EFAULT;
+	ret = __read_fre(sec, fde, fre_addr, fre);
+	user_read_access_end();
+	return ret;
+}
+
+static int sframe_validate_section(struct sframe_section *sec)
+{
+	unsigned long prev_ip = 0;
+	unsigned int i;
+
+	for (i = 0; i < sec->num_fdes; i++) {
+		struct sframe_fre *fre, *prev_fre = NULL;
+		unsigned long ip, fde_start_base, fre_addr;
+		struct sframe_fde fde;
+		struct sframe_fre fres[2];
+		bool which = false;
+		unsigned int j;
+		int ret;
+
+		ret = safe_read_fde(sec, i, &fde, &fde_start_base);
+		if (ret)
+			return ret;
+
+		ip = fde_start_base + fde.start_addr;
+		if (ip <= prev_ip) {
+			dbg_sec("fde %u not sorted\n", i);
+			return -EFAULT;
+		}
+		prev_ip = ip;
+
+		fre_addr = sec->fres_start + fde.fres_off;
+		for (j = 0; j < fde.fres_num; j++) {
+			int ret;
+
+			fre = which ? fres : fres + 1;
+			which = !which;
+
+			ret = safe_read_fre(sec, &fde, fre_addr, fre);
+			if (ret) {
+				dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
+				dbg_sec("FDE: start_addr:0x%x func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
+					fde.start_addr, fde.func_size,
+					fde.fres_off, fde.fres_num,
+					fde.info, fde.rep_size);
+				return ret;
+			}
+
+			fre_addr += fre->size;
+
+			if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
+				dbg_sec("fde %u: fre %u not sorted\n", i, j);
+				return -EFAULT;
+			}
+
+			prev_fre = fre;
+		}
+	}
+
+	return 0;
+}
+
+#else /*  !CONFIG_SFRAME_VALIDATION */
+
+static int sframe_validate_section(struct sframe_section *sec) { return 0; }
+
+#endif /* !CONFIG_SFRAME_VALIDATION */
+
+
 static void free_section(struct sframe_section *sec)
 {
 	dbg_free(sec);
@@ -448,6 +541,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
 		goto err_free;
 	}
 
+	ret = sframe_validate_section(sec);
+	if (ret)
+		goto err_free;
+
 	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
 	if (ret) {
 		dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
-- 
2.48.1


* [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (13 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
@ 2025-10-22 14:43 ` Jens Remus
  2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-22 14:43 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

From: Josh Poimboeuf <jpoimboe@kernel.org>

The kernel doesn't have direct visibility into the ELF contents of shared
libraries.  Add some prctl() interfaces which allow glibc to tell the
kernel where to find .sframe sections.

[
  This adds a prctl() interface for testing the loading of .sframe
  sections for libraries. However, this interface should really be a
  system call. This patch is for testing purposes only and should not be
  applied to mainline.
]

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 include/uapi/linux/prctl.h | 6 +++++-
 kernel/sys.c               | 9 +++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index ed3aed264aeb..b807baa8a53b 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -358,7 +358,7 @@ struct prctl_mm_map {
  * configuration.  All bits may be locked via this call, including
  * undefined bits.
  */
-#define PR_LOCK_SHADOW_STACK_STATUS      76
+#define PR_LOCK_SHADOW_STACK_STATUS	76
 
 /*
  * Controls the mode of timer_create() for CRIU restore operations.
@@ -376,4 +376,8 @@ struct prctl_mm_map {
 # define PR_FUTEX_HASH_SET_SLOTS	1
 # define PR_FUTEX_HASH_GET_SLOTS	2
 
+/* SFRAME management */
+#define PR_ADD_SFRAME			79
+#define PR_REMOVE_SFRAME		80
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 1e28b40053ce..e6ce79a3a7aa 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -65,6 +65,7 @@
 #include <linux/rcupdate.h>
 #include <linux/uidgid.h>
 #include <linux/cred.h>
+#include <linux/sframe.h>
 
 #include <linux/nospec.h>
 
@@ -2805,6 +2806,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_FUTEX_HASH:
 		error = futex_hash_prctl(arg2, arg3, arg4);
 		break;
+	case PR_ADD_SFRAME:
+		error = sframe_add_section(arg2, arg3, arg4, arg5);
+		break;
+	case PR_REMOVE_SFRAME:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = sframe_remove_section(arg2);
+		break;
 	default:
 		trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
 		error = -EINVAL;
-- 
2.48.1




* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (14 preceding siblings ...)
  2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
@ 2025-10-22 20:39 ` Andrew Morton
  2025-10-22 21:58   ` Steven Rostedt
  2025-10-23  8:09 ` Fangrui Song
  2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
  17 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2025-10-22 20:39 UTC (permalink / raw)
  To: Jens Remus
  Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Florian Weimer, Kees Cook, Carlos O'Donell,
	Sam James, Borislav Petkov, Dave Hansen, David Hildenbrand,
	H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
	Heiko Carstens, Vasily Gorbik

On Wed, 22 Oct 2025 16:43:11 +0200 Jens Remus <jremus@linux.ibm.com> wrote:

> This is the implementation of parsing the SFrame section in an ELF file.

Presently x86_64-only, it seems.  Can we expect to see this implemented
for other architectures?

Would a selftest for this be appropriate?  To give testers some way of
exercising the code and make life better for people who are enabling
this on other architectures.

In what tree do you anticipate this project being carried?





* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
  2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
@ 2025-10-22 21:58   ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-22 21:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jens Remus, linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Florian Weimer, Kees Cook, Carlos O'Donell,
	Sam James, Borislav Petkov, Dave Hansen, David Hildenbrand,
	H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
	Heiko Carstens, Vasily Gorbik

On Wed, 22 Oct 2025 13:39:32 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 22 Oct 2025 16:43:11 +0200 Jens Remus <jremus@linux.ibm.com> wrote:
> 
> > This is the implementation of parsing the SFrame section in an ELF file.  
> 
> Presently x86_64-only, it seems.  Can we expect to see this implemented
> for other architectures?

Yes, and Jens is here to port it to the s390 :-)

Currently Peter Zijlstra and I are updating the deferred unwinder. Jens is
working on getting sframes to work with it. His interest is getting it for
s390 whereas ours is for x86.

> 
> Would a selftest for this be appropriate?  To give testers some way of
> exercising the code and make life better for people who are enabling
> this on other architectures.

Yes we should definitely have selftests. But we are far from getting there.
One requirement is that the toolchain used to build the test must support
adding sframes.

> 
> In what tree do you anticipate this project being carried?
> 

It will likely go through either tip or my tree.

-- Steve



* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (15 preceding siblings ...)
  2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
@ 2025-10-23  8:09 ` Fangrui Song
  2025-10-23 14:23   ` Steven Rostedt
  2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
  17 siblings, 1 reply; 28+ messages in thread
From: Fangrui Song @ 2025-10-23  8:09 UTC (permalink / raw)
  To: Jens Remus
  Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
	Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
	David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik

On 2025-10-22, Jens Remus wrote:
>This is the implementation of parsing the SFrame section in an ELF file.
>It's a continuation of Josh's and Steve's last work that can be found
>here:
>
>   https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
>   https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
>
>Currently the only way to get a user space stack trace from a stack
>walk (and not just copying a large amount of user stack into the kernel
>ring buffer) is to use frame pointers. This has a few issues. The biggest
>one is that compiling frame pointers into every application and library
>has been shown to cause performance overhead.
>
>Another issue is that the format of the frames may not always be consistent
>between different compilers and some architectures (s390) have no defined
>format to do a reliable stack walk. The only way to perform user space
>profiling on these architectures is to copy the user stack into the kernel
>buffer.
>
>SFrames[1] is now supported in gcc binutils and soon will also be supported
>by LLVM. 

Please consider dropping the statement, "soon will also be supported by LLVM."
Speaking as LLVM's MC, lld/ELF, and binary utilities maintainer, I have significant concerns about the v2 format, specifically its apparent disregard for standard ELF and linker conventions
(https://maskray.me/blog/2025-09-28-remarks-on-sframe#linking-and-execution-views).

To arm64 maintainers: now is a critical time to revisit the unwind
information format, as I have outlined in my blog post:

A sorted address table like .eh_frame_hdr might still be needed, but the
design could be very different for arm64.

I am curious whether anyone has thought about a library that parses .eh_frame and generates SFrame.
If objtool integrated this library, it could generate SFrame for vmlinux and modules without relying on the assembler/linker.
Linkers and assemblers require a level of stability that is currently a concern on the toolchain side.

(https://sourceware.org/pipermail/binutils/2025-October/144974.html
"This "linker will DTRT" assertion glosses over significant
implementation complexity. Each version needs not just a reader but
version-specific *merging* logic in every linker—fundamentally different
from simply reading a format.")

>SFrames acts more like ORC, and lives in the ELF executable
>file as its own section. Like ORC it has two tables where the first table
>is sorted by instruction pointers (IP) and using the current IP and finding
>its entry in the first table, it will take you to the second table which
>will tell you where the return address of the current function is located
>and then you can use that address to look it up in the first table to find
>the return address of that function, and so on. This performs a user
>space stack walk.
>
>Now because the SFrame section lives in the ELF file it needs to be faulted
>into memory when it is used. This means that walking the user space stack
>requires being in a faultable context. As profilers like perf request a stack
>trace in interrupt or NMI context, it cannot do the walking when it is
>requested. Instead it must be deferred until it is safe to fault in user
>space. One place this is known to be safe is when the task is about to return
>back to user space.
>
>This series makes the deferred unwind code implement SFrames.
>
>[1] https://sourceware.org/binutils/wiki/sframe
>
>Changes since v10:
>- Rebase on v6.17-rc1 with Peter's unwind user fixes and x86 support
>  series [2] and Steve's support for the deferred unwinding infrastructure
>  series in perf [3] and perf tool [4] on top.
>- Support for SFrame V2 PC-relative FDE function start address. (Jens)
>- Support for SFrame V2 representing RA undefined as indication for
>  outermost frames. (Jens)
>
>[2]: [PATCH 00/12] Various fixes and x86 support,
>     https://lore.kernel.org/all/20250924075948.579302904@infradead.org/
>[3]: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure,
>     https://lore.kernel.org/all/20251007214008.080852573@kernel.org/
>[4]: [PATCH v16 0/4] perf tool: Support the deferred unwinding infrastructure,
>     https://lore.kernel.org/all/20250908175319.841517121@kernel.org/
>
>Patches 1 and 2 are suggested fixups to patches from Peter's unwind user
>fixes and x86 support series.  They keep the factoring out of the word
>size from the frame's CFA, FP, and RA offsets local to unwind user fp, as
>unwind user sframe does use absolute offsets.
>
>Patches 3, 6, and 14 have been updated to exclusively support the recent
>PC-relative SFrame FDE function start address encoding.  With Binutils 2.45
>the SFrame V2 FDE function start address field value is an offset from the
>field (i.e. PC-relative) instead of from the .sframe section start.  This
>is indicated by the new SFrame header flag SFRAME_F_FDE_FUNC_START_PCREL.
>Old SFrame V2 sections get rejected with dynamic debug message
>"bad/unsupported sframe header".
>
>Patches 9 and 10 add support to unwind user and unwind user sframe for
>a recent change of the SFrame V2 format to represent an undefined
>return address as an SFrame FRE without any offsets, which is used as
>indication for outermost frames.  Note that currently only a development
>build of Binutils mainline generates SFrame information including this
>new indication for outermost frames.  SFrame information without the new
>indication is still supported.  Without these patches unwind user sframe
>would identify such new SFrame FREs without any offsets as corrupted and
>remove the .sframe section, causing any further stack tracing using
>sframe to fail.
>
>Regards,
>Jens
>
>
>Jens Remus (4):
>  fixup! unwind: Implement compat fp unwind
>  fixup! unwind_user/x86: Enable frame pointer unwinding on x86
>  unwind_user: Stop when reaching an outermost frame
>  unwind_user/sframe: Add support for outermost frame indication
>
>Josh Poimboeuf (11):
>  unwind_user/sframe: Add support for reading .sframe headers
>  unwind_user/sframe: Store sframe section data in per-mm maple tree
>  x86/uaccess: Add unsafe_copy_from_user() implementation
>  unwind_user/sframe: Add support for reading .sframe contents
>  unwind_user/sframe: Detect .sframe sections in executables
>  unwind_user/sframe: Wire up unwind_user to sframe
>  unwind_user/sframe/x86: Enable sframe unwinding on x86
>  unwind_user/sframe: Remove .sframe section on detected corruption
>  unwind_user/sframe: Show file name in debug output
>  unwind_user/sframe: Add .sframe validation option
>  unwind_user/sframe: Add prctl() interface for registering .sframe
>    sections
>
> MAINTAINERS                        |   1 +
> arch/Kconfig                       |  23 ++
> arch/x86/Kconfig                   |   1 +
> arch/x86/include/asm/mmu.h         |   2 +-
> arch/x86/include/asm/uaccess.h     |  39 +-
> arch/x86/include/asm/unwind_user.h |  11 +-
> fs/binfmt_elf.c                    |  49 ++-
> include/linux/mm_types.h           |   3 +
> include/linux/sframe.h             |  60 +++
> include/linux/unwind_user_types.h  |   5 +-
> include/uapi/linux/elf.h           |   1 +
> include/uapi/linux/prctl.h         |   6 +-
> kernel/fork.c                      |  10 +
> kernel/sys.c                       |   9 +
> kernel/unwind/Makefile             |   3 +-
> kernel/unwind/sframe.c             | 615 +++++++++++++++++++++++++++++
> kernel/unwind/sframe.h             |  72 ++++
> kernel/unwind/sframe_debug.h       |  68 ++++
> kernel/unwind/user.c               |  56 ++-
> mm/init-mm.c                       |   2 +
> 20 files changed, 1004 insertions(+), 32 deletions(-)
> create mode 100644 include/linux/sframe.h
> create mode 100644 kernel/unwind/sframe.c
> create mode 100644 kernel/unwind/sframe.h
> create mode 100644 kernel/unwind/sframe_debug.h
>
>-- 
>2.48.1
>



* Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
  2025-10-23  8:09 ` Fangrui Song
@ 2025-10-23 14:23   ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-23 14:23 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Jens Remus, linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
	Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
	David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik

On Thu, 23 Oct 2025 01:09:02 -0700
Fangrui Song <maskray@sourceware.org> wrote:

> Please consider dropping the statement, "soon will also be supported by LLVM."
> Speaking as LLVM's MC, lld/ELF, and binary utilities maintainer, I have
> significant concerns about the v2 format, specifically its apparent
> disregard for standard ELF and linker conventions
> (https://maskray.me/blog/2025-09-28-remarks-on-sframe#linking-and-execution-views)

Please note, v2 can be dropped entirely. There are no plans to have the Linux
kernel ship with v2. The v2 patches for the Linux kernel are for
testing purposes only (which is what helped find the issues with v2).

The plan is to have v3 be the first version supported by an official
release of the Linux kernel, with the assumption that changes after v3 will
be minimal.

The reason there was such a big difference between v2 and v3 is that v2
was the first version to have a consumer try to use it in a more
production-like environment. This uncovered several corner cases that needed
to be addressed, and showed that the current layout of v2 was not acceptable.

No linker needs to support v2 as there will be no consumers of it.

-- Steve



* Re: [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents
  2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-23 16:04   ` Jens Remus
  0 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:04 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik, Steven Rostedt (Google)

Hi Steve, et al.,

as discussed during yesterday's SFrame call, I will shortly be sending two RFC
fixup patches as a POC to demonstrate how this patch and
"[PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option"
could benefit from introducing an internal FDE representation (e.g.
struct sframe_fde_internal) similar to the internal FRE representation
already in use (struct sframe_fre).

On 10/22/2025 4:43 PM, Jens Remus wrote:

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

> +static __always_inline int __read_fde(struct sframe_section *sec,
> +				      unsigned int fde_num,
> +				      struct sframe_fde *fde,
> +				      unsigned long *fde_start_base)

The goal would be to eliminate the passing through of fde_start_base as
well as the various computations of the effective function start address
(= *fde_start_base + fde->start_addr) throughout this module.  The
internal FDE representation could then simply convey the effective
function start address via an "unsigned long func_start_addr" field.
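For illustration, here is a userspace sketch of resolving the PC-relative start address once, at decode time, as proposed here. struct raw_fde, struct fde_internal and decode_fde() are hypothetical names. The raw layout is modeled on the SFrame V2 FDE that __read_fde() copies. fde_addr is passed in explicitly so that the example can use synthetic addresses.

```c
#include <errno.h>
#include <stdint.h>

/* Raw SFrame V2 FDE, modeled on what __read_fde() copies (20 bytes, packed). */
struct raw_fde {
	int32_t  start_addr;  /* offset from this field's own address */
	uint32_t func_size;
	uint32_t fres_off;
	uint32_t fres_num;
	uint8_t  info;
	uint8_t  rep_size;
	uint16_t padding;
} __attribute__((packed));

/* Decoded form in the spirit of struct sframe_fde_internal. */
struct fde_internal {
	unsigned long func_start_addr;  /* absolute function start */
	uint32_t func_size, fres_off, fres_num;
	uint8_t  info, rep_size;
};

/*
 * Decode one FDE located at fde_addr. The PC-relative start offset is
 * resolved here once, so later lookups never need fde_start_base.
 */
static int decode_fde(const struct raw_fde *raw, unsigned long fde_addr,
		      unsigned long text_start, unsigned long text_end,
		      struct fde_internal *fde)
{
	/* Sign-extend the s32 offset; unsigned addition wraps correctly. */
	unsigned long func_addr = fde_addr + (unsigned long)(long)raw->start_addr;

	/* Same bounds check as the patch (text_end itself is accepted). */
	if (func_addr < text_start || func_addr > text_end)
		return -EINVAL;

	fde->func_start_addr = func_addr;
	fde->func_size = raw->func_size;
	fde->fres_off  = raw->fres_off;
	fde->fres_num  = raw->fres_num;
	fde->info      = raw->info;
	fde->rep_size  = raw->rep_size;
	return 0;
}
```

With this representation, the gap check and the ip_off computation both reduce to plain arithmetic on func_start_addr.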

> +{
> +	unsigned long fde_addr, ip;
> +
> +	fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
> +	unsafe_copy_from_user(fde, (void __user *)fde_addr,
> +			      sizeof(struct sframe_fde), Efault);
> +
> +	ip = fde_addr + fde->start_addr;
> +	if (ip < sec->text_start || ip > sec->text_end)
> +		return -EINVAL;
> +
> +	*fde_start_base = fde_addr;
> +	return 0;
> +
> +Efault:
> +	return -EFAULT;
> +}
> +
> +static __always_inline int __find_fde(struct sframe_section *sec,
> +				      unsigned long ip,
> +				      struct sframe_fde *fde,
> +				      unsigned long *fde_start_base)

fde_start_base would get eliminated.

> +{
> +	unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
> +	struct sframe_fde __user *first, *low, *high, *found = NULL;
> +	int ret;
> +
> +	first = (void __user *)sec->fdes_start;
> +	low = first;
> +	high = first + sec->num_fdes - 1;
> +
> +	while (low <= high) {
> +		struct sframe_fde __user *mid;
> +		s32 func_off;
> +		unsigned long func_addr;
> +
> +		mid = low + ((high - low) / 2);
> +
> +		unsafe_get_user(func_off, (s32 __user *)mid, Efault);
> +		func_addr = (unsigned long)mid + func_off;
> +
> +		if (ip >= func_addr) {
> +			if (func_addr < func_addr_low)
> +				return -EFAULT;
> +
> +			func_addr_low = func_addr;
> +
> +			found = mid;
> +			low = mid + 1;
> +		} else {
> +			if (func_addr > func_addr_high)
> +				return -EFAULT;
> +
> +			func_addr_high = func_addr;
> +
> +			high = mid - 1;
> +		}
> +	}
> +
> +	if (!found)
> +		return -EINVAL;
> +
> +	ret = __read_fde(sec, found - first, fde, fde_start_base);

fde_start_base would get eliminated.

> +	if (ret)
> +		return ret;
> +
> +	/* make sure it's not in a gap */
> +	if (ip < *fde_start_base + fde->start_addr ||
> +	    ip >= *fde_start_base + fde->start_addr + fde->func_size)

Would simplify to:

	if (ip < fde->func_start_addr ||
	    ip >= fde->func_start_addr + fde->func_size)

> +		return -EINVAL;
> +
> +	return 0;
> +
> +Efault:
> +	return -EFAULT;
> +}
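A userspace model of the lookup above may help when checking its corruption handling. It is a sketch, not the patch. A plain array stands in for the user mapping, and struct fde is trimmed to the two fields the search needs. The names fde_func_start() and find_fde() are hypothetical. The PC-relative first field is resolved the same way the patch does it, as (unsigned long)mid + func_off.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed FDE: only the fields the lookup needs. */
struct fde {
	int32_t  start_addr;  /* PC-relative to &start_addr */
	uint32_t func_size;
};

static unsigned long fde_func_start(const struct fde *f)
{
	return (unsigned long)&f->start_addr + (unsigned long)(long)f->start_addr;
}

/*
 * Binary search for the last FDE starting at or below ip. A probe that
 * contradicts the bounds seen so far means the table is not sorted; that
 * is reported as -EFAULT (corruption), mirroring the patch.
 */
static long find_fde(const struct fde *fdes, size_t num, unsigned long ip)
{
	unsigned long addr_low = 0, addr_high = ULONG_MAX, start;
	long low = 0, high = (long)num - 1, found = -1;

	while (low <= high) {
		long mid = low + (high - low) / 2;
		unsigned long func_addr = fde_func_start(&fdes[mid]);

		if (ip >= func_addr) {
			if (func_addr < addr_low)
				return -EFAULT;
			addr_low = func_addr;
			found = mid;
			low = mid + 1;
		} else {
			if (func_addr > addr_high)
				return -EFAULT;
			addr_high = func_addr;
			high = mid - 1;
		}
	}
	if (found < 0)
		return -EINVAL;

	/* ip may fall in a gap after the function's end. */
	start = fde_func_start(&fdes[found]);
	if (ip < start || ip >= start + fdes[found].func_size)
		return -EINVAL;
	return found;
}
```

This is a single-pass check. It only catches ordering violations that the search actually probes, which is why full-table sortedness is left to the separate validation option.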

> +static __always_inline int __find_fre(struct sframe_section *sec,
> +				      struct sframe_fde *fde,
> +				      unsigned long fde_start_base,

fde_start_base would get eliminated.

> +				      unsigned long ip,
> +				      struct unwind_user_frame *frame)
> +{
> +	unsigned char fde_type = SFRAME_FUNC_FDE_TYPE(fde->info);
> +	struct sframe_fre *fre, *prev_fre = NULL;
> +	struct sframe_fre fres[2];
> +	unsigned long fre_addr;
> +	bool which = false;
> +	unsigned int i;
> +	u32 ip_off;
> +
> +	ip_off = ip - (fde_start_base + fde->start_addr);

Would simplify to:

	ip_off = ip - fde->func_start_addr;

> +
> +	if (fde_type == SFRAME_FDE_TYPE_PCMASK)
> +		ip_off %= fde->rep_size;
> +
> +	fre_addr = sec->fres_start + fde->fres_off;
> +
> +	for (i = 0; i < fde->fres_num; i++) {
> +		int ret;
> +
> +		/*
> +		 * Alternate between the two fre_addr[] entries for 'fre' and
> +		 * 'prev_fre'.
> +		 */
> +		fre = which ? fres : fres + 1;
> +		which = !which;
> +
> +		ret = __read_fre(sec, fde, fre_addr, fre);
> +		if (ret)
> +			return ret;
> +
> +		fre_addr += fre->size;
> +
> +		if (prev_fre && fre->ip_off <= prev_fre->ip_off)
> +			return -EFAULT;
> +
> +		if (fre->ip_off > ip_off)
> +			break;
> +
> +		prev_fre = fre;
> +	}
> +
> +	if (!prev_fre)
> +		return -EINVAL;
> +	fre = prev_fre;
> +
> +	frame->cfa_off = fre->cfa_off;
> +	frame->ra_off  = fre->ra_off;
> +	frame->fp_off  = fre->fp_off;
> +	frame->use_fp  = SFRAME_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
> +
> +	return 0;
> +}
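The FRE selection above can be modeled in userspace as follows. This is a sketch that assumes the FREs have already been decoded into an array. The patch instead reads them one at a time from user memory, alternating between fres[0] and fres[1]. struct fre is trimmed, find_fre() is a hypothetical name, and the rep_size guard is an addition for this model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded FRE, reduced to the fields this sketch needs. */
struct fre {
	uint32_t ip_off;   /* start offset of this row within the function */
	int32_t  cfa_off;
};

/*
 * Return the index of the FRE that applies at ip_off (the last one whose
 * ip_off is <= the target), -EINVAL if none applies, or -EFAULT if the
 * FREs are not strictly ascending.
 */
static long find_fre(const struct fre *fres, unsigned int num,
		     uint32_t ip_off, bool pcmask, uint8_t rep_size)
{
	long prev = -1;
	unsigned int i;

	if (pcmask) {
		/* Guard added for this model: the modulo needs rep_size != 0. */
		if (!rep_size)
			return -EINVAL;
		/* Repeating code blocks (e.g. PLT entries): use the offset in one block. */
		ip_off %= rep_size;
	}

	for (i = 0; i < num; i++) {
		if (prev >= 0 && fres[i].ip_off <= fres[prev].ip_off)
			return -EFAULT;
		if (fres[i].ip_off > ip_off)
			break;
		prev = i;
	}
	return prev < 0 ? -EINVAL : prev;
}
```

Stopping at the first FRE past the target means only the FREs up to that point are checked for ordering, the same as in the patch.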
> +
> +int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct sframe_section *sec;
> +	struct sframe_fde fde;
> +	unsigned long fde_start_base;

fde_start_base would get eliminated.

> +	int ret;
> +
> +	if (!mm)
> +		return -EINVAL;
> +
> +	guard(srcu)(&sframe_srcu);
> +
> +	sec = mtree_load(&mm->sframe_mt, ip);
> +	if (!sec)
> +		return -EINVAL;
> +
> +	if (!user_read_access_begin((void __user *)sec->sframe_start,
> +				    sec->sframe_end - sec->sframe_start))
> +		return -EFAULT;
> +
> +	ret = __find_fde(sec, ip, &fde, &fde_start_base);

fde_start_base would get eliminated.

> +	if (ret)
> +		goto end;
> +
> +	ret = __find_fre(sec, &fde, fde_start_base, ip, frame);

fde_start_base would get eliminated.

> +end:
> +	user_read_access_end();
> +	return ret;
> +}
Regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
+49-7031-16-1128 Office
jremus@de.ibm.com

IBM

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/




* [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents
  2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
                   ` (16 preceding siblings ...)
  2025-10-23  8:09 ` Fangrui Song
@ 2025-10-23 16:05 ` Jens Remus
  2025-10-23 16:05   ` [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option Jens Remus
  17 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:05 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

This RFC fixup is a POC to demonstrate how the SFrame reading code
could benefit from introducing an internal FDE representation (struct
sframe_fde_internal) similar to the internal FRE representation already
in use (struct sframe_fre).  The goal is to eliminate passing
fde_start_base through many places, as well as the various computations
of the effective function start address (= *fde_start_base +
fde->start_addr) throughout this module.  The internal FDE representation
simply conveys the effective function start address via the "unsigned long
func_start_addr" field.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 kernel/unwind/sframe.c | 52 ++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 3d7ac4eaa8b7..f88fc2c92c58 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -17,6 +17,15 @@
 #include "sframe.h"
 #include "sframe_debug.h"
 
+struct sframe_fde_internal {
+	unsigned long	func_start_addr;
+	u32		func_size;
+	u32		fres_off;
+	u32		fres_num;
+	u8		info;
+	u8		rep_size;
+};
+
 struct sframe_fre {
 	unsigned int	size;
 	u32		ip_off;
@@ -45,20 +54,26 @@ static __always_inline unsigned char offset_size_enum_to_size(unsigned char off_
 
 static __always_inline int __read_fde(struct sframe_section *sec,
 				      unsigned int fde_num,
-				      struct sframe_fde *fde,
-				      unsigned long *fde_start_base)
+				      struct sframe_fde_internal *fde)
 {
-	unsigned long fde_addr, ip;
+	unsigned long fde_addr, func_addr;
+	struct sframe_fde _fde;
 
 	fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde));
-	unsafe_copy_from_user(fde, (void __user *)fde_addr,
+	unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
 			      sizeof(struct sframe_fde), Efault);
 
-	ip = fde_addr + fde->start_addr;
-	if (ip < sec->text_start || ip > sec->text_end)
+	func_addr = fde_addr + _fde.start_addr;
+	if (func_addr < sec->text_start || func_addr > sec->text_end)
 		return -EINVAL;
 
-	*fde_start_base = fde_addr;
+	fde->func_start_addr	= func_addr;
+	fde->func_size		= _fde.func_size;
+	fde->fres_off		= _fde.fres_off;
+	fde->fres_num		= _fde.fres_num;
+	fde->info		= _fde.info;
+	fde->rep_size		= _fde.rep_size;
+
 	return 0;
 
 Efault:
@@ -67,8 +82,7 @@ static __always_inline int __read_fde(struct sframe_section *sec,
 
 static __always_inline int __find_fde(struct sframe_section *sec,
 				      unsigned long ip,
-				      struct sframe_fde *fde,
-				      unsigned long *fde_start_base)
+				      struct sframe_fde_internal *fde)
 {
 	unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
 	struct sframe_fde __user *first, *low, *high, *found = NULL;
@@ -109,13 +123,13 @@ static __always_inline int __find_fde(struct sframe_section *sec,
 	if (!found)
 		return -EINVAL;
 
-	ret = __read_fde(sec, found - first, fde, fde_start_base);
+	ret = __read_fde(sec, found - first, fde);
 	if (ret)
 		return ret;
 
 	/* make sure it's not in a gap */
-	if (ip < *fde_start_base + fde->start_addr ||
-	    ip >= *fde_start_base + fde->start_addr + fde->func_size)
+	if (ip < fde->func_start_addr ||
+	    ip >= fde->func_start_addr + fde->func_size)
 		return -EINVAL;
 
 	return 0;
@@ -165,7 +179,7 @@ static __always_inline int __find_fde(struct sframe_section *sec,
 		 s32:	UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
 
 static __always_inline int __read_fre(struct sframe_section *sec,
-				      struct sframe_fde *fde,
+				      struct sframe_fde_internal *fde,
 				      unsigned long fre_addr,
 				      struct sframe_fre *fre)
 {
@@ -244,8 +258,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
 }
 
 static __always_inline int __find_fre(struct sframe_section *sec,
-				      struct sframe_fde *fde,
-				      unsigned long fde_start_base,
+				      struct sframe_fde_internal *fde,
 				      unsigned long ip,
 				      struct unwind_user_frame *frame)
 {
@@ -257,7 +270,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
 	unsigned int i;
 	u32 ip_off;
 
-	ip_off = ip - (fde_start_base + fde->start_addr);
+	ip_off = ip - fde->func_start_addr;
 
 	if (fde_type == SFRAME_FDE_TYPE_PCMASK)
 		ip_off %= fde->rep_size;
@@ -306,8 +319,7 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 {
 	struct mm_struct *mm = current->mm;
 	struct sframe_section *sec;
-	struct sframe_fde fde;
-	unsigned long fde_start_base;
+	struct sframe_fde_internal fde;
 	int ret;
 
 	if (!mm)
@@ -323,11 +335,11 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 				    sec->sframe_end - sec->sframe_start))
 		return -EFAULT;
 
-	ret = __find_fde(sec, ip, &fde, &fde_start_base);
+	ret = __find_fde(sec, ip, &fde);
 	if (ret)
 		goto end;
 
-	ret = __find_fre(sec, &fde, fde_start_base, ip, frame);
+	ret = __find_fre(sec, &fde, ip, frame);
 end:
 	user_read_access_end();
 
-- 
2.48.1




* [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option
  2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2025-10-23 16:05   ` Jens Remus
  0 siblings, 0 replies; 28+ messages in thread
From: Jens Remus @ 2025-10-23 16:05 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
  Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
	Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
	Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
	Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
	Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
	Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
	Vasily Gorbik

This RFC fixup is a POC to demonstrate how the SFrame validation code
would change if an internal FDE representation (struct
sframe_fde_internal) were introduced, similar to the internal FRE
representation (struct sframe_fre) already used in the SFrame reading
code.  The goal is to eliminate passing fde_start_base through many
places, as well as the various computations of the effective function
start address (= *fde_start_base + fde->start_addr) throughout this
module.  The internal FDE representation simply conveys the effective
function start address via the "unsigned long func_start_addr" field.
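As a rough illustration of what the validation loop checks once func_start_addr is available directly, here is a simplified userspace model. The FDE ordering check follows the hunk below. The FRE ordering check is modeled on __find_fre(). struct fde_view, validate_section() and the FRE message text are hypothetical. Starting prev_ip at 0 is an assumption, since its initialization is outside the hunk.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical decoded view: one FDE plus the start offsets of its FREs. */
struct fde_view {
	unsigned long func_start_addr;
	const unsigned int *fre_ip_offs;
	unsigned int fres_num;
};

/*
 * FDEs must be strictly sorted by function start address, and each FDE's
 * FREs must be strictly sorted by ip_off. Returns 0 or -EFAULT.
 */
static int validate_section(const struct fde_view *fdes, size_t num)
{
	unsigned long prev_ip = 0;

	for (size_t i = 0; i < num; i++) {
		unsigned long ip = fdes[i].func_start_addr;

		if (ip <= prev_ip) {
			fprintf(stderr, "fde %zu not sorted\n", i);
			return -EFAULT;
		}
		prev_ip = ip;

		for (unsigned int j = 1; j < fdes[i].fres_num; j++) {
			if (fdes[i].fre_ip_offs[j] <= fdes[i].fre_ip_offs[j - 1]) {
				fprintf(stderr, "fde %zu: fre %u not sorted\n", i, j);
				return -EFAULT;
			}
		}
	}
	return 0;
}
```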

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 kernel/unwind/sframe.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index f88fc2c92c58..f2977c010117 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -354,21 +354,21 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
 #ifdef CONFIG_SFRAME_VALIDATION
 
 static int safe_read_fde(struct sframe_section *sec,
-			 unsigned int fde_num, struct sframe_fde *fde,
-			 unsigned long *fde_start_base)
+			 unsigned int fde_num, struct sframe_fde_internal *fde)
 {
 	int ret;
 
 	if (!user_read_access_begin((void __user *)sec->sframe_start,
 				    sec->sframe_end - sec->sframe_start))
 		return -EFAULT;
-	ret = __read_fde(sec, fde_num, fde, fde_start_base);
+	ret = __read_fde(sec, fde_num, fde);
 	user_read_access_end();
 	return ret;
 }
 
 static int safe_read_fre(struct sframe_section *sec,
-			 struct sframe_fde *fde, unsigned long fre_addr,
+			 struct sframe_fde_internal *fde,
+			 unsigned long fre_addr,
 			 struct sframe_fre *fre)
 {
 	int ret;
@@ -388,18 +388,18 @@ static int sframe_validate_section(struct sframe_section *sec)
 
 	for (i = 0; i < sec->num_fdes; i++) {
 		struct sframe_fre *fre, *prev_fre = NULL;
-		unsigned long ip, fde_start_base, fre_addr;
-		struct sframe_fde fde;
+		unsigned long ip, fre_addr;
+		struct sframe_fde_internal fde;
 		struct sframe_fre fres[2];
 		bool which = false;
 		unsigned int j;
 		int ret;
 
-		ret = safe_read_fde(sec, i, &fde, &fde_start_base);
+		ret = safe_read_fde(sec, i, &fde);
 		if (ret)
 			return ret;
 
-		ip = fde_start_base + fde.start_addr;
+		ip = fde.func_start_addr;
 		if (ip <= prev_ip) {
 			dbg_sec("fde %u not sorted\n", i);
 			return -EFAULT;
@@ -416,8 +416,8 @@ static int sframe_validate_section(struct sframe_section *sec)
 			ret = safe_read_fre(sec, &fde, fre_addr, fre);
 			if (ret) {
 				dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
-				dbg_sec("FDE: start_addr:0x%x func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
-					fde.start_addr, fde.func_size,
+				dbg_sec("FDE: func_start_addr:0x%lx func_size:0x%x fres_off:0x%x fres_num:%d info:%u rep_size:%u\n",
+					fde.func_start_addr, fde.func_size,
 					fde.fres_off, fde.fres_num,
 					fde.info, fde.rep_size);
 				return ret;
-- 
2.48.1
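The fixup replaces "fde_start_base + fde.start_addr" with a precomputed fde.func_start_addr.  One way to fill that field while decoding is sketched below in user space.  The sketch assumes the SFrame V2 on-disk FDE layout (20 bytes: a signed 32-bit function start offset, then function size, FRE offset, FRE count, info, rep size and padding).  It also assumes that, with SFRAME_F_FDE_FUNC_START_PCREL, the start offset is relative to the address of the start-address field, which equals the FDE's own address because it is the first field.  The names struct sframe_fde_raw, decode_fde() and check_sorted() are hypothetical, and the section is assumed to use host byte order.

```c
#include <stdint.h>
#include <string.h>
#include <errno.h>

/* Assumed SFrame V2 on-disk FDE layout (20 bytes). */
struct sframe_fde_raw {
	int32_t  start_addr;	/* PC-relative: offset from this field */
	uint32_t func_size;
	uint32_t fres_off;
	uint32_t fres_num;
	uint8_t  info;
	uint8_t  rep_size;
	uint16_t padding;
} __attribute__((packed));

_Static_assert(sizeof(struct sframe_fde_raw) == 20, "unexpected FDE size");

/* Hypothetical internal representation, as proposed in the RFC. */
struct sframe_fde_internal {
	unsigned long func_start_addr;	/* effective function start address */
	uint32_t func_size;
	uint32_t fres_off;
	uint32_t fres_num;
	uint8_t  info;
	uint8_t  rep_size;
};

/*
 * Decode FDE fde_num.  fdes_buf is a copy of the FDE table; fdes_start is
 * the table's address in the unwound process, used to resolve the
 * PC-relative start address.
 */
static void decode_fde(const unsigned char *fdes_buf, unsigned long fdes_start,
		       unsigned int fde_num, struct sframe_fde_internal *fde)
{
	struct sframe_fde_raw raw;
	unsigned long fde_addr = fdes_start + fde_num * sizeof(raw);

	memcpy(&raw, fdes_buf + fde_num * sizeof(raw), sizeof(raw));

	/* start_addr is the first field, so its address is fde_addr. */
	fde->func_start_addr = fde_addr + (long)raw.start_addr;
	fde->func_size = raw.func_size;
	fde->fres_off  = raw.fres_off;
	fde->fres_num  = raw.fres_num;
	fde->info      = raw.info;
	fde->rep_size  = raw.rep_size;
}

/* Like sframe_validate_section(): start addresses must strictly increase. */
static int check_sorted(const unsigned char *fdes_buf, unsigned long fdes_start,
			unsigned int num_fdes)
{
	unsigned long prev_ip = 0;
	unsigned int i;

	for (i = 0; i < num_fdes; i++) {
		struct sframe_fde_internal fde;

		decode_fde(fdes_buf, fdes_start, i, &fde);
		if (i && fde.func_start_addr <= prev_ip)
			return -EFAULT;
		prev_ip = fde.func_start_addr;
	}
	return 0;
}
```

Once decode_fde() resolves the address, callers such as check_sorted() never need to see fde_start_base, which is the simplification the RFC is after.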




* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
  2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2025-10-24 13:44   ` Peter Zijlstra
  2025-10-24 14:29     ` Jens Remus
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2025-10-24 13:44 UTC (permalink / raw)
  To: Jens Remus
  Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
	Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
	David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
	Steven Rostedt (Google)

On Wed, Oct 22, 2025 at 04:43:19PM +0200, Jens Remus wrote:

> @@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
>  	return get_user(*word, addr);
>  }
>  
> -static int unwind_user_next_fp(struct unwind_user_state *state)
> +static int unwind_user_next_common(struct unwind_user_state *state,
> +				   const struct unwind_user_frame *frame,
> +				   struct pt_regs *regs)
>  {

What is pt_regs for? AFAICT it isn't actually used in any of the
following patches.

> -	const struct unwind_user_frame fp_frame = {
> -		ARCH_INIT_USER_FP_FRAME(state->ws)
> -	};
> -	const struct unwind_user_frame *frame = &fp_frame;
>  	unsigned long cfa, fp, ra;
>  
>  	if (frame->use_fp) {
> @@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
>  	return 0;
>  }
>  
> +static int unwind_user_next_sframe(struct unwind_user_state *state)
> +{
> +	struct unwind_user_frame _frame, *frame;
> +
> +	/* sframe expects the frame to be local storage */
> +	frame = &_frame;
> +	if (sframe_find(state->ip, frame))
> +		return -ENOENT;
> +	return unwind_user_next_common(state, frame, task_pt_regs(current));
> +}

Would it not be simpler to write:

static int unwind_user_next_sframe(struct unwind_user_state *state)
{
	struct unwind_user_frame frame;

	/* sframe expects the frame to be local storage */
	if (sframe_find(state->ip, &frame))
		return -ENOENT;
	return unwind_user_next_common(state, &frame, task_pt_regs(current));
}

hmm?

> +static int unwind_user_next_fp(struct unwind_user_state *state)
> +{
> +	const struct unwind_user_frame fp_frame = {
> +		ARCH_INIT_USER_FP_FRAME(state->ws)
> +	};
> +
> +	return unwind_user_next_common(state, &fp_frame, task_pt_regs(current));
> +}
> +
>  static int unwind_user_next(struct unwind_user_state *state)
>  {
>  	unsigned long iter_mask = state->available_types;
> @@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
>  
>  		state->current_type = type;
>  		switch (type) {
> +		case UNWIND_USER_TYPE_SFRAME:
> +			switch (unwind_user_next_sframe(state)) {
> +			case 0:
> +				return 0;
> +			case -ENOENT:
> +				continue;	/* Try next method. */
> +			default:
> +				state->done = true;
> +			}
> +			break;

Should it remove SFRAME from state->available_types at this point?



* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
  2025-10-24 13:44   ` Peter Zijlstra
@ 2025-10-24 14:29     ` Jens Remus
  2025-10-24 19:00       ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-10-24 14:29 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
	Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
	Kees Cook, Carlos O'Donell, Sam James, Borislav Petkov,
	Dave Hansen, David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
	Steven Rostedt (Google)

On 10/24/2025 3:44 PM, Peter Zijlstra wrote:
> On Wed, Oct 22, 2025 at 04:43:19PM +0200, Jens Remus wrote:
> 
>> @@ -26,12 +27,10 @@ get_user_word(unsigned long *word, unsigned long base, int off, unsigned int ws)
>>  	return get_user(*word, addr);
>>  }
>>  
>> -static int unwind_user_next_fp(struct unwind_user_state *state)
>> +static int unwind_user_next_common(struct unwind_user_state *state,
>> +				   const struct unwind_user_frame *frame,
>> +				   struct pt_regs *regs)
>>  {
> 
> What is pt_regs for? AFAICT it isn't actually used in any of the
> following patches.

Good catch!  No idea.  It started to appear in v9 of the series:

[PATCH v8 06/12] unwind_user/sframe: Wire up unwind_user to sframe
https://lore.kernel.org/all/20250708021159.386608979@kernel.org/

[PATCH v9 06/11] unwind_user/sframe: Wire up unwind_user to sframe
https://lore.kernel.org/all/20250717012936.619600891@kernel.org/

My s390 support for unwind user sframe will make use of it, but it
would be better to introduce it there.

@Steven: Any idea why you added pt_regs?  Your v9 even had this other
instance of unused pt_regs:

+static struct unwind_user_frame *get_fp_frame(struct pt_regs *regs)
+{
+	return &fp_frame;
+}

>> @@ -67,6 +66,26 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
>>  	return 0;
>>  }
>>  
>> +static int unwind_user_next_sframe(struct unwind_user_state *state)
>> +{
>> +	struct unwind_user_frame _frame, *frame;
>> +
>> +	/* sframe expects the frame to be local storage */
>> +	frame = &_frame;
>> +	if (sframe_find(state->ip, frame))
>> +		return -ENOENT;
>> +	return unwind_user_next_common(state, frame, task_pt_regs(current));
>> +}
> 
> Would it not be simpler to write:
> 
> static int unwind_user_next_sframe(struct unwind_user_state *state)
> {
> 	struct unwind_user_frame frame;
> 
> 	/* sframe expects the frame to be local storage */
> 	if (sframe_find(state->ip, &frame))
> 		return -ENOENT;
> 	return unwind_user_next_common(state, &frame, task_pt_regs(current));
> }
> 
> hmm?

I agree.  It must be a leftover from the changes between v8 and v9.

>> @@ -80,6 +99,16 @@ static int unwind_user_next(struct unwind_user_state *state)
>>  
>>  		state->current_type = type;
>>  		switch (type) {
>> +		case UNWIND_USER_TYPE_SFRAME:
>> +			switch (unwind_user_next_sframe(state)) {
>> +			case 0:
>> +				return 0;
>> +			case -ENOENT:
>> +				continue;	/* Try next method. */
>> +			default:
>> +				state->done = true;
>> +			}
>> +			break;
> 
> Should it remove SFRAME from state->available_types at this point?

In the -ENOENT case?  If the reason is that there was either no SFrame
section or no SFrame information (SFrame FRE) for the IP, then SFRAME
could still succeed for the next IP in the call chain, provided the
other unwind methods correctly unwind both SP and FP.
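A simplified user-space sketch of the per-frame dispatch shows the effect of keeping SFRAME in available_types after -ENOENT: the frame whose IP has no SFrame data falls back to FP, and the next frame tries SFrame again.  The method bits, struct state and the next_sframe()/next_fp() stubs (with arbitrary IP arithmetic) are hypothetical and only loosely modelled on unwind_user_next().

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical method bits; the lowest set bit is tried first. */
#define TYPE_SFRAME	(1U << 0)
#define TYPE_FP		(1U << 1)

struct state {
	unsigned long ip;
	unsigned int available_types;
	unsigned int current_type;
	bool done;
};

/* Stub: pretend only IPs in [0x2000, 0x3000) have SFrame data. */
static int next_sframe(struct state *s)
{
	if (s->ip < 0x2000 || s->ip >= 0x3000)
		return -ENOENT;
	s->ip -= 0x1000;	/* arbitrary "caller" IP */
	return 0;
}

/* Stub frame-pointer unwinder that always succeeds. */
static int next_fp(struct state *s)
{
	s->ip += 0x1000;	/* arbitrary "caller" IP */
	return 0;
}

static int unwind_next(struct state *s)
{
	unsigned int iter_mask = s->available_types;

	while (iter_mask) {
		unsigned int type = iter_mask & -iter_mask;	/* lowest set bit */

		iter_mask &= ~type;
		s->current_type = type;
		if (type == TYPE_SFRAME) {
			int ret = next_sframe(s);

			if (ret == 0)
				return 0;
			/*
			 * No data for this IP: try the next method, but leave
			 * TYPE_SFRAME in available_types for later frames.
			 */
			if (ret == -ENOENT)
				continue;
			s->done = true;
			return ret;
		}
		if (type == TYPE_FP)
			return next_fp(s);
	}
	s->done = true;
	return -EINVAL;
}
```

Clearing TYPE_SFRAME from available_types on -ENOENT would instead make the second frame below use FP even though SFrame data exists for its IP.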

@Steven: What is your opinion on this?

Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
+49-7031-16-1128 Office
jremus@de.ibm.com

IBM

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/




* Re: [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe
  2025-10-24 14:29     ` Jens Remus
@ 2025-10-24 19:00       ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-10-24 19:00 UTC (permalink / raw)
  To: Jens Remus
  Cc: Peter Zijlstra, Steven Rostedt, linux-kernel, linux-trace-kernel,
	bpf, x86, linux-mm, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
	Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
	David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik

On Fri, 24 Oct 2025 16:29:07 +0200
Jens Remus <jremus@linux.ibm.com> wrote:

> @Steven: Any idea why you added pt_regs?  Your v9 even had this other
> instance of unused pt_regs:
> 
> +static struct unwind_user_frame *get_fp_frame(struct pt_regs *regs)
> +{
> +	return &fp_frame;
> +}

According to the history:

  https://lore.kernel.org/linux-trace-kernel/20250717012848.927473176@kernel.org/

Which has:

  Changes since v8: https://lore.kernel.org/linux-trace-kernel/20250708021115.894007410@kernel.org/

  - Rebased on the changes by Mathieu in the kernel/unwind/user.c file
    https://lore.kernel.org/all/20250710164301.3094-2-mathieu.desnoyers@efficios.com/

It looks like it came in from Mathieu's updates, which were trying to deal
with compat. But then, after noticing that compat wasn't working on my test
boxes, I removed it. The removal failed to notice that regs was now unused.

-- Steve



* Re: [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
  2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
@ 2025-11-18 17:04   ` Jens Remus
  2025-11-18 19:26     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Remus @ 2025-11-18 17:04 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar,
	Jiri Olsa, Arnaldo Carvalho de Melo, Namhyung Kim,
	Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
	Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
	Kees Cook, Carlos O'Donell, Sam James, Borislav Petkov,
	Dave Hansen, David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik,
	Steven Rostedt (Google)

Hello Josh and Steven!

On 10/22/2025 4:43 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
> 
> In preparation for unwinding user space stacks with sframe, add basic
> sframe compile infrastructure and support for reading the .sframe
> section header.
> 
> sframe_add_section() reads the header and unconditionally returns an
> error, so it's not very useful yet.  A subsequent patch will improve
> that.
> 
> Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
> 
> [ Jens Remus: Add support for PC-relative FDE function start address. ]

I took a closer look and wondered whether some parts would be better
moved to subsequent patches.

> diff --git a/include/linux/sframe.h b/include/linux/sframe.h

> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_SFRAME_H
> +#define _LINUX_SFRAME_H
> +
> +#include <linux/mm_types.h>

Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".

> +#include <linux/unwind_user_types.h>

Move to "[PATCH v11 06/15] unwind_user/sframe: Add support for reading
.sframe contents".  sframe_find() needs the types.

> +
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> +
> +struct sframe_section {
> +	unsigned long	sframe_start;
> +	unsigned long	sframe_end;
> +	unsigned long	text_start;
> +	unsigned long	text_end;
> +
> +	unsigned long	fdes_start;
> +	unsigned long	fres_start;
> +	unsigned long	fres_end;
> +	unsigned int	num_fdes;
> +
> +	signed char	ra_off;
> +	signed char	fp_off;
> +};
> +
> +extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> +			      unsigned long text_start, unsigned long text_end);
> +extern int sframe_remove_section(unsigned long sframe_addr);
> +
> +#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> +				     unsigned long text_start, unsigned long text_end)
> +{
> +	return -ENOSYS;
> +}
> +static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
> +
> +#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +#endif /* _LINUX_SFRAME_H */

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

> @@ -0,0 +1,137 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Userspace sframe access functions
> + */
> +
> +#define pr_fmt(fmt)	"sframe: " fmt
> +
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/srcu.h>
> +#include <linux/uaccess.h>
> +#include <linux/mm.h>

Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".

> +#include <linux/string_helpers.h>
> +#include <linux/sframe.h>
> +#include <linux/unwind_user_types.h>
> +
> +#include "sframe.h"
> +
> +#define dbg(fmt, ...)							\
> +	pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
> +
> +static void free_section(struct sframe_section *sec)
> +{
> +	kfree(sec);
> +}
> +
> +static int sframe_read_header(struct sframe_section *sec)
> +{
> +	unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
> +	struct sframe_header shdr;
> +	unsigned int num_fdes;
> +
> +	if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
> +		dbg("header usercopy failed\n");
> +		return -EFAULT;
> +	}
> +
> +	if (shdr.preamble.magic != SFRAME_MAGIC ||
> +	    shdr.preamble.version != SFRAME_VERSION_2 ||
> +	    !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
> +	    !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
> +	    shdr.auxhdr_len) {
> +		dbg("bad/unsupported sframe header\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!shdr.num_fdes || !shdr.num_fres) {
> +		dbg("no fde/fre entries\n");
> +		return -EINVAL;
> +	}
> +
> +	header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
> +	if (header_end >= sec->sframe_end) {
> +		dbg("header doesn't fit in section\n");
> +		return -EINVAL;
> +	}
> +
> +	num_fdes   = shdr.num_fdes;
> +	fdes_start = header_end + shdr.fdes_off;
> +	fdes_end   = fdes_start + (num_fdes * sizeof(struct sframe_fde));
> +
> +	fres_start = header_end + shdr.fres_off;
> +	fres_end   = fres_start + shdr.fre_len;
> +
> +	if (fres_start < fdes_end || fres_end > sec->sframe_end) {
> +		dbg("inconsistent fde/fre offsets\n");
> +		return -EINVAL;
> +	}
> +
> +	sec->num_fdes		= num_fdes;
> +	sec->fdes_start		= fdes_start;
> +	sec->fres_start		= fres_start;
> +	sec->fres_end		= fres_end;
> +
> +	sec->ra_off		= shdr.cfa_fixed_ra_offset;
> +	sec->fp_off		= shdr.cfa_fixed_fp_offset;
> +
> +	return 0;
> +}
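The offset checks in sframe_read_header() can be exercised in isolation.  The sketch below restates them as a pure user-space function.  The parameter list, the name check_sframe_layout() and the 20-byte FDE size (assumed from the SFrame V2 FDE layout) are illustrative assumptions, and the magic/version/flags and zero-count checks are omitted.

```c
#include <errno.h>

#define FDE_SIZE 20UL	/* assumed sizeof(struct sframe_fde) for SFrame V2 */

/*
 * Same ordering checks as sframe_read_header(): the header must end before
 * the section does, and the FRE sub-section must start at or after the end
 * of the FDE array and end within the section.
 */
static int check_sframe_layout(unsigned long sframe_start, unsigned long sframe_end,
			       unsigned long header_size, unsigned int num_fdes,
			       unsigned int fdes_off, unsigned int fres_off,
			       unsigned int fre_len)
{
	unsigned long header_end = sframe_start + header_size;
	unsigned long fdes_start, fdes_end, fres_start, fres_end;

	if (header_end >= sframe_end)
		return -EINVAL;		/* header doesn't fit in section */

	fdes_start = header_end + fdes_off;
	fdes_end   = fdes_start + num_fdes * FDE_SIZE;
	fres_start = header_end + fres_off;
	fres_end   = fres_start + fre_len;

	if (fres_start < fdes_end || fres_end > sframe_end)
		return -EINVAL;		/* inconsistent fde/fre offsets */
	return 0;
}
```

For example, a 28-byte header followed by two FDEs (40 bytes) and 16 bytes of FREs fits exactly into an 84-byte section.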
> +
> +int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
> +		       unsigned long text_start, unsigned long text_end)
> +{
> +	struct maple_tree *sframe_mt = &current->mm->sframe_mt;

Move to "[PATCH v11 04/15] unwind_user/sframe: Store sframe section data
in per-mm maple tree".

> +	struct vm_area_struct *sframe_vma, *text_vma;
> +	struct mm_struct *mm = current->mm;
> +	struct sframe_section *sec;
> +	int ret;
> +
> +	if (!sframe_start || !sframe_end || !text_start || !text_end) {
> +		dbg("zero-length sframe/text address\n");
> +		return -EINVAL;
> +	}
> +
> +	scoped_guard(mmap_read_lock, mm) {
> +		sframe_vma = vma_lookup(mm, sframe_start);
> +		if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
> +			dbg("bad sframe address (0x%lx - 0x%lx)\n",
> +			    sframe_start, sframe_end);
> +			return -EINVAL;
> +		}
> +
> +		text_vma = vma_lookup(mm, text_start);
> +		if (!text_vma ||
> +		    !(text_vma->vm_flags & VM_EXEC) ||
> +		    text_end > text_vma->vm_end) {
> +			dbg("bad text address (0x%lx - 0x%lx)\n",
> +			    text_start, text_end);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	sec = kzalloc(sizeof(*sec), GFP_KERNEL);
> +	if (!sec)
> +		return -ENOMEM;
> +
> +	sec->sframe_start	= sframe_start;
> +	sec->sframe_end		= sframe_end;
> +	sec->text_start		= text_start;
> +	sec->text_end		= text_end;
> +
> +	ret = sframe_read_header(sec);
> +	if (ret)
> +		goto err_free;
> +
> +	/* TODO nowhere to store it yet - just free it and return an error */

An alternative would be to move sframe_add_section() to
"[PATCH v11 04/15] unwind_user/sframe: Store sframe section data in
per-mm maple tree" and reorder the patches as outlined below.

> +	ret = -ENOSYS;
> +
> +err_free:
> +	free_section(sec);
> +	return ret;
> +}
> +
> +int sframe_remove_section(unsigned long sframe_start)
> +{
> +	return -ENOSYS;
> +}

I wonder whether the series should be restructured as follows:

unwind_user/sframe: Store .sframe section data in per-mm maple tree
unwind_user/sframe: Detect .sframe sections in executables
unwind_user/sframe: Add support for reading .sframe headers
unwind_user/sframe: Add support for reading .sframe contents
unwind_user/sframe: Wire up unwind_user to sframe
x86/uaccess: Add unsafe_copy_from_user() implementation
unwind_user/sframe/x86: Enable sframe unwinding on x86
unwind_user: Stop when reaching an outermost frame
unwind_user/sframe: Add support for outermost frame indication
unwind_user/sframe: Remove .sframe section on detected corruption
unwind_user/sframe: Show file name in debug output
unwind_user/sframe: Add .sframe validation option
unwind_user/sframe: Add prctl() interface for registering .sframe sections

This would involve moving sframe_add_section() and sframe_remove_section()
from "unwind_user/sframe: Add support for reading .sframe headers" to
"unwind_user/sframe: Store .sframe section data in per-mm maple tree" or
into a new second patch, as they depend on the first patch and are
required by the third.

What are your thoughts?  The reordering might be wasted effort.

Thanks and regards,
Jens




* Re: [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers
  2025-11-18 17:04   ` Jens Remus
@ 2025-11-18 19:26     ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2025-11-18 19:26 UTC (permalink / raw)
  To: Jens Remus
  Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
	Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
	Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
	Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
	Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
	David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
	Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
	Vlastimil Babka, Heiko Carstens, Vasily Gorbik

On Tue, 18 Nov 2025 18:04:27 +0100
Jens Remus <jremus@linux.ibm.com> wrote:

> I wonder whether the series should be restructured as follows:
> 
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Wire up unwind_user to sframe
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe: Add prctl() interface for registering .sframe sections
> 
> This would involve moving sframe_add_section() and sframe_remove_section()
> from "unwind_user/sframe: Add support for reading .sframe headers" to
> "unwind_user/sframe: Store .sframe section data in per-mm maple tree" or
> into a new second patch, as they depend on the first patch and are
> required by the third.
> 
> What are your thoughts?  The reordering might be wasted effort.

If you feel it makes it better, sure, go ahead and do it.

-- Steve



end of thread

Thread overview: 28+ messages
2025-10-22 14:43 [PATCH v11 00/15] unwind_deferred: Implement sframe handling Jens Remus
2025-10-22 14:43 ` [PATCH v11 01/15] fixup! unwind: Implement compat fp unwind Jens Remus
2025-10-22 14:43 ` [PATCH v11 02/15] fixup! unwind_user/x86: Enable frame pointer unwinding on x86 Jens Remus
2025-10-22 14:43 ` [PATCH v11 03/15] unwind_user/sframe: Add support for reading .sframe headers Jens Remus
2025-11-18 17:04   ` Jens Remus
2025-11-18 19:26     ` Steven Rostedt
2025-10-22 14:43 ` [PATCH v11 04/15] unwind_user/sframe: Store sframe section data in per-mm maple tree Jens Remus
2025-10-22 14:43 ` [PATCH v11 05/15] x86/uaccess: Add unsafe_copy_from_user() implementation Jens Remus
2025-10-22 14:43 ` [PATCH v11 06/15] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2025-10-23 16:04   ` Jens Remus
2025-10-22 14:43 ` [PATCH v11 07/15] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
2025-10-22 14:43 ` [PATCH v11 08/15] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
2025-10-24 13:44   ` Peter Zijlstra
2025-10-24 14:29     ` Jens Remus
2025-10-24 19:00       ` Steven Rostedt
2025-10-22 14:43 ` [PATCH v11 09/15] unwind_user: Stop when reaching an outermost frame Jens Remus
2025-10-22 14:43 ` [PATCH v11 10/15] unwind_user/sframe: Add support for outermost frame indication Jens Remus
2025-10-22 14:43 ` [PATCH v11 11/15] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
2025-10-22 14:43 ` [PATCH v11 12/15] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
2025-10-22 14:43 ` [PATCH v11 13/15] unwind_user/sframe: Show file name in debug output Jens Remus
2025-10-22 14:43 ` [PATCH v11 14/15] unwind_user/sframe: Add .sframe validation option Jens Remus
2025-10-22 14:43 ` [PATCH v11 15/15] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
2025-10-22 20:39 ` [PATCH v11 00/15] unwind_deferred: Implement sframe handling Andrew Morton
2025-10-22 21:58   ` Steven Rostedt
2025-10-23  8:09 ` Fangrui Song
2025-10-23 14:23   ` Steven Rostedt
2025-10-23 16:05 ` [RFC PATCH 1/2] fixup! unwind_user/sframe: Add support for reading .sframe contents Jens Remus
2025-10-23 16:05   ` [RFC PATCH 2/2] fixup! unwind_user/sframe: Add .sframe validation option Jens Remus
