* [PATCH v13 01/18] unwind_user/sframe: Add support for reading .sframe headers
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for unwinding user space stacks with sframe, add basic
sframe compile infrastructure and support for reading the .sframe
section header.
sframe_add_section() reads the header and unconditionally returns an
error, so it's not very useful yet. A subsequent patch will improve
that.
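For illustration, the header checks performed by sframe_read_header() can be modelled in user space. This is a minimal sketch, not the kernel code: memcpy() stands in for copy_from_user(), the preamble is flattened into the header struct, and check_sframe_header() and SFRAME_FDE_V3_SIZE are hypothetical names introduced here. Offsets are relative to the start of the section rather than absolute user addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants mirroring kernel/unwind/sframe.h in this patch. */
#define SFRAME_MAGIC                  0xdee2
#define SFRAME_VERSION_3              3
#define SFRAME_F_FDE_SORTED           0x1
#define SFRAME_F_FDE_FUNC_START_PCREL 0x4

/* Same byte layout as struct sframe_header (preamble inlined, packed). */
struct sframe_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint8_t  abi_arch;
	int8_t   cfa_fixed_fp_offset;
	int8_t   cfa_fixed_ra_offset;
	uint8_t  auxhdr_len;
	uint32_t num_fdes;
	uint32_t num_fres;
	uint32_t fre_len;
	uint32_t fdes_off;
	uint32_t fres_off;
} __attribute__((packed));

_Static_assert(sizeof(struct sframe_header) == 28, "packed header is 28 bytes");

/* sizeof(struct sframe_fde_v3): s64 + u32 + u32 (hypothetical name). */
#define SFRAME_FDE_V3_SIZE 16

/*
 * Hypothetical helper: validate an in-memory copy of a .sframe section the
 * way sframe_read_header() does. On success returns 0 and stores the FDE and
 * FRE start offsets (relative to the section start); otherwise returns -1.
 */
static int check_sframe_header(const uint8_t *sec, size_t sec_len,
			       size_t *fdes_start, size_t *fres_start)
{
	struct sframe_header h;
	size_t hdr_end, fdes_end, fres_end;

	if (sec_len < sizeof(h))
		return -1;
	memcpy(&h, sec, sizeof(h));	/* stands in for copy_from_user() */

	if (h.magic != SFRAME_MAGIC || h.version != SFRAME_VERSION_3 ||
	    !(h.flags & SFRAME_F_FDE_SORTED) ||
	    !(h.flags & SFRAME_F_FDE_FUNC_START_PCREL) || h.auxhdr_len)
		return -1;		/* bad/unsupported header */
	if (!h.num_fdes || !h.num_fres)
		return -1;		/* no FDE/FRE entries */

	hdr_end = sizeof(h) + h.auxhdr_len;
	if (hdr_end >= sec_len)
		return -1;		/* header doesn't fit in section */

	*fdes_start = hdr_end + h.fdes_off;
	fdes_end = *fdes_start + (size_t)h.num_fdes * SFRAME_FDE_V3_SIZE;
	*fres_start = hdr_end + h.fres_off;
	fres_end = *fres_start + h.fre_len;
	if (*fres_start < fdes_end || fres_end > sec_len)
		return -1;		/* inconsistent FDE/FRE offsets */
	return 0;
}
```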
Link: https://lore.kernel.org/all/f27e8463783febfa0dabb0432a3dd6be8ad98412.1737511963.git.jpoimboe@kernel.org/
[ Jens Remus: Add support for SFrame V3. Add support for PC-relative
FDE function start offset. Cleanup includes and indentation. ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- Update to SFrame V3:
- Add and use SFRAME_VERSION_3 definition.
- Add helper macros to access SFrame V3 FDE type.
- Rename SFRAME_FUNC_*() macros to SFRAME_FDE_*().
- Rename SFRAME_FDE_TYPE_PC* defines to SFRAME_FDE_PCTYPE_* and
SFRAME_FUNC_FDE_TYPE() macro to SFRAME_V3_FDE_PCTYPE().
- Reword OFFSET to DATAWORD in SFRAME_FRE_OFFSET_{COUNT|SIZE}()
macros.
- Rename version-specific SFRAME_*() macros to SFRAME_V3_*().
- Update struct sframe_fde and rename to sframe_fde_v3:
- Change field start_addr from s32 to s64 and rename to
func_start_off.
- Change field fres_num from u32 to u16.
- New field u8 info2.
- Remove u16 padding field.
- Split FDE into function descriptor entry (struct sframe_fde_v3) and
attributes (struct sframe_fda_v3).
- Rename macro parameter "data" to "info" to hint at fde/fre info
word and wrap it in parentheses.
- Group SFRAME_* definitions so that related ones are together.
- Reword commit message (my changes).
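As a sketch of how the renamed V3 accessors split up the FDE attribute info byte, the info2 byte and the FRE info byte, the macros below are copied from kernel/unwind/sframe.h in this patch. The make_fde_info() and make_fre_info() encoders are hypothetical helpers added only to show that the bit positions round-trip.

```c
#include <stdint.h>

/* Bit-field accessors copied from kernel/unwind/sframe.h in this patch. */
#define SFRAME_FDE_TYPE_REGULAR                  0
#define SFRAME_V3_FDE_FRE_TYPE(info)             ((info) & 0xf)
#define SFRAME_V3_FDE_PCTYPE(info)               (((info) >> 4) & 0x1)
#define SFRAME_V3_AARCH64_FDE_PAUTH_KEY(info)    (((info) >> 5) & 0x1)
#define SFRAME_V3_FDE_TYPE_MASK                  0x0f
#define SFRAME_V3_FDE_TYPE(info2)                ((info2) & SFRAME_V3_FDE_TYPE_MASK)
#define SFRAME_V3_FRE_CFA_BASE_REG_ID(info)      ((info) & 0x1)
#define SFRAME_V3_FRE_DATAWORD_COUNT(info)       (((info) >> 1) & 0xf)
#define SFRAME_V3_FRE_DATAWORD_SIZE(info)        (((info) >> 5) & 0x3)
#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)

/* Hypothetical inverse of the FDE accessors: pack an FDE info byte. */
static inline uint8_t make_fde_info(unsigned int fre_type, unsigned int pctype,
				    unsigned int pauth_key)
{
	return (uint8_t)((fre_type & 0xf) | ((pctype & 0x1) << 4) |
			 ((pauth_key & 0x1) << 5));
}

/* Hypothetical inverse of the FRE accessors: pack an FRE info byte. */
static inline uint8_t make_fre_info(unsigned int base_reg, unsigned int count,
				    unsigned int size_enum, unsigned int mangled_ra)
{
	return (uint8_t)((base_reg & 0x1) | ((count & 0xf) << 1) |
			 ((size_enum & 0x3) << 5) | ((mangled_ra & 0x1) << 7));
}
```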
MAINTAINERS | 1 +
arch/Kconfig | 3 +
include/linux/sframe.h | 37 +++++++++++
kernel/unwind/Makefile | 3 +-
kernel/unwind/sframe.c | 136 +++++++++++++++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 81 ++++++++++++++++++++++++
6 files changed, 260 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sframe.h
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 5b11839cba9d..79cff4f4df56 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27226,6 +27226,7 @@ USERSPACE STACK UNWINDING
M: Josh Poimboeuf <jpoimboe@kernel.org>
M: Steven Rostedt <rostedt@goodmis.org>
S: Maintained
+F: include/linux/sframe.h
F: include/linux/unwind*.h
F: kernel/unwind/
diff --git a/arch/Kconfig b/arch/Kconfig
index 31220f512b16..7d5517364d8b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -486,6 +486,9 @@ config HAVE_UNWIND_USER_FP
bool
select UNWIND_USER
+config HAVE_UNWIND_USER_SFRAME
+ bool
+
config HAVE_PERF_REGS
bool
help
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
new file mode 100644
index 000000000000..0642595534f9
--- /dev/null
+++ b/include/linux/sframe.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SFRAME_H
+#define _LINUX_SFRAME_H
+
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+
+struct sframe_section {
+ unsigned long sframe_start;
+ unsigned long sframe_end;
+ unsigned long text_start;
+ unsigned long text_end;
+
+ unsigned long fdes_start;
+ unsigned long fres_start;
+ unsigned long fres_end;
+ unsigned int num_fdes;
+
+ signed char ra_off;
+ signed char fp_off;
+};
+
+extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end);
+extern int sframe_remove_section(unsigned long sframe_addr);
+
+#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ return -ENOSYS;
+}
+static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+
+#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
+
+#endif /* _LINUX_SFRAME_H */
diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile
index eae37bea54fd..146038165865 100644
--- a/kernel/unwind/Makefile
+++ b/kernel/unwind/Makefile
@@ -1 +1,2 @@
- obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_UNWIND_USER) += user.o deferred.o
+ obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME) += sframe.o
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
new file mode 100644
index 000000000000..c128c1eeaca2
--- /dev/null
+++ b/kernel/unwind/sframe.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Userspace sframe access functions
+ */
+
+#define pr_fmt(fmt) "sframe: " fmt
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/string_helpers.h>
+#include <linux/sframe.h>
+#include <linux/unwind_user_types.h>
+
+#include "sframe.h"
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static void free_section(struct sframe_section *sec)
+{
+ kfree(sec);
+}
+
+static int sframe_read_header(struct sframe_section *sec)
+{
+ unsigned long header_end, fdes_start, fdes_end, fres_start, fres_end;
+ struct sframe_header shdr;
+ unsigned int num_fdes;
+
+ if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
+ dbg("header usercopy failed\n");
+ return -EFAULT;
+ }
+
+ if (shdr.preamble.magic != SFRAME_MAGIC ||
+ shdr.preamble.version != SFRAME_VERSION_3 ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
+ !(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
+ shdr.auxhdr_len) {
+ dbg("bad/unsupported sframe header\n");
+ return -EINVAL;
+ }
+
+ if (!shdr.num_fdes || !shdr.num_fres) {
+ dbg("no fde/fre entries\n");
+ return -EINVAL;
+ }
+
+ header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
+ if (header_end >= sec->sframe_end) {
+ dbg("header doesn't fit in section\n");
+ return -EINVAL;
+ }
+
+ num_fdes = shdr.num_fdes;
+ fdes_start = header_end + shdr.fdes_off;
+ fdes_end = fdes_start + (num_fdes * sizeof(struct sframe_fde_v3));
+
+ fres_start = header_end + shdr.fres_off;
+ fres_end = fres_start + shdr.fre_len;
+
+ if (fres_start < fdes_end || fres_end > sec->sframe_end) {
+ dbg("inconsistent fde/fre offsets\n");
+ return -EINVAL;
+ }
+
+ sec->num_fdes = num_fdes;
+ sec->fdes_start = fdes_start;
+ sec->fres_start = fres_start;
+ sec->fres_end = fres_end;
+
+ sec->ra_off = shdr.cfa_fixed_ra_offset;
+ sec->fp_off = shdr.cfa_fixed_fp_offset;
+
+ return 0;
+}
+
+int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
+ unsigned long text_start, unsigned long text_end)
+{
+ struct vm_area_struct *sframe_vma, *text_vma;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ int ret;
+
+ if (!sframe_start || !sframe_end || !text_start || !text_end) {
+ dbg("zero-length sframe/text address\n");
+ return -EINVAL;
+ }
+
+ scoped_guard(mmap_read_lock, mm) {
+ sframe_vma = vma_lookup(mm, sframe_start);
+ if (!sframe_vma || sframe_end > sframe_vma->vm_end) {
+ dbg("bad sframe address (0x%lx - 0x%lx)\n",
+ sframe_start, sframe_end);
+ return -EINVAL;
+ }
+
+ text_vma = vma_lookup(mm, text_start);
+ if (!text_vma ||
+ !(text_vma->vm_flags & VM_EXEC) ||
+ text_end > text_vma->vm_end) {
+ dbg("bad text address (0x%lx - 0x%lx)\n",
+ text_start, text_end);
+ return -EINVAL;
+ }
+ }
+
+ sec = kzalloc(sizeof(*sec), GFP_KERNEL);
+ if (!sec)
+ return -ENOMEM;
+
+ sec->sframe_start = sframe_start;
+ sec->sframe_end = sframe_end;
+ sec->text_start = text_start;
+ sec->text_end = text_end;
+
+ ret = sframe_read_header(sec);
+ if (ret)
+ goto err_free;
+
+ /* TODO nowhere to store it yet - just free it and return an error */
+ ret = -ENOSYS;
+
+err_free:
+ free_section(sec);
+ return ret;
+}
+
+int sframe_remove_section(unsigned long sframe_start)
+{
+ return -ENOSYS;
+}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
new file mode 100644
index 000000000000..07a2e99855f9
--- /dev/null
+++ b/kernel/unwind/sframe.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * From https://www.sourceware.org/binutils/docs/sframe-spec.html
+ */
+#ifndef _SFRAME_H
+#define _SFRAME_H
+
+#include <linux/types.h>
+
+#define SFRAME_VERSION_1 1
+#define SFRAME_VERSION_2 2
+#define SFRAME_VERSION_3 3
+#define SFRAME_MAGIC 0xdee2
+
+#define SFRAME_F_FDE_SORTED 0x1
+#define SFRAME_F_FRAME_POINTER 0x2
+#define SFRAME_F_FDE_FUNC_START_PCREL 0x4
+
+#define SFRAME_ABI_AARCH64_ENDIAN_BIG 1
+#define SFRAME_ABI_AARCH64_ENDIAN_LITTLE 2
+#define SFRAME_ABI_AMD64_ENDIAN_LITTLE 3
+
+struct sframe_preamble {
+ u16 magic;
+ u8 version;
+ u8 flags;
+} __packed;
+
+struct sframe_header {
+ struct sframe_preamble preamble;
+ u8 abi_arch;
+ s8 cfa_fixed_fp_offset;
+ s8 cfa_fixed_ra_offset;
+ u8 auxhdr_len;
+ u32 num_fdes;
+ u32 num_fres;
+ u32 fre_len;
+ u32 fdes_off;
+ u32 fres_off;
+} __packed;
+
+#define SFRAME_HEADER_SIZE(header) \
+ ((sizeof(struct sframe_header) + (header).auxhdr_len))
+
+struct sframe_fde_v3 {
+ s64 func_start_off;
+ u32 func_size;
+ u32 fres_off;
+} __packed;
+
+struct sframe_fda_v3 {
+ u16 fres_num;
+ u8 info;
+ u8 info2;
+ u8 rep_size;
+} __packed;
+
+#define SFRAME_FDE_PCTYPE_INC 0
+#define SFRAME_FDE_PCTYPE_MASK 1
+
+#define SFRAME_AARCH64_PAUTH_KEY_A 0
+#define SFRAME_AARCH64_PAUTH_KEY_B 1
+
+#define SFRAME_V3_FDE_FRE_TYPE(info) ((info) & 0xf)
+#define SFRAME_V3_FDE_PCTYPE(info) (((info) >> 4) & 0x1)
+#define SFRAME_V3_AARCH64_FDE_PAUTH_KEY(info) (((info) >> 5) & 0x1)
+
+#define SFRAME_FDE_TYPE_REGULAR 0
+
+#define SFRAME_V3_FDE_TYPE_MASK 0x0f
+#define SFRAME_V3_FDE_TYPE(info2) ((info2) & SFRAME_V3_FDE_TYPE_MASK)
+
+#define SFRAME_BASE_REG_FP 0
+#define SFRAME_BASE_REG_SP 1
+
+#define SFRAME_V3_FRE_CFA_BASE_REG_ID(info) ((info) & 0x1)
+#define SFRAME_V3_FRE_DATAWORD_COUNT(info) (((info) >> 1) & 0xf)
+#define SFRAME_V3_FRE_DATAWORD_SIZE(info) (((info) >> 5) & 0x3)
+#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
+
+#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v13 02/18] unwind_user/sframe: Store .sframe section data in per-mm maple tree
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Associate an .sframe section with its mm by adding it to a per-mm maple
tree which is indexed by the corresponding text address range. A single
.sframe section can be associated with multiple text ranges.
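The range-to-section association can be sketched in user space with a sorted array of inclusive text ranges searched by binary search. This is only a stand-in for the maple tree (mtree_insert_range() and point lookups such as mtree_load() are kernel APIs); struct range_map and the range_insert()/range_lookup() helpers are hypothetical. As with mtree_insert_range(), insertion fails when the new range overlaps an existing one, and the same section pointer may be stored for several ranges.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sframe_section. */
struct section {
	unsigned long sframe_start;
};

/* One entry per text range; bounds are inclusive like mtree_insert_range(). */
struct range_entry {
	unsigned long first, last;
	struct section *sec;
};

#define MAX_RANGES 16

struct range_map {
	struct range_entry e[MAX_RANGES];
	size_t n;			/* entries kept sorted by 'first' */
};

/* Map [first, last] to sec; fail on overlap (the maple tree returns -EEXIST). */
static int range_insert(struct range_map *m, unsigned long first,
			unsigned long last, struct section *sec)
{
	size_t i, pos = 0;

	if (first > last || m->n == MAX_RANGES)
		return -1;
	for (i = 0; i < m->n; i++) {
		if (first <= m->e[i].last && m->e[i].first <= last)
			return -1;	/* overlaps an existing range */
		if (m->e[i].first < first)
			pos = i + 1;	/* insertion point keeps order */
	}
	for (i = m->n; i > pos; i--)
		m->e[i] = m->e[i - 1];
	m->e[pos] = (struct range_entry){ first, last, sec };
	m->n++;
	return 0;
}

/* Binary search for the range containing addr; NULL if none does. */
static struct section *range_lookup(const struct range_map *m, unsigned long addr)
{
	size_t lo = 0, hi = m->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < m->e[mid].first)
			hi = mid;
		else if (addr > m->e[mid].last)
			lo = mid + 1;
		else
			return m->e[mid].sec;
	}
	return NULL;
}
```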
[ Jens Remus: Minor cleanups. Reword commit subject/message. ]
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/mmu.h | 2 +-
include/linux/mm_types.h | 3 ++
include/linux/sframe.h | 15 ++++++++++
kernel/fork.c | 10 +++++++
kernel/unwind/sframe.c | 56 ++++++++++++++++++++++++++++++++++++--
mm/init-mm.c | 2 ++
6 files changed, 84 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 0fe9c569d171..227a32899a59 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -87,7 +87,7 @@ typedef struct {
.context = { \
.ctx_id = 1, \
.lock = __MUTEX_INITIALIZER(mm.context.lock), \
- }
+ },
void leave_mm(void);
#define leave_mm leave_mm
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9f6de068295d..c764d00fa522 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1323,6 +1323,9 @@ struct mm_struct {
#ifdef CONFIG_MM_ID
mm_id_t mm_id;
#endif /* CONFIG_MM_ID */
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ struct maple_tree sframe_mt;
+#endif
} __randomize_layout;
/*
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 0642595534f9..7ea6a97ed8af 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_SFRAME_H
#define _LINUX_SFRAME_H
+#include <linux/mm_types.h>
+
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
@@ -19,18 +21,31 @@ struct sframe_section {
signed char fp_off;
};
+#define INIT_MM_SFRAME .sframe_mt = MTREE_INIT(sframe_mt, 0),
+extern void sframe_free_mm(struct mm_struct *mm);
+
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+static inline bool current_has_sframe(void)
+{
+ struct mm_struct *mm = current->mm;
+
+ return mm && !mtree_empty(&mm->sframe_mt);
+}
+
#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
+#define INIT_MM_SFRAME
+static inline void sframe_free_mm(struct mm_struct *mm) {}
static inline int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..d8a7a32f8b81 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -107,6 +107,7 @@
#include <linux/tick.h>
#include <linux/unwind_deferred.h>
#include <linux/pgalloc.h>
+#include <linux/sframe.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -732,6 +733,7 @@ void __mmdrop(struct mm_struct *mm)
mm_pasid_drop(mm);
mm_destroy_cid(mm);
percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
+ sframe_free_mm(mm);
free_mm(mm);
}
@@ -1067,6 +1069,13 @@ static void mmap_init_lock(struct mm_struct *mm)
#endif
}
+static void mm_init_sframe(struct mm_struct *mm)
+{
+#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
+ mt_init(&mm->sframe_mt);
+#endif
+}
+
static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
struct user_namespace *user_ns)
{
@@ -1095,6 +1104,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
+ mm_init_sframe(mm);
hugetlb_count_init(mm);
mm_flags_clear_all(mm);
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index c128c1eeaca2..fdb2a4908563 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end)
{
+ struct maple_tree *sframe_mt = &current->mm->sframe_mt;
struct vm_area_struct *sframe_vma, *text_vma;
struct mm_struct *mm = current->mm;
struct sframe_section *sec;
@@ -122,15 +123,64 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
if (ret)
goto err_free;
- /* TODO nowhere to store it yet - just free it and return an error */
- ret = -ENOSYS;
+ ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
+ if (ret) {
+ dbg("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
+ goto err_free;
+ }
+
+ return 0;
err_free:
free_section(sec);
return ret;
}
+static int __sframe_remove_section(struct mm_struct *mm,
+ struct sframe_section *sec)
+{
+ if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
+ dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ return -EINVAL;
+ }
+
+ free_section(sec);
+
+ return 0;
+}
+
int sframe_remove_section(unsigned long sframe_start)
{
- return -ENOSYS;
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ unsigned long index = 0;
+ bool found = false;
+ int ret = 0;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
+ if (sec->sframe_start == sframe_start) {
+ found = true;
+ ret |= __sframe_remove_section(mm, sec);
+ }
+ }
+
+ if (!found || ret)
+ return -EINVAL;
+
+ return 0;
+}
+
+void sframe_free_mm(struct mm_struct *mm)
+{
+ struct sframe_section *sec;
+ unsigned long index = 0;
+
+ if (!mm)
+ return;
+
+ mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
+ free_section(sec);
+
+ mtree_destroy(&mm->sframe_mt);
}
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 4600e7605cab..b32fcf167cc2 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -11,6 +11,7 @@
#include <linux/atomic.h>
#include <linux/user_namespace.h>
#include <linux/iommu.h>
+#include <linux/sframe.h>
#include <asm/mmu.h>
#ifndef INIT_MM_CONTEXT
@@ -46,6 +47,7 @@ struct mm_struct init_mm = {
.user_ns = &init_user_ns,
.cpu_bitmap = CPU_BITS_NONE,
INIT_MM_CONTEXT(init_mm)
+ INIT_MM_SFRAME
};
void setup_initial_init_mm(void *start_code, void *end_code,
--
2.51.0
* [PATCH v13 03/18] x86/uaccess: Add unsafe_copy_from_user() implementation
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add an x86 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().
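The widest-chunk-first strategy shared by both macros can be modelled in user space. This is a sketch under stated assumptions: memcpy() replaces unsafe_get_user(), so there is no fault label and no user_read_access_begin()/user_read_access_end() bracketing, and COPY_LOOP(), chunked_copy() and the per-width counters are hypothetical names. It shows, for example, that a 15-byte copy is issued as one u64, one u32, one u16 and one u8 access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'type'-sized chunks while at least one full chunk remains, mirroring
 * unsafe_copy_from_user_loop(). memcpy() replaces unsafe_get_user() and
 * 'cnt' counts how many accesses of this width were made.
 */
#define COPY_LOOP(dst, src, len, type, cnt)		\
	while ((len) >= sizeof(type)) {			\
		type chunk;				\
		memcpy(&chunk, (src), sizeof(type));	\
		memcpy((dst), &chunk, sizeof(type));	\
		(dst) += sizeof(type);			\
		(src) += sizeof(type);			\
		(len) -= sizeof(type);			\
		(cnt)++;				\
	}

/* counts[0..3] receive the number of u64, u32, u16 and u8 accesses. */
static void chunked_copy(void *to, const void *from, size_t len,
			 unsigned int counts[4])
{
	unsigned char *d = to;
	const unsigned char *s = from;

	COPY_LOOP(d, s, len, uint64_t, counts[0]);
	COPY_LOOP(d, s, len, uint32_t, counts[1]);
	COPY_LOOP(d, s, len, uint16_t, counts[2]);
	COPY_LOOP(d, s, len, uint8_t, counts[3]);
}
```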
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/uaccess.h | 39 +++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 367297b188c3..dfe143235967 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -598,7 +598,7 @@ _label: \
* We want the unsafe accessors to always be inlined and use
* the error labels - thus the macro games.
*/
-#define unsafe_copy_loop(dst, src, len, type, label) \
+#define unsafe_copy_to_user_loop(dst, src, len, type, label) \
while (len >= sizeof(type)) { \
unsafe_put_user(*(type *)(src),(type __user *)(dst),label); \
dst += sizeof(type); \
@@ -606,15 +606,34 @@ _label: \
len -= sizeof(type); \
}
-#define unsafe_copy_to_user(_dst,_src,_len,label) \
-do { \
- char __user *__ucu_dst = (_dst); \
- const char *__ucu_src = (_src); \
- size_t __ucu_len = (_len); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label); \
- unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label); \
+#define unsafe_copy_to_user(_dst, _src, _len, label) \
+do { \
+ void __user *__dst = (_dst); \
+ const void *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_to_user_loop(__dst, __src, __len, u8, label); \
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label) \
+ while (len >= sizeof(type)) { \
+ unsafe_get_user(*(type *)(dst), (type __user *)(src), label); \
+ dst += sizeof(type); \
+ src += sizeof(type); \
+ len -= sizeof(type); \
+ }
+
+#define unsafe_copy_from_user(_dst, _src, _len, label) \
+do { \
+ void *__dst = (_dst); \
+ void __user *__src = (_src); \
+ size_t __len = (_len); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u64, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u32, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u16, label); \
+ unsafe_copy_from_user_loop(__dst, __src, __len, u8, label); \
} while (0)
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
--
2.51.0
* [PATCH v13 04/18] unwind_user/sframe: Add support for reading .sframe contents
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
In preparation for using sframe to unwind user space stacks, add an
sframe_find() interface for finding the sframe information associated
with a given text address.
For performance, use user_read_access_begin() and the corresponding
unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
regions would break noinstr validation, so there aren't any debug
messages yet. Those will be added in a subsequent commit.
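The binary search in __find_fde() relies on the FDE table being sorted and on each function start being stored relative to the FDE's own address (SFRAME_F_FDE_FUNC_START_PCREL). The user-space sketch below reproduces that logic on a local copy of the table, with a hypothetical fdes_addr parameter giving the address at which the table would be mapped in the target process. It uses a half-open [lo, hi) search instead of the kernel's low/high pointers, omits the text_start/text_end check done by __read_fde(), and find_fde() is an illustrative name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of struct sframe_fde_v3 from this series (16 bytes, packed). */
struct fde_v3 {
	int64_t  func_start_off;	/* relative to this FDE's own address */
	uint32_t func_size;
	uint32_t fres_off;
} __attribute__((packed));

/*
 * Return the index of the FDE whose function covers 'ip', or -1 if no FDE
 * does (ip before the first function, in a gap, or table not sorted).
 */
static long find_fde(const struct fde_v3 *fdes, size_t num_fdes,
		     uint64_t fdes_addr, uint64_t ip)
{
	uint64_t low_addr = 0, high_addr = UINT64_MAX;
	size_t lo = 0, hi = num_fdes;
	long found = -1;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uint64_t fde_addr = fdes_addr + mid * sizeof(struct fde_v3);
		/* Unsigned wrap-around handles negative offsets correctly. */
		uint64_t func_addr = fde_addr + (uint64_t)fdes[mid].func_start_off;

		if (ip >= func_addr) {
			if (func_addr < low_addr)
				return -1;	/* not sorted */
			low_addr = func_addr;
			found = (long)mid;
			lo = mid + 1;
		} else {
			if (func_addr > high_addr)
				return -1;	/* not sorted */
			high_addr = func_addr;
			hi = mid;
		}
	}
	if (found < 0)
		return -1;

	/* Make sure ip is not in a gap after the function's end. */
	uint64_t start = fdes_addr + (uint64_t)found * sizeof(struct fde_v3) +
			 (uint64_t)fdes[found].func_start_off;
	if (ip >= start + fdes[found].func_size)
		return -1;
	return found;
}
```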
Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
[ Jens Remus: Add initial support for SFrame V3 (limited to regular
FDEs). Add support for PC-relative FDE function start offset. Simplify
logic by using an internal FDE representation. Rename struct sframe_fre
to sframe_fre_internal to align with struct sframe_fde_internal.
Cleanup includes. Fix checkpatch errors "spaces required around that
':'". ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- Update to SFrame V3:
- Adjust to SFRAME_V3_*() macros and macro/define renames.
- Adjust to struct sframe_fde_v3 rename.
- Adjust to s64 FDE function start offset.
- Rename local variables fde_type to fde_pctype.
- Add and maintain struct sframe_fde_internal field u8 info2.
- Adjust to FDE split into function descriptor entry
(struct sframe_fde_v3) and attributes (struct sframe_fda_v3).
- Rename offset_count/offset_size to dataword_count/dataword_size.
- Limit __read_fre() to SFrame V3 regular FDEs (FDE_TYPE_REGULAR). A
subsequent patch will add support for flexible FDEs (FDE_TYPE_FLEX).
- Rename struct sframe_fde_internal field func_start_addr to func_addr.
- Add support for u64/s64 in UNSAFE_GET_USER_INC() for s64 FDE function
start offset.
- Reduce indentation of assignments to fre.
- Reword commit message (my changes).
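The helpers fre_type_to_size() and dataword_size_enum_to_size() both map the enum values 0, 1 and 2 to 1, 2 and 4 bytes. The sketch below combines them to compute the length of a single FRE, assuming V3 keeps the SFrame V2 FRE layout of a start address, one info byte, then dataword_count datawords. enum_to_size() and fre_total_size() are hypothetical names; the bit extraction matches SFRAME_V3_FDE_FRE_TYPE(), SFRAME_V3_FRE_DATAWORD_COUNT() and SFRAME_V3_FRE_DATAWORD_SIZE().

```c
#include <stdint.h>

/* Like fre_type_to_size()/dataword_size_enum_to_size(): 0,1,2 -> 1,2,4; else 0. */
static inline unsigned int enum_to_size(unsigned int e)
{
	return e > 2 ? 0 : 1u << e;
}

/*
 * Hypothetical helper: byte length of one FRE, assuming the layout
 * [start address][info byte][dataword_count x dataword_size bytes].
 * 'fde_info' is the FDE attribute info byte, 'fre_info' the FRE info byte.
 * Returns 0 for an invalid encoding.
 */
static unsigned int fre_total_size(uint8_t fde_info, uint8_t fre_info)
{
	unsigned int addr_size = enum_to_size(fde_info & 0xf);		/* FRE type */
	unsigned int count = (fre_info >> 1) & 0xf;			/* dataword count */
	unsigned int dw_size = enum_to_size((fre_info >> 5) & 0x3);	/* dataword size */

	if (!addr_size || !dw_size)
		return 0;
	return addr_size + 1 + count * dw_size;
}
```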
include/linux/sframe.h | 6 +
kernel/unwind/sframe.c | 346 ++++++++++++++++++++++++++++++++++-
kernel/unwind/sframe_debug.h | 35 ++++
3 files changed, 383 insertions(+), 4 deletions(-)
create mode 100644 kernel/unwind/sframe_debug.h
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 7ea6a97ed8af..9a72209696f9 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -3,10 +3,14 @@
#define _LINUX_SFRAME_H
#include <linux/mm_types.h>
+#include <linux/srcu.h>
+#include <linux/unwind_user_types.h>
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
struct sframe_section {
+ struct rcu_head rcu;
+
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
@@ -27,6 +31,7 @@ extern void sframe_free_mm(struct mm_struct *mm);
extern int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
unsigned long text_start, unsigned long text_end);
extern int sframe_remove_section(unsigned long sframe_addr);
+extern int sframe_find(unsigned long ip, struct unwind_user_frame *frame);
static inline bool current_has_sframe(void)
{
@@ -45,6 +50,7 @@ static inline int sframe_add_section(unsigned long sframe_start, unsigned long s
return -ENOSYS;
}
static inline int sframe_remove_section(unsigned long sframe_addr) { return -ENOSYS; }
+static inline int sframe_find(unsigned long ip, struct unwind_user_frame *frame) { return -ENOSYS; }
static inline bool current_has_sframe(void) { return false; }
#endif /* CONFIG_HAVE_UNWIND_USER_SFRAME */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index fdb2a4908563..9b3779924395 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -15,9 +15,338 @@
#include <linux/unwind_user_types.h>
#include "sframe.h"
+#include "sframe_debug.h"
+
+struct sframe_fde_internal {
+ unsigned long func_addr;
+ u32 func_size;
+ u32 fda_off;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 info2;
+ u8 rep_size;
+};
+
+struct sframe_fre_internal {
+ unsigned int size;
+ u32 ip_off;
+ s32 cfa_off;
+ s32 ra_off;
+ s32 fp_off;
+ u8 info;
+};
+
+DEFINE_STATIC_SRCU(sframe_srcu);
+
+static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
+{
+ if (fre_type > 2)
+ return 0;
+ return 1 << fre_type;
+}
+
+static __always_inline unsigned char dataword_size_enum_to_size(unsigned char dataword_size)
+{
+ if (dataword_size > 2)
+ return 0;
+ return 1 << dataword_size;
+}
+
+static __always_inline int __read_fde(struct sframe_section *sec,
+ unsigned int fde_num,
+ struct sframe_fde_internal *fde)
+{
+ unsigned long fde_addr, fda_addr, func_addr;
+ struct sframe_fde_v3 _fde;
+ struct sframe_fda_v3 _fda;
+
+ fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
+ unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
+ sizeof(struct sframe_fde_v3), Efault);
+
+ func_addr = fde_addr + _fde.func_start_off;
+ if (func_addr < sec->text_start || func_addr > sec->text_end)
+ return -EINVAL;
+
+ fda_addr = sec->fres_start + _fde.fres_off;
+ if (fda_addr + sizeof(struct sframe_fda_v3) > sec->fres_end)
+ return -EINVAL;
+ unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
+ sizeof(struct sframe_fda_v3), Efault);
+
+ fde->func_addr = func_addr;
+ fde->func_size = _fde.func_size;
+ fde->fda_off = _fde.fres_off;
+ fde->fres_off = _fde.fres_off + sizeof(struct sframe_fda_v3);
+ fde->fres_num = _fda.fres_num;
+ fde->info = _fda.info;
+ fde->info2 = _fda.info2;
+ fde->rep_size = _fda.rep_size;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fde(struct sframe_section *sec,
+ unsigned long ip,
+ struct sframe_fde_internal *fde)
+{
+ unsigned long func_addr_low = 0, func_addr_high = ULONG_MAX;
+ struct sframe_fde_v3 __user *first, *low, *high, *found = NULL;
+ int ret;
+
+ first = (void __user *)sec->fdes_start;
+ low = first;
+ high = first + sec->num_fdes - 1;
+
+ while (low <= high) {
+ struct sframe_fde_v3 __user *mid;
+ s64 func_off;
+ unsigned long func_addr;
+
+ mid = low + ((high - low) / 2);
+
+ unsafe_get_user(func_off, (s64 __user *)mid, Efault);
+ func_addr = (unsigned long)mid + func_off;
+
+ if (ip >= func_addr) {
+ if (func_addr < func_addr_low)
+ return -EFAULT;
+
+ func_addr_low = func_addr;
+
+ found = mid;
+ low = mid + 1;
+ } else {
+ if (func_addr > func_addr_high)
+ return -EFAULT;
+
+ func_addr_high = func_addr;
+
+ high = mid - 1;
+ }
+ }
+
+ if (!found)
+ return -EINVAL;
+
+ ret = __read_fde(sec, found - first, fde);
+ if (ret)
+ return ret;
+
+ /* make sure it's not in a gap */
+ if (ip < fde->func_addr || ip >= fde->func_addr + fde->func_size)
+ return -EINVAL;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+#define ____UNSAFE_GET_USER_INC(to, from, type, label) \
+({ \
+ type __to; \
+ unsafe_get_user(__to, (type __user *)from, label); \
+ from += sizeof(__to); \
+ to = __to; \
+})
+
+#define __UNSAFE_GET_USER_INC(to, from, size, label, u_or_s) \
+({ \
+ switch (size) { \
+ case 1: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##8, label); \
+ break; \
+ case 2: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##16, label); \
+ break; \
+ case 4: \
+ ____UNSAFE_GET_USER_INC(to, from, u_or_s##32, label); \
+ break; \
+ default: \
+ return -EFAULT; \
+ } \
+})
+
+#define UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, u)
+
+#define UNSAFE_GET_USER_SIGNED_INC(to, from, size, label) \
+ __UNSAFE_GET_USER_INC(to, from, size, label, s)
+
+#define UNSAFE_GET_USER_INC(to, from, size, label) \
+ _Generic(to, \
+ u8 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u16 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u32 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ u64 : UNSAFE_GET_USER_UNSIGNED_INC(to, from, size, label), \
+ s8 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s16 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s32 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
+ s64 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+
+static __always_inline int __read_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long fre_addr,
+ struct sframe_fre_internal *fre)
+{
+ unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+ unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
+ unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
+ unsigned char dataword_count, dataword_size;
+ s32 cfa_off, ra_off, fp_off;
+ unsigned long cur = fre_addr;
+ unsigned char addr_size;
+ u32 ip_off;
+ u8 info;
+
+ addr_size = fre_type_to_size(fre_type);
+ if (!addr_size)
+ return -EFAULT;
+
+ if (fre_addr + addr_size + 1 > sec->fres_end)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
+ if (fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off > fde->func_size)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(info, cur, 1, Efault);
+ dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
+ dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
+ if (!dataword_count || !dataword_size)
+ return -EFAULT;
+
+ if (cur + (dataword_count * dataword_size) > sec->fres_end)
+ return -EFAULT;
+
+ /* TODO: Support for flexible FDEs not implemented yet. */
+ if (fde_type != SFRAME_FDE_TYPE_REGULAR)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off) {
+ if (!dataword_count--)
+ return -EFAULT;
+
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ fre->size = cur - fre_addr;
+ fre->ip_off = ip_off;
+ fre->cfa_off = cfa_off;
+ fre->ra_off = ra_off;
+ fre->fp_off = fp_off;
+ fre->info = info;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int __find_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long ip,
+ struct unwind_user_frame *frame)
+{
+ unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
+ struct sframe_fre_internal *fre, *prev_fre = NULL;
+ struct sframe_fre_internal fres[2];
+ unsigned long fre_addr;
+ bool which = false;
+ unsigned int i;
+ u32 ip_off;
+
+ ip_off = ip - fde->func_addr;
+
+ if (fde_pctype == SFRAME_FDE_PCTYPE_MASK)
+ ip_off %= fde->rep_size;
+
+ fre_addr = sec->fres_start + fde->fres_off;
+
+ for (i = 0; i < fde->fres_num; i++) {
+ int ret;
+
+ /*
+ * Alternate between the two fres[] entries for 'fre' and
+ * 'prev_fre'.
+ */
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ if (ret)
+ return ret;
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off)
+ return -EFAULT;
+
+ if (fre->ip_off > ip_off)
+ break;
+
+ prev_fre = fre;
+ }
+
+ if (!prev_fre)
+ return -EINVAL;
+ fre = prev_fre;
+
+ frame->cfa_off = fre->cfa_off;
+ frame->ra_off = fre->ra_off;
+ frame->fp_off = fre->fp_off;
+ frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+
+ return 0;
+}
+
+int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
+{
+ struct mm_struct *mm = current->mm;
+ struct sframe_section *sec;
+ struct sframe_fde_internal fde;
+ int ret;
+
+ if (!mm)
+ return -EINVAL;
+
+ guard(srcu)(&sframe_srcu);
+
+ sec = mtree_load(&mm->sframe_mt, ip);
+ if (!sec)
+ return -EINVAL;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
-#define dbg(fmt, ...) \
- pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+ ret = __find_fde(sec, ip, &fde);
+ if (ret)
+ goto end;
+
+ ret = __find_fre(sec, &fde, ip, frame);
+end:
+ user_read_access_end();
+ return ret;
+}
static void free_section(struct sframe_section *sec)
{
@@ -120,8 +449,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_end = text_end;
ret = sframe_read_header(sec);
- if (ret)
+ if (ret) {
+ dbg_print_header(sec);
goto err_free;
+ }
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
@@ -137,6 +468,13 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
return ret;
}
+static void sframe_free_srcu(struct rcu_head *rcu)
+{
+ struct sframe_section *sec = container_of(rcu, struct sframe_section, rcu);
+
+ free_section(sec);
+}
+
static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
@@ -145,7 +483,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
return -EINVAL;
}
- free_section(sec);
+ call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
return 0;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
new file mode 100644
index 000000000000..36352124cde8
--- /dev/null
+++ b/kernel/unwind/sframe_debug.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SFRAME_DEBUG_H
+#define _SFRAME_DEBUG_H
+
+#include <linux/sframe.h>
+#include "sframe.h"
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+
+#define dbg(fmt, ...) \
+ pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+
+static __always_inline void dbg_print_header(struct sframe_section *sec)
+{
+ unsigned long fdes_end;
+
+ fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde_v3));
+
+ dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+#else /* !CONFIG_DYNAMIC_DEBUG */
+
+#define dbg(args...) no_printk(args)
+
+static inline void dbg_print_header(struct sframe_section *sec) {}
+
+#endif /* !CONFIG_DYNAMIC_DEBUG */
+
+#endif /* _SFRAME_DEBUG_H */
--
2.51.0
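For illustration, here is a minimal user-space sketch of the lookup strategy used by __find_fde() above. It is a binary search over FDEs sorted by function start address that checks, as it goes, that the visited entries are in ascending order, followed by the same gap check. It assumes a hypothetical in-memory array of struct fde_sketch holding absolute start addresses. The kernel code instead reads PC-relative start offsets from user memory with unsafe_get_user().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FDE: absolute function start address and size. */
struct fde_sketch {
	unsigned long func_addr;
	uint32_t func_size;
};

/*
 * Return the index of the FDE covering @ip, -EINVAL if no function covers
 * @ip (including gaps between functions), or -EFAULT if an entry visited by
 * the search contradicts the expected ascending order.
 */
static long find_fde_sketch(const struct fde_sketch *fdes, size_t num,
			    unsigned long ip)
{
	unsigned long addr_low = 0, addr_high = ULONG_MAX;
	long low = 0, high = (long)num - 1, found = -1;

	while (low <= high) {
		long mid = low + (high - low) / 2;
		unsigned long func_addr = fdes[mid].func_addr;

		if (ip >= func_addr) {
			/* Entries right of an earlier hit must not start below it. */
			if (func_addr < addr_low)
				return -EFAULT;
			addr_low = func_addr;
			found = mid;
			low = mid + 1;
		} else {
			/* Entries left of an earlier miss must not start above it. */
			if (func_addr > addr_high)
				return -EFAULT;
			addr_high = func_addr;
			high = mid - 1;
		}
	}

	if (found < 0)
		return -EINVAL;

	/* The nearest preceding function may end before @ip (a gap). */
	if (ip >= fdes[found].func_addr + fdes[found].func_size)
		return -EINVAL;

	return found;
}
```

Folding the order check into the search keeps the lookup O(log n) without a separate validation pass, at the cost of only catching disorder along the path the search actually visits.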
* [PATCH v13 05/18] unwind_user/sframe: Detect .sframe sections in executables
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (3 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 04/18] unwind_user/sframe: Add support for reading .sframe contents Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 06/18] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
` (14 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When loading an ELF executable, automatically detect an .sframe section
and associate it with the mm_struct.
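The address arithmetic in elf_add_sframe() can be sketched in user space as follows. This is a minimal illustration, assuming glibc's <elf.h> for Elf64_Phdr, PT_LOAD, PT_LOOS and PF_X. struct sframe_ranges and collect_sframe_ranges() are hypothetical names, and instead of calling sframe_add_section() the sketch only records each (.sframe, text) pair it would register. Like the patch, it pairs the PT_GNU_SFRAME segment with every program header that has PF_X set, relocating both by the load base.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)	/* value added by this patch */
#endif

/* Hypothetical record of one (sframe, text) pair that would be registered. */
struct sframe_ranges {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
};

/*
 * Two passes, as in the patch: first find PT_GNU_SFRAME (the last one wins),
 * then pair it with every executable segment. Returns the number of pairs
 * written to @out.
 */
static size_t collect_sframe_ranges(const Elf64_Phdr *phdrs, size_t phnum,
				    unsigned long base,
				    struct sframe_ranges *out, size_t max_out)
{
	const Elf64_Phdr *sframe = NULL;
	size_t i, n = 0;

	for (i = 0; i < phnum; i++)
		if (phdrs[i].p_type == PT_GNU_SFRAME)
			sframe = &phdrs[i];

	if (!sframe)
		return 0;

	for (i = 0; i < phnum && n < max_out; i++) {
		if (!(phdrs[i].p_flags & PF_X))
			continue;
		out[n].sframe_start = base + sframe->p_vaddr;
		out[n].sframe_end = out[n].sframe_start + sframe->p_memsz;
		out[n].text_start = base + phdrs[i].p_vaddr;
		out[n].text_end = out[n].text_start + phdrs[i].p_memsz;
		n++;
	}
	return n;
}
```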
[ Jens Remus: Fix checkpatch warning "braces {} are not necessary for
single statement blocks". ]
Cc: linux-mm@kvack.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
fs/binfmt_elf.c | 48 +++++++++++++++++++++++++++++++++++++---
include/uapi/linux/elf.h | 1 +
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 3eb734c192e9..fc6ecb4d239e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -47,6 +47,7 @@
#include <linux/dax.h>
#include <linux/uaccess.h>
#include <uapi/linux/rseq.h>
+#include <linux/sframe.h>
#include <asm/param.h>
#include <asm/page.h>
@@ -637,6 +638,21 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
}
+static void elf_add_sframe(struct elf_phdr *text, struct elf_phdr *sframe,
+ unsigned long base_addr)
+{
+ unsigned long sframe_start, sframe_end, text_start, text_end;
+
+ sframe_start = base_addr + sframe->p_vaddr;
+ sframe_end = sframe_start + sframe->p_memsz;
+
+ text_start = base_addr + text->p_vaddr;
+ text_end = text_start + text->p_memsz;
+
+ /* Ignore return value, sframe section isn't critical */
+ sframe_add_section(sframe_start, sframe_end, text_start, text_end);
+}
+
/* This is much more generalized than the library routine read function,
so we keep this separate. Technically the library read function
is only provided so that we can read a.out libraries that have
@@ -647,7 +663,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
unsigned long no_base, struct elf_phdr *interp_elf_phdata,
struct arch_elf_state *arch_state)
{
- struct elf_phdr *eppnt;
+ struct elf_phdr *eppnt, *sframe_phdr = NULL;
unsigned long load_addr = 0;
int load_addr_set = 0;
unsigned long error = ~0UL;
@@ -673,7 +689,8 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
eppnt = interp_elf_phdata;
for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
- if (eppnt->p_type == PT_LOAD) {
+ switch (eppnt->p_type) {
+ case PT_LOAD: {
int elf_type = MAP_PRIVATE;
int elf_prot = make_prot(eppnt->p_flags, arch_state,
true, true);
@@ -712,6 +729,19 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
error = -ENOMEM;
goto out;
}
+ break;
+ }
+ case PT_GNU_SFRAME:
+ sframe_phdr = eppnt;
+ break;
+ }
+ }
+
+ if (sframe_phdr) {
+ eppnt = interp_elf_phdata;
+ for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
+ if (eppnt->p_flags & PF_X)
+ elf_add_sframe(eppnt, sframe_phdr, load_addr);
}
}
@@ -836,7 +866,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
int first_pt_load = 1;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
- struct elf_phdr *elf_property_phdata = NULL;
+ struct elf_phdr *elf_property_phdata = NULL, *sframe_phdr = NULL;
unsigned long elf_brk;
bool brk_moved = false;
int retval, i;
@@ -945,6 +975,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
executable_stack = EXSTACK_DISABLE_X;
break;
+ case PT_GNU_SFRAME:
+ sframe_phdr = elf_ppnt;
+ break;
+
case PT_LOPROC ... PT_HIPROC:
retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
bprm->file, false,
@@ -1242,6 +1276,14 @@ static int load_elf_binary(struct linux_binprm *bprm)
elf_brk = k;
}
+ if (sframe_phdr) {
+ for (i = 0, elf_ppnt = elf_phdata;
+ i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if ((elf_ppnt->p_flags & PF_X))
+ elf_add_sframe(elf_ppnt, sframe_phdr, load_bias);
+ }
+ }
+
e_entry = elf_ex->e_entry + load_bias;
phdr_addr += load_bias;
elf_brk += load_bias;
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 819ded2d39de..92c16c94fca8 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -41,6 +41,7 @@ typedef __u16 Elf64_Versym;
#define PT_GNU_STACK (PT_LOOS + 0x474e551)
#define PT_GNU_RELRO (PT_LOOS + 0x474e552)
#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553)
+#define PT_GNU_SFRAME (PT_LOOS + 0x474e554)
/* ARM MTE memory tag segment type */
--
2.51.0
* [PATCH v13 06/18] unwind_user/sframe: Wire up unwind_user to sframe
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (4 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 05/18] unwind_user/sframe: Detect .sframe sections in executables Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 07/18] unwind_user: Stop when reaching an outermost frame Jens Remus
` (13 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Now that the sframe infrastructure is fully in place, make it work by
hooking it up to the unwind_user interface.
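A simplified model of the resulting method selection in unwind_user_next(): methods are tried in bit order, so SFRAME (bit 0) is attempted before FP (bit 1). In the patch, unwind_user_next_sframe() returns -ENOENT whenever sframe_find() fails, which makes the loop fall back to the frame pointer method; any other error from the common unwind step ends the walk. The sketch below assumes hypothetical fake_steps results in place of the real per-method unwinders.

```c
#include <assert.h>
#include <errno.h>

/* Bit order mirrors the patch: the SFRAME bit (0) is tried before FP (1). */
enum { TYPE_NONE = 0, TYPE_SFRAME = 1u << 0, TYPE_FP = 1u << 1 };

/* Hypothetical stand-ins for the results of the real per-method steps. */
struct fake_steps {
	int sframe_ret;	/* 0, -ENOENT (no usable sframe data), or other error */
	int fp_ret;	/* 0 or error */
};

/*
 * Try each available method in bit order. Returns 0 and sets *used to the
 * method that produced the next frame, or -EINVAL once the walk must stop.
 */
static int next_frame_sketch(unsigned int available,
			     const struct fake_steps *steps, unsigned int *used)
{
	unsigned int mask = available;

	while (mask) {
		unsigned int type = mask & -mask;	/* lowest set bit */

		mask &= ~type;
		*used = type;

		if (type == TYPE_SFRAME) {
			if (steps->sframe_ret == 0)
				return 0;
			if (steps->sframe_ret == -ENOENT)
				continue;	/* no sframe info: fall back */
			break;			/* other error: stop the walk */
		}
		if (type == TYPE_FP) {
			if (steps->fp_ret == 0)
				return 0;
			continue;
		}
	}

	*used = TYPE_NONE;
	return -EINVAL;
}
```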
[ Jens Remus: Remove unused pt_regs from unwind_user_next_common() and
its callers. Simplify unwind_user_next_sframe(). ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/Kconfig | 1 +
include/linux/unwind_user_types.h | 4 +++-
kernel/unwind/user.c | 23 +++++++++++++++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 7d5517364d8b..2ece3df821b5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -488,6 +488,7 @@ config HAVE_UNWIND_USER_FP
config HAVE_UNWIND_USER_SFRAME
bool
+ select UNWIND_USER
config HAVE_PERF_REGS
bool
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 412729a269bc..43e4b160883f 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -9,7 +9,8 @@
* available.
*/
enum unwind_user_type_bits {
- UNWIND_USER_TYPE_FP_BIT = 0,
+ UNWIND_USER_TYPE_SFRAME_BIT = 0,
+ UNWIND_USER_TYPE_FP_BIT = 1,
NR_UNWIND_USER_TYPE_BITS,
};
@@ -17,6 +18,7 @@ enum unwind_user_type_bits {
enum unwind_user_type {
/* Type "none" for the start of stack walk iteration. */
UNWIND_USER_TYPE_NONE = 0,
+ UNWIND_USER_TYPE_SFRAME = BIT(UNWIND_USER_TYPE_SFRAME_BIT),
UNWIND_USER_TYPE_FP = BIT(UNWIND_USER_TYPE_FP_BIT),
};
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 90ab3c1a205e..1fb272419733 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -7,6 +7,7 @@
#include <linux/sched/task_stack.h>
#include <linux/unwind_user.h>
#include <linux/uaccess.h>
+#include <linux/sframe.h>
#define for_each_user_frame(state) \
for (unwind_user_start(state); !(state)->done; unwind_user_next(state))
@@ -82,6 +83,16 @@ static int unwind_user_next_fp(struct unwind_user_state *state)
return unwind_user_next_common(state, &fp_frame);
}
+static int unwind_user_next_sframe(struct unwind_user_state *state)
+{
+ struct unwind_user_frame frame;
+
+ /* sframe expects the frame to be local storage */
+ if (sframe_find(state->ip, &frame))
+ return -ENOENT;
+ return unwind_user_next_common(state, &frame);
+}
+
static int unwind_user_next(struct unwind_user_state *state)
{
unsigned long iter_mask = state->available_types;
@@ -95,6 +106,16 @@ static int unwind_user_next(struct unwind_user_state *state)
state->current_type = type;
switch (type) {
+ case UNWIND_USER_TYPE_SFRAME:
+ switch (unwind_user_next_sframe(state)) {
+ case 0:
+ return 0;
+ case -ENOENT:
+ continue; /* Try next method. */
+ default:
+ state->done = true;
+ }
+ break;
case UNWIND_USER_TYPE_FP:
if (!unwind_user_next_fp(state))
return 0;
@@ -123,6 +144,8 @@ static int unwind_user_start(struct unwind_user_state *state)
return -EINVAL;
}
+ if (current_has_sframe())
+ state->available_types |= UNWIND_USER_TYPE_SFRAME;
if (IS_ENABLED(CONFIG_HAVE_UNWIND_USER_FP))
state->available_types |= UNWIND_USER_TYPE_FP;
--
2.51.0
* [PATCH v13 07/18] unwind_user: Stop when reaching an outermost frame
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (5 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 06/18] unwind_user/sframe: Wire up unwind_user to sframe Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 08/18] unwind_user/sframe: Add support for outermost frame indication Jens Remus
` (12 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
Add an indication for an outermost frame to the unwind user frame
structure and stop unwinding when reaching an outermost frame.
This will be used by unwind user sframe, as SFrame may represent an
undefined return address as an indication of an outermost frame.
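A minimal sketch of a single unwind step with the new outermost check, assuming a hypothetical fake_stack in place of user memory and 8-byte stack words; struct frame_sketch mirrors the fields of struct unwind_user_frame. It only roughly follows unwind_user_next_common() and omits the sanity checks the real code performs (for example, that the FP is not below the SP).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of struct unwind_user_frame. */
struct frame_sketch {
	int32_t cfa_off, ra_off, fp_off;
	bool use_fp, outermost;
};

struct state_sketch {
	uint64_t ip, sp, fp;
	bool done;
};

/* Hypothetical fake stack: words[i] lives at address base + 8 * i. */
struct fake_stack {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

static bool read_word(const struct fake_stack *s, uint64_t addr, uint64_t *val)
{
	if (addr < s->base || (addr - s->base) % 8 ||
	    (addr - s->base) / 8 >= s->nwords)
		return false;
	*val = s->words[(addr - s->base) / 8];
	return true;
}

/* One unwind step: 0 on success (including reaching the end), -1 on error. */
static int step_sketch(struct state_sketch *st, const struct frame_sketch *f,
		       const struct fake_stack *stk)
{
	uint64_t cfa, ra, fp = st->fp;

	/* An outermost frame ends the walk without being an error. */
	if (f->outermost) {
		st->done = true;
		return 0;
	}

	/* CFA = (FP or SP) + cfa_off; RA and saved FP are at CFA-relative offsets. */
	cfa = (f->use_fp ? st->fp : st->sp) + f->cfa_off;
	if (!read_word(stk, cfa + f->ra_off, &ra))
		return -1;
	if (f->fp_off && !read_word(stk, cfa + f->fp_off, &fp))
		return -1;

	st->ip = ra;
	st->sp = cfa;
	st->fp = fp;
	return 0;
}
```

With the x86 frame pointer layout shown above for 8-byte words (cfa_off 16, ra_off -8, fp_off -16), a step from FP 0x7000 computes CFA 0x7010 and reads the return address from 0x7008 and the saved FP from 0x7000. A frame with outermost set instead only sets done.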
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
arch/x86/include/asm/unwind_user.h | 6 ++++--
include/linux/unwind_user_types.h | 1 +
kernel/unwind/user.c | 6 ++++++
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 6e469044e4de..2dfb5ef11e36 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -23,13 +23,15 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.cfa_off = 2*(ws), \
.ra_off = -1*(ws), \
.fp_off = -2*(ws), \
- .use_fp = true,
+ .use_fp = true, \
+ .outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
.cfa_off = 1*(ws), \
.ra_off = -1*(ws), \
.fp_off = 0, \
- .use_fp = false,
+ .use_fp = false, \
+ .outermost = false,
static inline bool unwind_user_at_function_start(struct pt_regs *regs)
{
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 43e4b160883f..616cc5ee4586 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -32,6 +32,7 @@ struct unwind_user_frame {
s32 ra_off;
s32 fp_off;
bool use_fp;
+ bool outermost;
};
struct unwind_user_state {
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 1fb272419733..fdb1001e3750 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -32,6 +32,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
{
unsigned long cfa, fp, ra;
+ /* Stop unwinding when reaching an outermost frame. */
+ if (frame->outermost) {
+ state->done = true;
+ return 0;
+ }
+
/* Get the Canonical Frame Address (CFA) */
if (frame->use_fp) {
if (state->fp < state->sp)
--
2.51.0
* [PATCH v13 08/18] unwind_user/sframe: Add support for outermost frame indication
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (6 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 07/18] unwind_user: Stop when reaching an outermost frame Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 09/18] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
` (11 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
SFrame may represent an undefined return address (RA) as an SFrame FRE
without any offsets (data words), as an indication of an outermost
frame. Accept such FREs and mark the resulting frame as outermost.
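To make the FRE layout concrete, here is a minimal user-space sketch that decodes one V3 FRE from a byte buffer using the info-byte bit layout from kernel/unwind/sframe.h. It assumes little-endian data, a regular FDE, and a section whose header supplies a fixed RA offset, so the data words are the CFA offset and an optional FP offset. struct fre_sketch, read_sle() and decode_fre() are hypothetical. An FRE with zero data words decodes as an outermost frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Info-byte accessors, same bit layout as SFRAME_V3_FRE_* in sframe.h. */
#define FRE_DATAWORD_COUNT(info)	(((info) >> 1) & 0xf)
#define FRE_DATAWORD_SIZE(info)		(((info) >> 5) & 0x3)
#define FRE_RA_UNDEFINED_P(info)	(FRE_DATAWORD_COUNT(info) == 0)

/* Hypothetical decoded FRE. */
struct fre_sketch {
	uint32_t ip_off;
	int32_t cfa_off, fp_off;
	int outermost;
	size_t size;	/* total encoded size in bytes */
};

/* Read a little-endian, two's complement value of 1, 2 or 4 bytes. */
static int32_t read_sle(const uint8_t *p, unsigned int size)
{
	uint32_t v = 0;
	unsigned int i;

	for (i = 0; i < size; i++)
		v |= (uint32_t)p[i] << (8 * i);
	if (size < 4 && (v & (1u << (8 * size - 1))))
		v |= ~0u << (8 * size);	/* sign-extend */
	return (int32_t)v;
}

/* Decode one FRE at @p (@len bytes available); 0 on success, -1 if malformed. */
static int decode_fre(const uint8_t *p, size_t len, unsigned int addr_size,
		      struct fre_sketch *fre)
{
	const uint8_t *cur = p;
	unsigned int count, size, size_enum, i;
	uint8_t info;

	if ((addr_size != 1 && addr_size != 2 && addr_size != 4) ||
	    len < addr_size + 1)
		return -1;

	/* Start address offset: unsigned, width selected by the FDE FRE type. */
	fre->ip_off = 0;
	for (i = 0; i < addr_size; i++)
		fre->ip_off |= (uint32_t)cur[i] << (8 * i);
	cur += addr_size;
	info = *cur++;

	count = FRE_DATAWORD_COUNT(info);
	size_enum = FRE_DATAWORD_SIZE(info);
	size = size_enum > 2 ? 0 : 1u << size_enum;	/* 1, 2 or 4 bytes */
	/* With a fixed RA offset there are at most CFA and FP data words. */
	if (!size || count > 2 || (size_t)(cur - p) + count * size > len)
		return -1;

	fre->outermost = FRE_RA_UNDEFINED_P(info);
	fre->cfa_off = 0;
	fre->fp_off = 0;
	if (count >= 1) {
		fre->cfa_off = read_sle(cur, size);
		cur += size;
	}
	if (count == 2) {
		fre->fp_off = read_sle(cur, size);
		cur += size;
	}

	/* Address, info byte and every data word that was consumed. */
	fre->size = (size_t)(cur - p);
	return 0;
}
```

For example, the bytes 04 02 10 with a 1-byte start address decode to ip_off 4 and a single 1-byte data word giving cfa_off 16, for a total FRE size of 3 bytes; the bytes 08 00 decode to an outermost FRE of 2 bytes.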
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- Use SFRAME_V3_FRE_RA_UNDEFINED_P() instead of struct
sframe_fre_internal.ra_undefined field.
- Reduce indentation of assignments.
kernel/unwind/sframe.c | 15 ++++++++++++++-
kernel/unwind/sframe.h | 1 +
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 9b3779924395..6004b43aac34 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -218,7 +218,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
UNSAFE_GET_USER_INC(info, cur, 1, Efault);
dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
- if (!dataword_count || !dataword_size)
+ if (!dataword_size)
return -EFAULT;
if (cur + (dataword_count * dataword_size) > sec->fres_end)
@@ -228,6 +228,17 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (fde_type != SFRAME_FDE_TYPE_REGULAR)
return -EFAULT;
+ if (!dataword_count) {
+ /*
+ * A FRE without data words indicates RA undefined /
+ * outermost frame.
+ */
+ cfa_off = 0;
+ ra_off = 0;
+ fp_off = 0;
+ goto done;
+ }
+
UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
dataword_count--;
@@ -248,6 +259,7 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (dataword_count)
return -EFAULT;
+done:
fre->size = cur - fre_addr;
fre->ip_off = ip_off;
fre->cfa_off = cfa_off;
@@ -314,6 +326,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
frame->ra_off = fre->ra_off;
frame->fp_off = fre->fp_off;
frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
+ frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
}
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
index 07a2e99855f9..3fcc15534e5a 100644
--- a/kernel/unwind/sframe.h
+++ b/kernel/unwind/sframe.h
@@ -77,5 +77,6 @@ struct sframe_fda_v3 {
#define SFRAME_V3_FRE_DATAWORD_COUNT(info) (((info) >> 1) & 0xf)
#define SFRAME_V3_FRE_DATAWORD_SIZE(info) (((info) >> 5) & 0x3)
#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
+#define SFRAME_V3_FRE_RA_UNDEFINED_P(info) (SFRAME_V3_FRE_DATAWORD_COUNT(info) == 0)
#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v13 09/18] unwind_user/sframe: Remove .sframe section on detected corruption
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (7 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 08/18] unwind_user/sframe: Add support for outermost frame indication Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 10/18] unwind_user/sframe: Show file name in debug output Jens Remus
` (10 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
To avoid repeated attempts to use a bad .sframe section, remove it as
soon as the first sign of corruption is detected.
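The policy boils down to distinguishing "no entry for this IP" from "malformed data" and dropping the section only in the latter case. In the patch, lookups that simply find no entry return -EINVAL, while most malformed-data checks return -EFAULT. A minimal sketch of that policy, assuming a hypothetical array-based registry (struct section_sketch) in place of the per-mm maple tree and sframe_remove_section():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry entry standing in for a registered .sframe section. */
struct section_sketch {
	unsigned long text_start, text_end;
	bool corrupt;		/* pretend the contents fail validation */
	bool registered;
};

/* Simulated lookup: -EINVAL means "no data", -EFAULT means "corrupt data". */
static int sframe_lookup_sketch(struct section_sketch *secs, size_t n,
				unsigned long ip)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct section_sketch *sec = &secs[i];
		int ret;

		if (!sec->registered || ip < sec->text_start || ip >= sec->text_end)
			continue;

		ret = sec->corrupt ? -EFAULT : 0;
		/* Drop the section on the first sign of corruption only. */
		if (ret == -EFAULT)
			sec->registered = false;
		return ret;
	}
	return -EINVAL;
}
```

After the first -EFAULT, later lookups in that range report -EINVAL, so a caller such as the unwinder can fall back to another method instead of hitting the same bad data again.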
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
kernel/unwind/sframe.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 6004b43aac34..9a63f8828593 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -358,6 +358,10 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
ret = __find_fre(sec, &fde, ip, frame);
end:
user_read_access_end();
+
+ if (ret == -EFAULT)
+ WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+
return ret;
}
--
2.51.0
* [PATCH v13 10/18] unwind_user/sframe: Show file name in debug output
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (8 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 09/18] unwind_user/sframe: Remove .sframe section on detected corruption Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 11/18] unwind_user/sframe: Add .sframe validation option Jens Remus
` (9 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
When debugging sframe issues, the error messages aren't all that helpful
without knowing which file the corresponding .sframe section belongs to.
Prefix debug output strings with the file name.
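Below is a minimal user-space sketch of the layered prefix. It uses a hypothetical struct dbg_ctx in place of current->comm, current->pid and sec->filename, and snprintf() in place of pr_debug(). As with the dbg()/dbg_sec() pair, the section macro prepends "%s: " to the format string by string-literal concatenation and passes the file name as the first argument. Like the kernel macros, it relies on the GNU ", ##__VA_ARGS__" extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for current->comm, current->pid, sec->filename. */
struct dbg_ctx {
	const char *comm;
	int pid;
	const char *filename;
};

/* Base prefix, analogous to dbg(): "comm (pid): " */
#define DBG(buf, len, ctx, fmt, ...) \
	snprintf(buf, len, "%s (%d): " fmt, (ctx)->comm, (ctx)->pid, \
		 ##__VA_ARGS__)

/* Per-section prefix layered on top, analogous to dbg_sec(): "file: " */
#define DBG_SEC(buf, len, ctx, fmt, ...) \
	DBG(buf, len, ctx, "%s: " fmt, (ctx)->filename, ##__VA_ARGS__)
```

For example, with comm "perf", pid 42 and file "/usr/lib/libc.so.6", DBG_SEC(buf, sizeof(buf), &c, "mtree_erase failed: text=%lx\n", 0x1000UL) formats "perf (42): /usr/lib/libc.so.6: mtree_erase failed: text=1000\n".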
[ Jens Remus: Fix checkpatch error "space prohibited before that close
parenthesis ')'". ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
include/linux/sframe.h | 4 +++-
kernel/unwind/sframe.c | 23 ++++++++++--------
kernel/unwind/sframe_debug.h | 45 +++++++++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/include/linux/sframe.h b/include/linux/sframe.h
index 9a72209696f9..b79c5ec09229 100644
--- a/include/linux/sframe.h
+++ b/include/linux/sframe.h
@@ -10,7 +10,9 @@
struct sframe_section {
struct rcu_head rcu;
-
+#ifdef CONFIG_DYNAMIC_DEBUG
+ const char *filename;
+#endif
unsigned long sframe_start;
unsigned long sframe_end;
unsigned long text_start;
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 9a63f8828593..3276aa6d0c77 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -359,14 +359,17 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
end:
user_read_access_end();
- if (ret == -EFAULT)
+ if (ret == -EFAULT) {
+ dbg_sec("removing bad .sframe section\n");
WARN_ON_ONCE(sframe_remove_section(sec->sframe_start));
+ }
return ret;
}
static void free_section(struct sframe_section *sec)
{
+ dbg_free(sec);
kfree(sec);
}
@@ -377,7 +380,7 @@ static int sframe_read_header(struct sframe_section *sec)
unsigned int num_fdes;
if (copy_from_user(&shdr, (void __user *)sec->sframe_start, sizeof(shdr))) {
- dbg("header usercopy failed\n");
+ dbg_sec("header usercopy failed\n");
return -EFAULT;
}
@@ -386,18 +389,18 @@ static int sframe_read_header(struct sframe_section *sec)
!(shdr.preamble.flags & SFRAME_F_FDE_SORTED) ||
!(shdr.preamble.flags & SFRAME_F_FDE_FUNC_START_PCREL) ||
shdr.auxhdr_len) {
- dbg("bad/unsupported sframe header\n");
+ dbg_sec("bad/unsupported sframe header\n");
return -EINVAL;
}
if (!shdr.num_fdes || !shdr.num_fres) {
- dbg("no fde/fre entries\n");
+ dbg_sec("no fde/fre entries\n");
return -EINVAL;
}
header_end = sec->sframe_start + SFRAME_HEADER_SIZE(shdr);
if (header_end >= sec->sframe_end) {
- dbg("header doesn't fit in section\n");
+ dbg_sec("header doesn't fit in section\n");
return -EINVAL;
}
@@ -409,7 +412,7 @@ static int sframe_read_header(struct sframe_section *sec)
fres_end = fres_start + shdr.fre_len;
if (fres_start < fdes_end || fres_end > sec->sframe_end) {
- dbg("inconsistent fde/fre offsets\n");
+ dbg_sec("inconsistent fde/fre offsets\n");
return -EINVAL;
}
@@ -465,6 +468,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
sec->text_start = text_start;
sec->text_end = text_end;
+ dbg_init(sec);
+
ret = sframe_read_header(sec);
if (ret) {
dbg_print_header(sec);
@@ -473,8 +478,8 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
- dbg("mtree_insert_range failed: text=%lx-%lx\n",
- sec->text_start, sec->text_end);
+ dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
+ sec->text_start, sec->text_end);
goto err_free;
}
@@ -496,7 +501,7 @@ static int __sframe_remove_section(struct mm_struct *mm,
struct sframe_section *sec)
{
if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
- dbg("mtree_erase failed: text=%lx\n", sec->text_start);
+ dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
return -EINVAL;
}
diff --git a/kernel/unwind/sframe_debug.h b/kernel/unwind/sframe_debug.h
index 36352124cde8..e568be4172b1 100644
--- a/kernel/unwind/sframe_debug.h
+++ b/kernel/unwind/sframe_debug.h
@@ -10,26 +10,59 @@
#define dbg(fmt, ...) \
pr_debug("%s (%d): " fmt, current->comm, current->pid, ##__VA_ARGS__)
+#define dbg_sec(fmt, ...) \
+ dbg("%s: " fmt, sec->filename, ##__VA_ARGS__)
+
static __always_inline void dbg_print_header(struct sframe_section *sec)
{
unsigned long fdes_end;
fdes_end = sec->fdes_start + (sec->num_fdes * sizeof(struct sframe_fde_v3));
- dbg("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
- "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
- "ra_off:%d fp_off:%d\n",
- sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
- sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
- sec->ra_off, sec->fp_off);
+ dbg_sec("SEC: sframe:0x%lx-0x%lx text:0x%lx-0x%lx "
+ "fdes:0x%lx-0x%lx fres:0x%lx-0x%lx "
+ "ra_off:%d fp_off:%d\n",
+ sec->sframe_start, sec->sframe_end, sec->text_start, sec->text_end,
+ sec->fdes_start, fdes_end, sec->fres_start, sec->fres_end,
+ sec->ra_off, sec->fp_off);
+}
+
+static inline void dbg_init(struct sframe_section *sec)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+
+ guard(mmap_read_lock)(mm);
+ vma = vma_lookup(mm, sec->sframe_start);
+ if (!vma)
+ sec->filename = kstrdup("(vma gone???)", GFP_KERNEL);
+ else if (vma->vm_file)
+ sec->filename = kstrdup_quotable_file(vma->vm_file, GFP_KERNEL);
+ else if (vma->vm_ops && vma->vm_ops->name)
+ sec->filename = kstrdup(vma->vm_ops->name(vma), GFP_KERNEL);
+ else if (arch_vma_name(vma))
+ sec->filename = kstrdup(arch_vma_name(vma), GFP_KERNEL);
+ else if (!vma->vm_mm)
+ sec->filename = kstrdup("(vdso)", GFP_KERNEL);
+ else
+ sec->filename = kstrdup("(anonymous)", GFP_KERNEL);
+}
+
+static inline void dbg_free(struct sframe_section *sec)
+{
+ kfree(sec->filename);
}
#else /* !CONFIG_DYNAMIC_DEBUG */
#define dbg(args...) no_printk(args)
+#define dbg_sec(args...) no_printk(args)
static inline void dbg_print_header(struct sframe_section *sec) {}
+static inline void dbg_init(struct sframe_section *sec) {}
+static inline void dbg_free(struct sframe_section *sec) {}
+
#endif /* !CONFIG_DYNAMIC_DEBUG */
#endif /* _SFRAME_DEBUG_H */
--
2.51.0
* [PATCH v13 11/18] unwind_user/sframe: Add .sframe validation option
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (9 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 10/18] unwind_user/sframe: Show file name in debug output Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 12/18] unwind_user: Enable archs that pass RA in a register Jens Remus
` (8 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
Add a debug feature to validate each .sframe section in full when the
file is first loaded, rather than on demand.
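Here is a minimal user-space sketch of the two ordering checks the validation performs. It assumes the FDEs and their FRE start offsets have already been decoded into a hypothetical struct fde_sketch. The real sframe_validate_section() reads each FDE and FRE from user memory through safe_read_fde()/safe_read_fre(). This sketch only shows the sortedness logic, including the same prev_ip = 0 starting point.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical decoded FDE: function start plus its FREs' start offsets. */
struct fde_sketch {
	unsigned long func_addr;
	const unsigned int *fre_ip_offs;
	size_t fres_num;
};

/*
 * Return 0 if the FDEs are strictly sorted by function start address and
 * each FDE's FREs are strictly sorted by start offset, else -EFAULT.
 */
static int validate_sorted(const struct fde_sketch *fdes, size_t num_fdes)
{
	unsigned long prev_ip = 0;

	for (size_t i = 0; i < num_fdes; i++) {
		const struct fde_sketch *fde = &fdes[i];

		if (fde->func_addr <= prev_ip)
			return -EFAULT;		/* FDE i not sorted */
		prev_ip = fde->func_addr;

		for (size_t j = 1; j < fde->fres_num; j++) {
			if (fde->fre_ip_offs[j] <= fde->fre_ip_offs[j - 1])
				return -EFAULT;	/* FRE j of FDE i not sorted */
		}
	}
	return 0;
}
```

Duplicate start offsets are rejected as well as descending ones, because a lookup by IP could not pick a unique entry in either case.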
[ Jens Remus: Add support for SFrame V3. Add support for PC-relative
FDE function start offset. Adjust to rename of struct sframe_fre to
sframe_fre_internal. Use %#x/%#lx format specifiers. ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- Update to SFrame V3:
- Print struct sframe_fde_internal fields fda_off and info2 in debug
message.
- Adjust to rename of struct sframe_fde_internal field func_start_addr
to func_addr.
- Use format strings "%#x" and "%#lx" instead of "0x%x" and "0x%lx".
- Reword commit message (my changes).
arch/Kconfig | 19 ++++++++
kernel/unwind/sframe.c | 99 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 118 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 2ece3df821b5..dcb553136e0c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -490,6 +490,25 @@ config HAVE_UNWIND_USER_SFRAME
bool
select UNWIND_USER
+config SFRAME_VALIDATION
+ bool "Enable .sframe section debugging"
+ depends on HAVE_UNWIND_USER_SFRAME
+ depends on DYNAMIC_DEBUG
+ help
+ When adding an .sframe section for a task, validate the entire
+ section immediately rather than on demand.
+
+ This is a debug feature which is helpful for rooting out .sframe
+ section issues. If the .sframe section is corrupt, it will fail to
+ load immediately, with more information provided in dynamic printks.
+
+ This has a significant page cache footprint due to its reading of the
+ entire .sframe section for every loaded executable and shared
+ library. Also, it's done for all processes, even those which don't
+ get stack traced by the kernel. Not recommended for general use.
+
+ If unsure, say N.
+
config HAVE_PERF_REGS
bool
help
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 3276aa6d0c77..cf353bdbc907 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -367,6 +367,101 @@ int sframe_find(unsigned long ip, struct unwind_user_frame *frame)
return ret;
}
+#ifdef CONFIG_SFRAME_VALIDATION
+
+static int safe_read_fde(struct sframe_section *sec,
+ unsigned int fde_num, struct sframe_fde_internal *fde)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fde(sec, fde_num, fde);
+ user_read_access_end();
+ return ret;
+}
+
+static int safe_read_fre(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long fre_addr,
+ struct sframe_fre_internal *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre(sec, fde, fre_addr, fre);
+ user_read_access_end();
+ return ret;
+}
+
+static int sframe_validate_section(struct sframe_section *sec)
+{
+ unsigned long prev_ip = 0;
+ unsigned int i;
+
+ for (i = 0; i < sec->num_fdes; i++) {
+ struct sframe_fre_internal *fre, *prev_fre = NULL;
+ unsigned long ip, fre_addr;
+ struct sframe_fde_internal fde;
+ struct sframe_fre_internal fres[2];
+ bool which = false;
+ unsigned int j;
+ int ret;
+
+ ret = safe_read_fde(sec, i, &fde);
+ if (ret)
+ return ret;
+
+ ip = fde.func_addr;
+ if (ip <= prev_ip) {
+ dbg_sec("fde %u not sorted\n", i);
+ return -EFAULT;
+ }
+ prev_ip = ip;
+
+ fre_addr = sec->fres_start + fde.fres_off;
+ for (j = 0; j < fde.fres_num; j++) {
+ int ret;
+
+ fre = which ? fres : fres + 1;
+ which = !which;
+
+ ret = safe_read_fre(sec, &fde, fre_addr, fre);
+ if (ret) {
+ dbg_sec("fde %u: __read_fre(%u) failed\n", i, j);
+ dbg_sec("FDE: func_addr:%#lx func_size:%#x fda_off:%#x fres_off:%#x fres_num:%d info:%u info2:%u rep_size:%u\n",
+ fde.func_addr, fde.func_size,
+ fde.fda_off,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.info2,
+ fde.rep_size);
+ return ret;
+ }
+
+ fre_addr += fre->size;
+
+ if (prev_fre && fre->ip_off <= prev_fre->ip_off) {
+ dbg_sec("fde %u: fre %u not sorted\n", i, j);
+ return -EFAULT;
+ }
+
+ prev_fre = fre;
+ }
+ }
+
+ return 0;
+}
+
+#else /* !CONFIG_SFRAME_VALIDATION */
+
+static int sframe_validate_section(struct sframe_section *sec) { return 0; }
+
+#endif /* !CONFIG_SFRAME_VALIDATION */
+
+
static void free_section(struct sframe_section *sec)
{
dbg_free(sec);
@@ -476,6 +571,10 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
goto err_free;
}
+ ret = sframe_validate_section(sec);
+ if (ret)
+ goto err_free;
+
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
if (ret) {
dbg_sec("mtree_insert_range failed: text=%lx-%lx\n",
--
2.51.0
* [PATCH v13 12/18] unwind_user: Enable archs that pass RA in a register
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (10 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 11/18] unwind_user/sframe: Add .sframe validation option Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 13/18] unwind_user: Flexible FP/RA recovery rules Jens Remus
` (7 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
Unlike on x86-64, where the CALL instruction pushes the return address
(RA) onto the stack, not all architectures save the user space RA on the
stack on function entry. Some architectures/ABIs, such as s390, also do
not necessarily require the RA to be saved on the stack in the function
prologue, or at all, for instance in leaf functions.
Treat an RA offset from the CFA of zero as an indication that the RA is
not saved on the stack. In the topmost frame, treat it as an indication
that the RA is in the link/RA register, as on arm64 and s390, and obtain
it from there. In non-topmost frames, treat it as an error, as the RA
must have been saved.
Additionally, allow the SP to be unchanged in the topmost frame, for
architectures where the SP at function entry equals the SP at the call
site, such as arm64 and s390.
Note that treating an RA offset from the CFA of zero as an indication
that the RA is not saved on the stack also supports architectures, such
as s390, where the frame pointer (FP) may be saved without the RA being
saved, provided that such architectures represent this in SFrame by
encoding the "missing" RA offset as a padding RA offset with a value of
zero.
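A minimal user-space sketch of the resulting RA logic follows. It assumes a hypothetical struct state_sketch whose link_reg field stands in for unwind_user_get_ra_reg(), and a hypothetical read_word() helper that reads the program's own memory in place of get_user_word(). The sketch combines the relaxed stack-direction check (topmost is 0 or 1) with the zero-RA-offset handling described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified unwinder state (not the kernel's struct). */
struct state_sketch {
	unsigned long sp;
	bool topmost;
	unsigned long link_reg;	/* stand-in for the arch link/RA register */
};

/* Stand-in for get_user_word(): reads directly from our own memory. */
static int read_word(unsigned long addr, unsigned long *val)
{
	*val = *(const unsigned long *)addr;
	return 0;
}

static int get_ra(const struct state_sketch *st, unsigned long cfa,
		  long ra_off, unsigned long *ra)
{
	/* CFA must be above SP; it may equal SP only in the topmost frame. */
	if (cfa <= st->sp - st->topmost)
		return -EINVAL;

	/* Non-zero offset: the RA was saved at CFA + ra_off. */
	if (ra_off)
		return read_word(cfa + ra_off, ra) ? -EINVAL : 0;

	/* Zero offset: RA not on the stack, only valid in the topmost frame. */
	if (!st->topmost)
		return -EINVAL;
	*ra = st->link_reg;
	return 0;
}
```

In the topmost frame, a CFA equal to SP together with a zero RA offset yields the link register value. The same input in an inner frame is rejected.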
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- New patch. Based on my s390 sframe support series patch
"unwind_user: Enable archs that pass RA in a register":
https://lore.kernel.org/all/20251208171559.2029709-11-jremus@linux.ibm.com/
include/linux/unwind_user.h | 9 +++++++++
kernel/unwind/sframe.c | 6 ++----
kernel/unwind/user.c | 17 +++++++++++++----
3 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index 64618618febd..bc2edae39955 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -23,6 +23,15 @@ static inline bool unwind_user_at_function_start(struct pt_regs *regs)
#define unwind_user_at_function_start unwind_user_at_function_start
#endif
+#ifndef unwind_user_get_ra_reg
+static inline int unwind_user_get_ra_reg(unsigned long *val)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+#define unwind_user_get_ra_reg unwind_user_get_ra_reg
+#endif
+
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
#endif /* _LINUX_UNWIND_USER_H */
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index cf353bdbc907..fc905504ddde 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -243,10 +243,8 @@ static __always_inline int __read_fre(struct sframe_section *sec,
dataword_count--;
ra_off = sec->ra_off;
- if (!ra_off) {
- if (!dataword_count--)
- return -EFAULT;
-
+ if (!ra_off && dataword_count) {
+ dataword_count--;
UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
}
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index fdb1001e3750..9ceef9b2b8db 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -48,8 +48,12 @@ static int unwind_user_next_common(struct unwind_user_state *state,
}
cfa += frame->cfa_off;
- /* Make sure that stack is not going in wrong direction */
- if (cfa <= state->sp)
+ /*
+ * Make sure that stack is not going in wrong direction. Allow SP
+ * to be unchanged for the topmost frame, by subtracting topmost,
+ * which is either 0 or 1.
+ */
+ if (cfa <= state->sp - state->topmost)
return -EINVAL;
/* Make sure that the address is word aligned */
@@ -57,8 +61,13 @@ static int unwind_user_next_common(struct unwind_user_state *state,
return -EINVAL;
/* Get the Return Address (RA) */
- if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
- return -EINVAL;
+ if (frame->ra_off) {
+ if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
+ return -EINVAL;
+ } else {
+ if (!state->topmost || unwind_user_get_ra_reg(&ra))
+ return -EINVAL;
+ }
/* Get the Frame Pointer (FP) */
if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
--
2.51.0
* [PATCH v13 13/18] unwind_user: Flexible FP/RA recovery rules
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (11 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 12/18] unwind_user: Enable archs that pass RA in a register Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 14/18] unwind_user: Flexible CFA " Jens Remus
` (6 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
To enable support for SFrame V3 flexible FDEs with a subsequent patch,
add support for the following flexible frame pointer (FP) and return
address (RA) recovery rules:
FP/RA = *(CFA + offset)
FP/RA = register + offset
FP/RA = *(register + offset)
Note that FP/RA recovery rules that use arbitrary register contents are
only valid in the topmost frame, as the register contents are otherwise
unknown.
This also enables user space unwinding on architectures, such as s390,
that may save the frame pointer (FP) and/or return address (RA) in other
registers, for instance in a leaf function.
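The sketch below shows, in user space, how such rules can be evaluated. Its names (struct rule_data, eval_rule()) are hypothetical, and a regs[] array stands in for unwind_user_get_reg(). Like the patch, it encodes dereferencing as a flag bit OR'ed into the base rule. It uses bit 8 rather than BIT(31), however, so that the enum stays within the ISO C int range. RETAIN here simply keeps the previous value. For the RA, the patch instead reads the link/RA register in the topmost frame.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RULE_DEREF 0x100	/* "dereference the computed address" flag */

enum rule_kind {
	RULE_RETAIN,		/* value = previous value */
	RULE_CFA_OFFSET,	/* value = CFA + offset (base for DEREF only) */
	RULE_REG_OFFSET,	/* value = register + offset */
	RULE_CFA_OFFSET_DEREF = RULE_CFA_OFFSET | RULE_DEREF, /* *(CFA + off) */
	RULE_REG_OFFSET_DEREF = RULE_REG_OFFSET | RULE_DEREF, /* *(reg + off) */
};

struct rule_data {
	enum rule_kind rule;
	long offset;
	unsigned int regnum;
};

static int eval_rule(const struct rule_data *rd, unsigned long cfa,
		     unsigned long prev, bool topmost,
		     const unsigned long *regs, unsigned int nregs,
		     unsigned long *out)
{
	unsigned long val;

	switch (rd->rule) {
	case RULE_RETAIN:
		val = prev;
		break;
	case RULE_CFA_OFFSET_DEREF:
		val = cfa + rd->offset;
		break;
	case RULE_REG_OFFSET:
	case RULE_REG_OFFSET_DEREF:
		/* Register contents are only known in the topmost frame. */
		if (!topmost || rd->regnum >= nregs)
			return -EINVAL;
		val = regs[rd->regnum] + rd->offset;
		break;
	default:	/* includes plain RULE_CFA_OFFSET, left out as in the patch */
		return -EINVAL;
	}

	/* Stand-in for get_user_word(): read from our own memory. */
	if (rd->rule & RULE_DEREF)
		val = *(const unsigned long *)val;

	*out = val;
	return 0;
}
```

Computing the address first and applying the dereference in one shared step keeps each base rule to a single case. This mirrors the switch-then-DEREF structure of the patch.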
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- New patch. Based on my s390 sframe support series patch
"unwind_user: Enable archs that save RA/FP in other registers":
https://lore.kernel.org/all/20251208171559.2029709-12-jremus@linux.ibm.com/
arch/x86/include/asm/unwind_user.h | 21 ++++++++++---
include/linux/unwind_user.h | 9 ++++++
include/linux/unwind_user_types.h | 23 +++++++++++++--
kernel/unwind/sframe.c | 16 ++++++++--
kernel/unwind/user.c | 47 +++++++++++++++++++++++++-----
5 files changed, 101 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 2dfb5ef11e36..9c3417be4283 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -21,15 +21,26 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
#define ARCH_INIT_USER_FP_FRAME(ws) \
.cfa_off = 2*(ws), \
- .ra_off = -1*(ws), \
- .fp_off = -2*(ws), \
+ .ra = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -1*(ws), \
+ }, \
+ .fp = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -2*(ws), \
+ }, \
.use_fp = true, \
.outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
.cfa_off = 1*(ws), \
- .ra_off = -1*(ws), \
- .fp_off = 0, \
+ .ra = { \
+ .rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
+ .offset = -1*(ws), \
+ }, \
+ .fp = { \
+ .rule = UNWIND_USER_RULE_RETAIN,\
+ }, \
.use_fp = false, \
.outermost = false,
@@ -41,4 +52,6 @@ static inline bool unwind_user_at_function_start(struct pt_regs *regs)
#endif /* CONFIG_HAVE_UNWIND_USER_FP */
+#include <asm-generic/unwind_user.h>
+
#endif /* _ASM_X86_UNWIND_USER_H */
diff --git a/include/linux/unwind_user.h b/include/linux/unwind_user.h
index bc2edae39955..92cdf38c8ade 100644
--- a/include/linux/unwind_user.h
+++ b/include/linux/unwind_user.h
@@ -32,6 +32,15 @@ static inline int unwind_user_get_ra_reg(unsigned long *val)
#define unwind_user_get_ra_reg unwind_user_get_ra_reg
#endif
+#ifndef unwind_user_get_reg
+static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+#define unwind_user_get_reg unwind_user_get_reg
+#endif
+
int unwind_user(struct unwind_stacktrace *trace, unsigned int max_entries);
#endif /* _LINUX_UNWIND_USER_H */
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 616cc5ee4586..0d02714a1b5d 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -27,10 +27,29 @@ struct unwind_stacktrace {
unsigned long *entries;
};
+#define UNWIND_USER_RULE_DEREF BIT(31)
+
+enum unwind_user_rule {
+ UNWIND_USER_RULE_RETAIN, /* entity = entity */
+ UNWIND_USER_RULE_CFA_OFFSET, /* entity = CFA + offset */
+ UNWIND_USER_RULE_REG_OFFSET, /* entity = register + offset */
+ /* DEREF variants */
+ UNWIND_USER_RULE_CFA_OFFSET_DEREF = /* entity = *(CFA + offset) */
+ UNWIND_USER_RULE_CFA_OFFSET | UNWIND_USER_RULE_DEREF,
+ UNWIND_USER_RULE_REG_OFFSET_DEREF = /* entity = *(register + offset) */
+ UNWIND_USER_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
+};
+
+struct unwind_user_rule_data {
+ enum unwind_user_rule rule;
+ s32 offset;
+ unsigned int regnum;
+};
+
struct unwind_user_frame {
s32 cfa_off;
- s32 ra_off;
- s32 fp_off;
+ struct unwind_user_rule_data ra;
+ struct unwind_user_rule_data fp;
bool use_fp;
bool outermost;
};
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index fc905504ddde..b5301fa9dbc8 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -271,6 +271,18 @@ static __always_inline int __read_fre(struct sframe_section *sec,
return -EFAULT;
}
+static __always_inline void
+sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
+ s32 offset)
+{
+ if (offset) {
+ rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
+ rule_data->offset = offset;
+ } else {
+ rule_data->rule = UNWIND_USER_RULE_RETAIN;
+ }
+}
+
static __always_inline int __find_fre(struct sframe_section *sec,
struct sframe_fde_internal *fde,
unsigned long ip,
@@ -321,8 +333,8 @@ static __always_inline int __find_fre(struct sframe_section *sec,
fre = prev_fre;
frame->cfa_off = fre->cfa_off;
- frame->ra_off = fre->ra_off;
- frame->fp_off = fre->fp_off;
+ sframe_init_rule_data(&frame->ra, fre->ra_off);
+ sframe_init_rule_data(&frame->fp, fre->fp_off);
frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 9ceef9b2b8db..0405922c5c0d 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -61,22 +61,55 @@ static int unwind_user_next_common(struct unwind_user_state *state,
return -EINVAL;
/* Get the Return Address (RA) */
- if (frame->ra_off) {
- if (get_user_word(&ra, cfa, frame->ra_off, state->ws))
- return -EINVAL;
- } else {
+ switch (frame->ra.rule) {
+ case UNWIND_USER_RULE_RETAIN:
if (!state->topmost || unwind_user_get_ra_reg(&ra))
return -EINVAL;
+ break;
+ /* UNWIND_USER_RULE_CFA_OFFSET not implemented on purpose */
+ case UNWIND_USER_RULE_CFA_OFFSET_DEREF:
+ ra = cfa + frame->ra.offset;
+ break;
+ case UNWIND_USER_RULE_REG_OFFSET:
+ case UNWIND_USER_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&ra, frame->ra.regnum))
+ return -EINVAL;
+ ra += frame->ra.offset;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
}
+ if (frame->ra.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&ra, ra, 0, state->ws))
+ return -EINVAL;
/* Get the Frame Pointer (FP) */
- if (frame->fp_off && get_user_word(&fp, cfa, frame->fp_off, state->ws))
+ switch (frame->fp.rule) {
+ case UNWIND_USER_RULE_RETAIN:
+ fp = state->fp;
+ break;
+ /* UNWIND_USER_RULE_CFA_OFFSET not implemented on purpose */
+ case UNWIND_USER_RULE_CFA_OFFSET_DEREF:
+ fp = cfa + frame->fp.offset;
+ break;
+ case UNWIND_USER_RULE_REG_OFFSET:
+ case UNWIND_USER_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&fp, frame->fp.regnum))
+ return -EINVAL;
+ fp += frame->fp.offset;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+ }
+ if (frame->fp.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&fp, fp, 0, state->ws))
return -EINVAL;
state->ip = ra;
state->sp = cfa;
- if (frame->fp_off)
- state->fp = fp;
+ state->fp = fp;
state->topmost = false;
return 0;
}
--
2.51.0
* [PATCH v13 14/18] unwind_user: Flexible CFA recovery rules
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (12 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 13/18] unwind_user: Flexible FP/RA recovery rules Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 15/18] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
` (5 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
To enable support for SFrame V3 flexible FDEs with a subsequent patch,
add support for the following flexible Canonical Frame Address (CFA)
recovery rules:
CFA = SP + offset
CFA = FP + offset
CFA = register + offset
CFA = *(register + offset)
Note that CFA recovery rules that use arbitrary register contents are
only valid in the topmost frame, as the register contents are otherwise
unknown.
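The following user-space sketch evaluates these CFA rules. Its names (enum cfa_rule, compute_cfa()) are hypothetical, and a regs[] array stands in for the topmost frame's registers. The sketch assumes SP and FP come from the unwind state, so they are usable in every frame, while an arbitrary register is only usable in the topmost frame.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define CFA_RULE_DEREF 0x100	/* dereference the computed address */

enum cfa_rule {
	CFA_RULE_SP_OFFSET,	/* CFA = SP + offset */
	CFA_RULE_FP_OFFSET,	/* CFA = FP + offset */
	CFA_RULE_REG_OFFSET,	/* CFA = register + offset */
	CFA_RULE_REG_OFFSET_DEREF = CFA_RULE_REG_OFFSET | CFA_RULE_DEREF,
};

static int compute_cfa(enum cfa_rule rule, long offset, unsigned int regnum,
		       unsigned long sp, unsigned long fp, bool topmost,
		       const unsigned long *regs, unsigned int nregs,
		       unsigned long *cfa)
{
	unsigned long val;

	switch (rule) {
	case CFA_RULE_SP_OFFSET:
		val = sp + offset;
		break;
	case CFA_RULE_FP_OFFSET:
		val = fp + offset;
		break;
	case CFA_RULE_REG_OFFSET:
	case CFA_RULE_REG_OFFSET_DEREF:
		/* Arbitrary registers are only known in the topmost frame. */
		if (!topmost || regnum >= nregs)
			return -EINVAL;
		val = regs[regnum] + offset;
		break;
	default:
		return -EINVAL;
	}

	/* Stand-in for get_user_word(): read from our own memory. */
	if (rule & CFA_RULE_DEREF)
		val = *(const unsigned long *)val;

	*cfa = val;
	return 0;
}
```

With these rules, the x86 frame-pointer case maps to CFA_RULE_FP_OFFSET and the function-entry case maps to CFA_RULE_SP_OFFSET. This mirrors how the patch replaces the former use_fp flag.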
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- New patch.
arch/x86/include/asm/unwind_user.h | 12 ++++++++----
include/linux/unwind_user_types.h | 18 ++++++++++++++++--
kernel/unwind/sframe.c | 15 +++++++++++++--
kernel/unwind/user.c | 22 ++++++++++++++++++----
4 files changed, 55 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index 9c3417be4283..f38f7c5ff1de 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -20,7 +20,10 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
#ifdef CONFIG_HAVE_UNWIND_USER_FP
#define ARCH_INIT_USER_FP_FRAME(ws) \
- .cfa_off = 2*(ws), \
+ .cfa = { \
+ .rule = UNWIND_USER_CFA_RULE_FP_OFFSET,\
+ .offset = 2*(ws), \
+ }, \
.ra = { \
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -1*(ws), \
@@ -29,11 +32,13 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -2*(ws), \
}, \
- .use_fp = true, \
.outermost = false,
#define ARCH_INIT_USER_FP_ENTRY_FRAME(ws) \
- .cfa_off = 1*(ws), \
+ .cfa = { \
+ .rule = UNWIND_USER_CFA_RULE_SP_OFFSET,\
+ .offset = 1*(ws), \
+ }, \
.ra = { \
.rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF,\
.offset = -1*(ws), \
@@ -41,7 +46,6 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
.fp = { \
.rule = UNWIND_USER_RULE_RETAIN,\
}, \
- .use_fp = false, \
.outermost = false,
static inline bool unwind_user_at_function_start(struct pt_regs *regs)
diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h
index 0d02714a1b5d..059e5c76f2f3 100644
--- a/include/linux/unwind_user_types.h
+++ b/include/linux/unwind_user_types.h
@@ -29,6 +29,21 @@ struct unwind_stacktrace {
#define UNWIND_USER_RULE_DEREF BIT(31)
+enum unwind_user_cfa_rule {
+ UNWIND_USER_CFA_RULE_SP_OFFSET, /* CFA = SP + offset */
+ UNWIND_USER_CFA_RULE_FP_OFFSET, /* CFA = FP + offset */
+ UNWIND_USER_CFA_RULE_REG_OFFSET, /* CFA = reg + offset */
+ /* DEREF variants */
+ UNWIND_USER_CFA_RULE_REG_OFFSET_DEREF = /* CFA = *(reg + offset) */
+ UNWIND_USER_CFA_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
+};
+
+struct unwind_user_cfa_rule_data {
+ enum unwind_user_cfa_rule rule;
+ s32 offset;
+ unsigned int regnum;
+};
+
enum unwind_user_rule {
UNWIND_USER_RULE_RETAIN, /* entity = entity */
UNWIND_USER_RULE_CFA_OFFSET, /* entity = CFA + offset */
@@ -47,10 +62,9 @@ struct unwind_user_rule_data {
};
struct unwind_user_frame {
- s32 cfa_off;
+ struct unwind_user_cfa_rule_data cfa;
struct unwind_user_rule_data ra;
struct unwind_user_rule_data fp;
- bool use_fp;
bool outermost;
};
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index b5301fa9dbc8..4dfc8cf2075e 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -271,6 +271,18 @@ static __always_inline int __read_fre(struct sframe_section *sec,
return -EFAULT;
}
+static __always_inline void
+sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
+ unsigned char fre_info,
+ s32 offset)
+{
+ if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
+ else
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ cfa_rule_data->offset = offset;
+}
+
static __always_inline void
sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
s32 offset)
@@ -332,10 +344,9 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
- frame->cfa_off = fre->cfa_off;
+ sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
sframe_init_rule_data(&frame->ra, fre->ra_off);
sframe_init_rule_data(&frame->fp, fre->fp_off);
- frame->use_fp = SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP;
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
diff --git a/kernel/unwind/user.c b/kernel/unwind/user.c
index 0405922c5c0d..eb7d9489f671 100644
--- a/kernel/unwind/user.c
+++ b/kernel/unwind/user.c
@@ -39,14 +39,28 @@ static int unwind_user_next_common(struct unwind_user_state *state,
}
/* Get the Canonical Frame Address (CFA) */
- if (frame->use_fp) {
+ switch (frame->cfa.rule) {
+ case UNWIND_USER_CFA_RULE_SP_OFFSET:
+ cfa = state->sp;
+ break;
+ case UNWIND_USER_CFA_RULE_FP_OFFSET:
if (state->fp < state->sp)
return -EINVAL;
cfa = state->fp;
- } else {
- cfa = state->sp;
+ break;
+ case UNWIND_USER_CFA_RULE_REG_OFFSET:
+ case UNWIND_USER_CFA_RULE_REG_OFFSET_DEREF:
+ if (!state->topmost || unwind_user_get_reg(&cfa, frame->cfa.regnum))
+ return -EINVAL;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return -EINVAL;
}
- cfa += frame->cfa_off;
+ cfa += frame->cfa.offset;
+ if (frame->cfa.rule & UNWIND_USER_RULE_DEREF &&
+ get_user_word(&cfa, cfa, 0, state->ws))
+ return -EINVAL;
/*
* Make sure that stack is not going in wrong direction. Allow SP
--
2.51.0
* [PATCH v13 15/18] unwind_user/sframe: Add support for SFrame V3 flexible FDEs
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (13 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 14/18] unwind_user: Flexible CFA " Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 16/18] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
` (4 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
SFrame V3 introduces flexible FDEs in addition to the regular FDEs.
The key difference is that flexible FDEs encode the CFA, RA, and FP
tracking information each using two FRE data words, a control word and
an offset. RA and FP tracking information may instead be represented by
a single padding data word of zero (e.g. to represent FP without RA
tracking information).
The control word contains the following information:
- reg_p: Whether to use the register contents (reg_p=1) specified
by regnum or the CFA (reg_p=0) as base.
- deref_p: Whether to dereference the resulting address.
- regnum: A DWARF register number.
The offset is added to the base (i.e. the CFA or the register contents).
The resulting address is then dereferenced if deref_p=1.
This enables the following flexible CFA and FP/RA recovery rules:
- CFA = register + offset // reg_p=1, deref_p=0
- CFA = *(register + offset) // reg_p=1, deref_p=1
- FP/RA = *(CFA + offset) // reg_p=0, deref_p=1
- FP/RA = register + offset // reg_p=1, deref_p=0
- FP/RA = *(register + offset) // reg_p=1, deref_p=1
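The control word layout behind these rules can be checked with the three extraction macros this patch adds to kernel/unwind/sframe.h, copied below. make_ctlword() and flex_base_plus_offset() are hypothetical helpers for illustration only; the latter computes base + offset before the optional dereference and takes register contents from a caller-supplied array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extraction, as defined by this patch in kernel/unwind/sframe.h. */
#define SFRAME_V3_FLEX_FDE_CTLWORD_REGNUM(data)	(((data) >> 3) & 0x1f)
#define SFRAME_V3_FLEX_FDE_CTLWORD_DEREF_P(data)	(((data) >> 1) & 0x1)
#define SFRAME_V3_FLEX_FDE_CTLWORD_REG_P(data)	((data) & 0x1)

/* Hypothetical helper: pack regnum, deref_p, and reg_p into a control word. */
static uint32_t make_ctlword(unsigned int regnum, bool deref_p, bool reg_p)
{
	return ((uint32_t)(regnum & 0x1f) << 3) |
	       ((uint32_t)deref_p << 1) | (uint32_t)reg_p;
}

/*
 * Hypothetical helper: base + offset for one (control word, offset) pair.
 * The base is the CFA when reg_p=0, else the contents of register regnum.
 * Whether the result is dereferenced is decided separately by deref_p.
 */
static uint64_t flex_base_plus_offset(uint32_t ctl, int32_t offset,
				      uint64_t cfa, const uint64_t regs[32])
{
	uint64_t base = SFRAME_V3_FLEX_FDE_CTLWORD_REG_P(ctl) ?
			regs[SFRAME_V3_FLEX_FDE_CTLWORD_REGNUM(ctl)] : cfa;

	return base + (int64_t)offset;
}
```

For example, make_ctlword(0, true, false) yields 2, the value the patch synthesizes for RA/FP of regular FDEs ("regnum=0, deref_p=1, reg_p=0"), i.e. FP/RA = *(CFA + offset).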
Note that for the CFA a rule with reg_p=0 is invalid, as the value of
the CFA cannot be described using itself as base. For FP/RA a rule with
reg_p=0, deref_p=0, and regnum=0 is invalid, as it is equal to the
padding data word of zero.
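How the padding word is consumed can be sketched as a user-space model of the parsing order in __read_flex_fde_fre_datawords() from this patch. Assumptions: the data words are already read into an int32_t array (the real code reads 1-, 2-, or 4-byte words from user memory), and sec_ra_off/sec_fp_off stand in for the section's fixed RA/FP offsets (0 if none). struct flex_fre and parse_flex_fre() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed result: one (control word, offset) pair per entity. */
struct flex_fre {
	uint32_t cfa_ctl, ra_ctl, fp_ctl;
	int32_t cfa_off, ra_off, fp_off;
};

/*
 * A zero control word is a one-word padding entry: no offset follows and
 * the default applies (fixed offset with ctl 2, i.e. *(CFA + off), or ctl 0
 * with offset 0 if there is no fixed offset).
 */
static int parse_flex_fre(const int32_t *words, size_t count,
			  int32_t sec_ra_off, int32_t sec_fp_off,
			  struct flex_fre *out)
{
	size_t i = 0;

	if (count < 2)			/* CFA needs control word + offset */
		return -1;
	out->cfa_ctl = (uint32_t)words[i++];
	out->cfa_off = words[i++];

	out->ra_off = sec_ra_off;
	out->ra_ctl = sec_ra_off ? 2 : 0;
	if (count - i >= 2) {
		uint32_t ctl = (uint32_t)words[i++];

		if (ctl) {		/* otherwise: padding, keep default */
			out->ra_ctl = ctl;
			out->ra_off = words[i++];
		}
	}

	out->fp_off = sec_fp_off;
	out->fp_ctl = sec_fp_off ? 2 : 0;
	if (count - i >= 2) {
		uint32_t ctl = (uint32_t)words[i++];

		if (ctl) {
			out->fp_ctl = ctl;
			out->fp_off = words[i++];
		}
	}

	return i == count ? 0 : -1;	/* leftover words: malformed FRE */
}
```

With words { 0x39, 16, 0, 2, -16 } and a fixed RA offset of -8, the zero word pads the RA slot (RA keeps *(CFA - 8)) and the last pair gives FP = *(CFA - 16).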
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- New patch.
kernel/unwind/sframe.c | 249 ++++++++++++++++++++++++++++++++---------
kernel/unwind/sframe.h | 5 +
2 files changed, 204 insertions(+), 50 deletions(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index 4dfc8cf2075e..ebf2a2905c5c 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -12,6 +12,7 @@
#include <linux/mm.h>
#include <linux/string_helpers.h>
#include <linux/sframe.h>
+#include <asm/unwind_user_sframe.h>
#include <linux/unwind_user_types.h>
#include "sframe.h"
@@ -31,8 +32,11 @@ struct sframe_fde_internal {
struct sframe_fre_internal {
unsigned int size;
u32 ip_off;
+ u32 cfa_ctl;
s32 cfa_off;
+ u32 ra_ctl;
s32 ra_off;
+ u32 fp_ctl;
s32 fp_off;
u8 info;
};
@@ -189,16 +193,147 @@ static __always_inline int __find_fde(struct sframe_section *sec,
s32 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label), \
s64 : UNSAFE_GET_USER_SIGNED_INC(to, from, size, label))
+static __always_inline int
+__read_regular_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ s32 cfa_off, ra_off, fp_off;
+ unsigned int cfa_regnum;
+
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count--;
+
+ ra_off = sec->ra_off;
+ if (!ra_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ }
+
+ fp_off = sec->fp_off;
+ if (!fp_off && dataword_count) {
+ dataword_count--;
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ cfa_regnum =
+ (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre->info) == SFRAME_BASE_REG_FP) ?
+ SFRAME_REG_FP : SFRAME_REG_SP;
+
+ fre->cfa_ctl = (cfa_regnum << 3) | 1; /* regnum, deref_p=0, reg_p=1 */
+ fre->cfa_off = cfa_off;
+ fre->ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
+ fre->ra_off = ra_off;
+ fre->fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
+ fre->fp_off = fp_off;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int
+__read_flex_fde_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ u32 cfa_ctl, ra_ctl, fp_ctl;
+ s32 cfa_off, ra_off, fp_off;
+
+ if (dataword_count < 2)
+ return -EFAULT;
+ UNSAFE_GET_USER_INC(cfa_ctl, cur, dataword_size, Efault);
+ UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
+ dataword_count -= 2;
+
+ ra_off = sec->ra_off;
+ ra_ctl = ra_off ? 2 : 0; /* regnum=0, deref_p=(ra_off != 0), reg_p=0 */
+ if (dataword_count >= 2) {
+ UNSAFE_GET_USER_INC(ra_ctl, cur, dataword_size, Efault);
+ dataword_count--;
+ if (ra_ctl) {
+ UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
+ dataword_count--;
+ } else {
+ /* Padding RA location info */
+ ra_ctl = ra_off ? 2 : 0; /* re-deduce (see above) */
+ }
+ }
+
+ fp_off = sec->fp_off;
+ fp_ctl = fp_off ? 2 : 0; /* regnum=0, deref_p=(fp_off != 0), reg_p=0 */
+ if (dataword_count >= 2) {
+ UNSAFE_GET_USER_INC(fp_ctl, cur, dataword_size, Efault);
+ dataword_count--;
+ if (fp_ctl) {
+ UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ dataword_count--;
+ } else {
+ /* Padding FP location info */
+ fp_ctl = fp_off ? 2 : 0; /* re-deduce (see above) */
+ }
+ }
+
+ if (dataword_count)
+ return -EFAULT;
+
+ fre->cfa_ctl = cfa_ctl;
+ fre->cfa_off = cfa_off;
+ fre->ra_ctl = ra_ctl;
+ fre->ra_off = ra_off;
+ fre->fp_ctl = fp_ctl;
+ fre->fp_off = fp_off;
+
+ return 0;
+
+Efault:
+ return -EFAULT;
+}
+
+static __always_inline int
+__read_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ unsigned long cur,
+ unsigned char dataword_count,
+ unsigned char dataword_size,
+ struct sframe_fre_internal *fre)
+{
+ unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+
+ switch (fde_type) {
+ case SFRAME_FDE_TYPE_REGULAR:
+ return __read_regular_fre_datawords(sec, fde, cur,
+ dataword_count,
+ dataword_size,
+ fre);
+ case SFRAME_FDE_TYPE_FLEXIBLE:
+ return __read_flex_fde_fre_datawords(sec, fde, cur,
+ dataword_count,
+ dataword_size,
+ fre);
+ default:
+ return -EFAULT;
+ }
+}
+
static __always_inline int __read_fre(struct sframe_section *sec,
struct sframe_fde_internal *fde,
unsigned long fre_addr,
struct sframe_fre_internal *fre)
{
- unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
unsigned char dataword_count, dataword_size;
- s32 cfa_off, ra_off, fp_off;
unsigned long cur = fre_addr;
unsigned char addr_size;
u32 ip_off;
@@ -224,75 +359,88 @@ static __always_inline int __read_fre(struct sframe_section *sec,
if (cur + (dataword_count * dataword_size) > sec->fres_end)
return -EFAULT;
- /* TODO: Support for flexible FDEs not implemented yet. */
- if (fde_type != SFRAME_FDE_TYPE_REGULAR)
- return -EFAULT;
+ fre->size = addr_size + 1 + (dataword_count * dataword_size);
+ fre->ip_off = ip_off;
+ fre->info = info;
if (!dataword_count) {
/*
* A FRE without data words indicates RA undefined /
* outermost frame.
*/
- cfa_off = 0;
- ra_off = 0;
- fp_off = 0;
- goto done;
- }
+ fre->cfa_ctl = 0;
+ fre->cfa_off = 0;
+ fre->ra_ctl = 0;
+ fre->ra_off = 0;
+ fre->fp_ctl = 0;
+ fre->fp_off = 0;
- UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
- dataword_count--;
-
- ra_off = sec->ra_off;
- if (!ra_off && dataword_count) {
- dataword_count--;
- UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
- }
-
- fp_off = sec->fp_off;
- if (!fp_off && dataword_count) {
- dataword_count--;
- UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
+ return 0;
}
- if (dataword_count)
- return -EFAULT;
-
-done:
- fre->size = addr_size + 1 + (dataword_count * dataword_size);
- fre->ip_off = ip_off;
- fre->cfa_off = cfa_off;
- fre->ra_off = ra_off;
- fre->fp_off = fp_off;
- fre->info = info;
-
- return 0;
+ return __read_fre_datawords(sec, fde, cur, dataword_count, dataword_size, fre);
Efault:
return -EFAULT;
}
-static __always_inline void
+static __always_inline int
sframe_init_cfa_rule_data(struct unwind_user_cfa_rule_data *cfa_rule_data,
- unsigned char fre_info,
- s32 offset)
+ u32 ctlword, s32 offset)
{
- if (SFRAME_V3_FRE_CFA_BASE_REG_ID(fre_info) == SFRAME_BASE_REG_FP)
- cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
- else
- cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ bool deref_p = SFRAME_V3_FLEX_FDE_CTLWORD_DEREF_P(ctlword);
+ bool reg_p = SFRAME_V3_FLEX_FDE_CTLWORD_REG_P(ctlword);
+
+ if (reg_p) {
+ unsigned int regnum = SFRAME_V3_FLEX_FDE_CTLWORD_REGNUM(ctlword);
+
+ switch (regnum) {
+ case SFRAME_REG_SP:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_SP_OFFSET;
+ break;
+ case SFRAME_REG_FP:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_FP_OFFSET;
+ break;
+ default:
+ cfa_rule_data->rule = UNWIND_USER_CFA_RULE_REG_OFFSET;
+ cfa_rule_data->regnum = regnum;
+ }
+ } else {
+ return -EINVAL;
+ }
+
+ if (deref_p)
+ cfa_rule_data->rule |= UNWIND_USER_RULE_DEREF;
+
cfa_rule_data->offset = offset;
+
+ return 0;
}
static __always_inline void
sframe_init_rule_data(struct unwind_user_rule_data *rule_data,
- s32 offset)
+ u32 ctlword, s32 offset)
{
- if (offset) {
- rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET_DEREF;
- rule_data->offset = offset;
- } else {
+ bool deref_p = SFRAME_V3_FLEX_FDE_CTLWORD_DEREF_P(ctlword);
+ bool reg_p = SFRAME_V3_FLEX_FDE_CTLWORD_REG_P(ctlword);
+
+ if (!ctlword && !offset) {
rule_data->rule = UNWIND_USER_RULE_RETAIN;
+ return;
+ }
+ if (reg_p) {
+ unsigned int regnum = SFRAME_V3_FLEX_FDE_CTLWORD_REGNUM(ctlword);
+
+ rule_data->rule = UNWIND_USER_RULE_REG_OFFSET;
+ rule_data->regnum = regnum;
+ } else {
+ rule_data->rule = UNWIND_USER_RULE_CFA_OFFSET;
}
+
+ if (deref_p)
+ rule_data->rule |= UNWIND_USER_RULE_DEREF;
+
+ rule_data->offset = offset;
}
static __always_inline int __find_fre(struct sframe_section *sec,
@@ -344,9 +492,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
- sframe_init_cfa_rule_data(&frame->cfa, fre->info, fre->cfa_off);
- sframe_init_rule_data(&frame->ra, fre->ra_off);
- sframe_init_rule_data(&frame->fp, fre->fp_off);
+ if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
+ return -EINVAL;
+ sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
+ sframe_init_rule_data(&frame->fp, fre->fp_ctl, fre->fp_off);
frame->outermost = SFRAME_V3_FRE_RA_UNDEFINED_P(fre->info);
return 0;
diff --git a/kernel/unwind/sframe.h b/kernel/unwind/sframe.h
index 3fcc15534e5a..5b6112945b6c 100644
--- a/kernel/unwind/sframe.h
+++ b/kernel/unwind/sframe.h
@@ -66,6 +66,7 @@ struct sframe_fda_v3 {
#define SFRAME_V3_AARCH64_FDE_PAUTH_KEY(info) (((info) >> 5) & 0x1)
#define SFRAME_FDE_TYPE_REGULAR 0
+#define SFRAME_FDE_TYPE_FLEXIBLE 1
#define SFRAME_V3_FDE_TYPE_MASK 0x0f
#define SFRAME_V3_FDE_TYPE(info2) ((info2) & SFRAME_V3_FDE_TYPE_MASK)
@@ -79,4 +80,8 @@ struct sframe_fda_v3 {
#define SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(info) (((info) >> 7) & 0x1)
#define SFRAME_V3_FRE_RA_UNDEFINED_P(info) (SFRAME_V3_FRE_DATAWORD_COUNT(info) == 0)
+#define SFRAME_V3_FLEX_FDE_CTLWORD_REGNUM(data) (((data) >> 3) & 0x1f)
+#define SFRAME_V3_FLEX_FDE_CTLWORD_DEREF_P(data) (((data) >> 1) & 0x1)
+#define SFRAME_V3_FLEX_FDE_CTLWORD_REG_P(data) ((data) & 0x1)
+
#endif /* _SFRAME_H */
--
2.51.0
* [PATCH v13 16/18] unwind_user/sframe: Separate reading of FRE from reading of FRE data words
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (14 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 15/18] unwind_user/sframe: Add support for SFrame V3 flexible FDEs Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 17/18] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
` (3 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
__find_fre() performs a linear search for the SFrame FRE matching a
given IP. For that purpose it uses __read_fre(), which reads the whole
FRE: the variable-size FRE structure as well as the trailing
variable-length array of variable-size data words. To skip over a FRE,
the search logic only needs to read the variable-size FRE structure,
which includes the count and size of the data words.
Add fields to struct sframe_fre_internal to store the address, count,
and size of the FRE data words. Change __read_fre() to read only the
variable-size FRE structure and populate those new fields. Change
__read_fre_datawords() to use those new fields. Change __find_fre()
to use __read_fre_datawords() to read the FRE data words only after a
matching FRE has been found. Introduce safe_read_fre_datawords() and
use it in sframe_validate_section() to validate the FRE data words.
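The resulting two-phase lookup can be sketched as a user-space model: the search walks header-only records, and data words are read once, for the matching FRE only. This is an illustration under stated assumptions, not the kernel code: struct fre_hdr, find_fre(), and read_datawords() are hypothetical, the headers are assumed to be already decoded and sorted by ip_off, and a counter stands in for user-memory accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header-only view of a FRE (what __read_fre() now yields). */
struct fre_hdr {
	uint32_t ip_off;	/* start offset of this FRE within the function */
	size_t dw_addr;		/* index of its first data word */
	unsigned char dw_count;	/* number of data words */
};

static unsigned int dataword_reads;	/* data words actually read */

/* Phase 1: index of the last FRE with ip_off <= target, or -1. */
static int find_fre(const struct fre_hdr *fres, size_t n, uint32_t ip_off)
{
	int match = -1;

	for (size_t i = 0; i < n; i++) {
		if (fres[i].ip_off > ip_off)
			break;		/* assumes FREs sorted by ip_off */
		match = (int)i;
	}
	return match;
}

/* Phase 2: read the data words of the matching FRE only. */
static int read_datawords(const struct fre_hdr *fre, const int32_t *words,
			  int32_t *out, size_t out_len)
{
	if (fre->dw_count > out_len)
		return -1;
	for (unsigned char i = 0; i < fre->dw_count; i++) {
		out[i] = words[fre->dw_addr + i];
		dataword_reads++;
	}
	return fre->dw_count;
}
```

Searching three FREs for ip_off 7 touches no data words; only the two words of the matching FRE are read afterwards.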
Cc: Steven Rostedt <rostedt@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- New patch.
kernel/unwind/sframe.c | 91 +++++++++++++++++++++++++++---------------
1 file changed, 58 insertions(+), 33 deletions(-)
diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
index ebf2a2905c5c..f24997e84e05 100644
--- a/kernel/unwind/sframe.c
+++ b/kernel/unwind/sframe.c
@@ -39,6 +39,9 @@ struct sframe_fre_internal {
u32 fp_ctl;
s32 fp_off;
u8 info;
+ unsigned long dw_addr;
+ unsigned char dw_count;
+ unsigned char dw_size;
};
DEFINE_STATIC_SRCU(sframe_srcu);
@@ -196,11 +199,11 @@ static __always_inline int __find_fde(struct sframe_section *sec,
static __always_inline int
__read_regular_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
+ unsigned char dataword_count = fre->dw_count;
+ unsigned char dataword_size = fre->dw_size;
+ unsigned long cur = fre->dw_addr;
s32 cfa_off, ra_off, fp_off;
unsigned int cfa_regnum;
@@ -242,11 +245,11 @@ __read_regular_fre_datawords(struct sframe_section *sec,
static __always_inline int
__read_flex_fde_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
+ unsigned char dataword_count = fre->dw_count;
+ unsigned char dataword_size = fre->dw_size;
+ unsigned long cur = fre->dw_addr;
u32 cfa_ctl, ra_ctl, fp_ctl;
s32 cfa_off, ra_off, fp_off;
@@ -303,24 +306,28 @@ __read_flex_fde_fre_datawords(struct sframe_section *sec,
static __always_inline int
__read_fre_datawords(struct sframe_section *sec,
struct sframe_fde_internal *fde,
- unsigned long cur,
- unsigned char dataword_count,
- unsigned char dataword_size,
struct sframe_fre_internal *fre)
{
unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
+ unsigned char dataword_count = fre->dw_count;
+
+ if (!dataword_count) {
+ /* A FRE without data words indicates an outermost frame. */
+ fre->cfa_ctl = 0;
+ fre->cfa_off = 0;
+ fre->ra_ctl = 0;
+ fre->ra_off = 0;
+ fre->fp_ctl = 0;
+ fre->fp_off = 0;
+
+ return 0;
+ }
switch (fde_type) {
case SFRAME_FDE_TYPE_REGULAR:
- return __read_regular_fre_datawords(sec, fde, cur,
- dataword_count,
- dataword_size,
- fre);
+ return __read_regular_fre_datawords(sec, fde, fre);
case SFRAME_FDE_TYPE_FLEXIBLE:
- return __read_flex_fde_fre_datawords(sec, fde, cur,
- dataword_count,
- dataword_size,
- fre);
+ return __read_flex_fde_fre_datawords(sec, fde, fre);
default:
return -EFAULT;
}
@@ -362,23 +369,11 @@ static __always_inline int __read_fre(struct sframe_section *sec,
fre->size = addr_size + 1 + (dataword_count * dataword_size);
fre->ip_off = ip_off;
fre->info = info;
+ fre->dw_addr = cur;
+ fre->dw_count = dataword_count;
+ fre->dw_size = dataword_size;
- if (!dataword_count) {
- /*
- * A FRE without data words indicates RA undefined /
- * outermost frame.
- */
- fre->cfa_ctl = 0;
- fre->cfa_off = 0;
- fre->ra_ctl = 0;
- fre->ra_off = 0;
- fre->fp_ctl = 0;
- fre->fp_off = 0;
-
- return 0;
- }
-
- return __read_fre_datawords(sec, fde, cur, dataword_count, dataword_size, fre);
+ return 0;
Efault:
return -EFAULT;
@@ -455,6 +450,7 @@ static __always_inline int __find_fre(struct sframe_section *sec,
bool which = false;
unsigned int i;
u32 ip_off;
+ int ret;
ip_off = ip - fde->func_addr;
@@ -492,6 +488,10 @@ static __always_inline int __find_fre(struct sframe_section *sec,
return -EINVAL;
fre = prev_fre;
+ ret = __read_fre_datawords(sec, fde, fre);
+ if (ret)
+ return ret;
+
if (sframe_init_cfa_rule_data(&frame->cfa, fre->cfa_ctl, fre->cfa_off))
return -EINVAL;
sframe_init_rule_data(&frame->ra, fre->ra_ctl, fre->ra_off);
@@ -567,6 +567,20 @@ static int safe_read_fre(struct sframe_section *sec,
return ret;
}
+static int safe_read_fre_datawords(struct sframe_section *sec,
+ struct sframe_fde_internal *fde,
+ struct sframe_fre_internal *fre)
+{
+ int ret;
+
+ if (!user_read_access_begin((void __user *)sec->sframe_start,
+ sec->sframe_end - sec->sframe_start))
+ return -EFAULT;
+ ret = __read_fre_datawords(sec, fde, fre);
+ user_read_access_end();
+ return ret;
+}
+
static int sframe_validate_section(struct sframe_section *sec)
{
unsigned long prev_ip = 0;
@@ -610,6 +624,17 @@ static int sframe_validate_section(struct sframe_section *sec)
fde.rep_size);
return ret;
}
+ ret = safe_read_fre_datawords(sec, &fde, fre);
+ if (ret) {
+ dbg_sec("fde %u: __read_fre_datawords(%u) failed\n", i, j);
+ dbg_sec("FDE: func_addr:%#lx func_size:%#x fda_off:%#x fres_off:%#x fres_num:%d info:%u info2:%u rep_size:%u\n",
+ fde.func_addr, fde.func_size,
+ fde.fda_off,
+ fde.fres_off, fde.fres_num,
+ fde.info, fde.info2,
+ fde.rep_size);
+ return ret;
+ }
fre_addr += fre->size;
--
2.51.0
* [PATCH v13 17/18] unwind_user/sframe/x86: Enable sframe unwinding on x86
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (15 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 16/18] unwind_user/sframe: Separate reading of FRE from reading of FRE data words Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-01-27 15:05 ` [PATCH v13 18/18] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
` (2 subsequent siblings)
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The x86 SFrame V3 implementation works fairly well, starting with
binutils 2.46. Enable it.
[ Jens Remus: Reword commit message for SFrame V3, starting with
binutils 2.46. ]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
Notes (jremus):
Changes in v13:
- Naive implementation of unwind_user_get_reg() to support SFrame V3
flexible FDEs (e.g. used to represent DRAP pattern).
- Define SFRAME_REG_SP and SFRAME_REG_FP to the respective x86-64
DWARF register numbers.
- Reword commit message for SFrame V3 and (upcoming) binutils 2.46.
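The x86-64 DWARF register numbering used by unwind_user_get_reg() (0 rax, 1 rdx, 2 rcx, 3 rbx, 4 rsi, 5 rdi, 6 rbp, 7 rsp, 8..15 r8..r15, per the System V AMD64 psABI) can also be expressed as a lookup table. This is a hypothetical user-space sketch: struct mock_regs stands in for struct pt_regs and does not reproduce its layout, and mock_get_reg() is not a kernel function. Only the SFRAME_REG_SP/SFRAME_REG_FP values are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* As defined for x86-64 in asm/unwind_user_sframe.h by this patch. */
#define SFRAME_REG_SP 7
#define SFRAME_REG_FP 6

/* Hypothetical stand-in for struct pt_regs (not the kernel's layout). */
struct mock_regs {
	uint64_t ax, dx, cx, bx, si, di, bp, sp;
	uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* DWARF register number -> field offset, same mapping as the switch. */
static const size_t dwarf_reg_off[16] = {
	offsetof(struct mock_regs, ax),  offsetof(struct mock_regs, dx),
	offsetof(struct mock_regs, cx),  offsetof(struct mock_regs, bx),
	offsetof(struct mock_regs, si),  offsetof(struct mock_regs, di),
	offsetof(struct mock_regs, bp),  offsetof(struct mock_regs, sp),
	offsetof(struct mock_regs, r8),  offsetof(struct mock_regs, r9),
	offsetof(struct mock_regs, r10), offsetof(struct mock_regs, r11),
	offsetof(struct mock_regs, r12), offsetof(struct mock_regs, r13),
	offsetof(struct mock_regs, r14), offsetof(struct mock_regs, r15),
};

/* Table-driven lookup; returns -1 for register numbers outside 0..15. */
static int mock_get_reg(const struct mock_regs *regs, unsigned int regnum,
			uint64_t *val)
{
	if (regnum >= sizeof(dwarf_reg_off) / sizeof(dwarf_reg_off[0]))
		return -1;
	memcpy(val, (const char *)regs + dwarf_reg_off[regnum], sizeof(*val));
	return 0;
}
```

A table keeps the numbering in one place; the open-coded switch in the patch is equivalent and arguably easier to review.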
arch/x86/Kconfig | 1 +
arch/x86/include/asm/unwind_user.h | 34 +++++++++++++++++++++++
arch/x86/include/asm/unwind_user_sframe.h | 12 ++++++++
3 files changed, 47 insertions(+)
create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859..195cb99a590c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -301,6 +301,7 @@ config X86
select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_UNWIND_USER_FP if X86_64
+ select HAVE_UNWIND_USER_SFRAME if X86_64
select HAVE_USER_RETURN_NOTIFIER
select HAVE_GENERIC_VDSO
select VDSO_GETRANDOM if X86_64
diff --git a/arch/x86/include/asm/unwind_user.h b/arch/x86/include/asm/unwind_user.h
index f38f7c5ff1de..ae46906c3b39 100644
--- a/arch/x86/include/asm/unwind_user.h
+++ b/arch/x86/include/asm/unwind_user.h
@@ -15,6 +15,40 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
return user_64bit_mode(regs) ? 8 : 4;
}
+static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
+{
+#ifdef CONFIG_X86_64
+ const struct pt_regs *regs = task_pt_regs(current);
+
+ switch (regnum) {
+ /* DWARF register numbers 0..15 */
+ case 0: *val = regs->ax; break;
+ case 1: *val = regs->dx; break;
+ case 2: *val = regs->cx; break;
+ case 3: *val = regs->bx; break;
+ case 4: *val = regs->si; break;
+ case 5: *val = regs->di; break;
+ case 6: *val = regs->bp; break;
+ case 7: *val = regs->sp; break;
+ case 8: *val = regs->r8; break;
+ case 9: *val = regs->r9; break;
+ case 10: *val = regs->r10; break;
+ case 11: *val = regs->r11; break;
+ case 12: *val = regs->r12; break;
+ case 13: *val = regs->r13; break;
+ case 14: *val = regs->r14; break;
+ case 15: *val = regs->r15; break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+#else /* !CONFIG_X86_64 */
+ return -EINVAL;
+#endif /* !CONFIG_X86_64 */
+
+}
+#define unwind_user_get_reg unwind_user_get_reg
+
#endif /* CONFIG_UNWIND_USER */
#ifdef CONFIG_HAVE_UNWIND_USER_FP
diff --git a/arch/x86/include/asm/unwind_user_sframe.h b/arch/x86/include/asm/unwind_user_sframe.h
new file mode 100644
index 000000000000..d828ae1a4aac
--- /dev/null
+++ b/arch/x86/include/asm/unwind_user_sframe.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UNWIND_USER_SFRAME_H
+#define _ASM_X86_UNWIND_USER_SFRAME_H
+
+#ifdef CONFIG_X86_64
+
+#define SFRAME_REG_SP 7
+#define SFRAME_REG_FP 6
+
+#endif
+
+#endif /* _ASM_X86_UNWIND_USER_SFRAME_H */
--
2.51.0
* [PATCH v13 18/18] unwind_user/sframe: Add prctl() interface for registering .sframe sections
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (16 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 17/18] unwind_user/sframe/x86: Enable sframe unwinding on x86 Jens Remus
@ 2026-01-27 15:05 ` Jens Remus
2026-02-05 18:26 ` [PATCH v13 00/18] unwind_deferred: Implement sframe handling Namhyung Kim
2026-02-11 1:47 ` Dylan Hatch
19 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-01-27 15:05 UTC
To: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm, Steven Rostedt
Cc: Jens Remus, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik, Steven Rostedt (Google)
From: Josh Poimboeuf <jpoimboe@kernel.org>
The kernel doesn't have direct visibility into the ELF contents of shared
libraries. Add prctl() interfaces (PR_ADD_SFRAME and PR_REMOVE_SFRAME)
which allow glibc to tell the kernel where to find .sframe sections.

[
This adds a prctl() interface for testing the loading of .sframe sections
for libraries. However, this interface should really be a system call.
This patch is for testing purposes only and should not be applied to
mainline.
]
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Indu Bhagat <indu.bhagat@oracle.com>
Cc: "Jose E. Marchesi" <jemarch@gnu.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Jens Remus <jremus@linux.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Sam James <sam@gentoo.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Carlos O'Donell" <codonell@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
include/uapi/linux/prctl.h | 6 +++++-
kernel/sys.c | 9 +++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 51c4e8c82b1e..91d2c9eb3aa3 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -368,7 +368,7 @@ struct prctl_mm_map {
* configuration. All bits may be locked via this call, including
* undefined bits.
*/
-#define PR_LOCK_SHADOW_STACK_STATUS 76
+#define PR_LOCK_SHADOW_STACK_STATUS 76
/*
* Controls the mode of timer_create() for CRIU restore operations.
@@ -386,4 +386,8 @@ struct prctl_mm_map {
# define PR_FUTEX_HASH_SET_SLOTS 1
# define PR_FUTEX_HASH_GET_SLOTS 2
+/* SFRAME management */
+#define PR_ADD_SFRAME 79
+#define PR_REMOVE_SFRAME 80
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 8b58eece4e58..ea088b643eb4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -65,6 +65,7 @@
#include <linux/rcupdate.h>
#include <linux/uidgid.h>
#include <linux/cred.h>
+#include <linux/sframe.h>
#include <linux/nospec.h>
@@ -2868,6 +2869,14 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_FUTEX_HASH:
error = futex_hash_prctl(arg2, arg3, arg4);
break;
+ case PR_ADD_SFRAME:
+ error = sframe_add_section(arg2, arg3, arg4, arg5);
+ break;
+ case PR_REMOVE_SFRAME:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = sframe_remove_section(arg2);
+ break;
default:
trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
error = -EINVAL;
--
2.51.0
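As a rough illustration of how a user space loader might drive this test-only interface, the sketch below walks the loaded objects with dl_iterate_phdr(), looks for a PT_GNU_SFRAME program header and the executable PT_LOAD segment, and passes the resulting ranges to PR_ADD_SFRAME. Several details are assumptions rather than facts from this patch. The argument order (sframe start, sframe end, text start, text end) is inferred from the sframe_add_section(arg2, arg3, arg4, arg5) call above. PT_GNU_SFRAME (0x6474e554) and PR_ADD_SFRAME (79) are defined locally because system headers may not provide them, and, as noted later in the thread, the prctl numbers may need bumping on newer kernels. The struct and helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Test-only value from this patch; mainline may assign 79 differently. */
#ifndef PR_ADD_SFRAME
#define PR_ADD_SFRAME 79
#endif
#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME 0x6474e554	/* PT_LOOS + 0x474e554 */
#endif

/* Hypothetical container for the four PR_ADD_SFRAME arguments (assumed order). */
struct sframe_args {
	unsigned long sframe_start, sframe_end;
	unsigned long text_start, text_end;
};

/* Fill @a from one object's program headers; returns 0 if both were found. */
static int sframe_args_from_phdrs(ElfW(Addr) base, const ElfW(Phdr) *phdr,
				  size_t phnum, struct sframe_args *a)
{
	int have_sframe = 0, have_text = 0;

	for (size_t i = 0; i < phnum; i++) {
		const ElfW(Phdr) *p = &phdr[i];

		if (p->p_type == PT_GNU_SFRAME) {
			a->sframe_start = base + p->p_vaddr;
			a->sframe_end = a->sframe_start + p->p_memsz;
			have_sframe = 1;
		} else if (p->p_type == PT_LOAD && (p->p_flags & PF_X) &&
			   !have_text) {
			/* Use the first executable segment as the text range. */
			a->text_start = base + p->p_vaddr;
			a->text_end = a->text_start + p->p_memsz;
			have_text = 1;
		}
	}
	return (have_sframe && have_text) ? 0 : -1;
}

/* dl_iterate_phdr() callback: register each object that carries .sframe. */
static int register_one(struct dl_phdr_info *info, size_t size, void *data)
{
	struct sframe_args a;
	int *registered = data;

	(void)size;
	if (sframe_args_from_phdrs(info->dlpi_addr, info->dlpi_phdr,
				   info->dlpi_phnum, &a))
		return 0;	/* no .sframe in this object; keep iterating */
	if (prctl(PR_ADD_SFRAME, a.sframe_start, a.sframe_end,
		  a.text_start, a.text_end) == 0)
		(*registered)++;
	return 0;
}
```

A loader could call dl_iterate_phdr(register_one, &count) after startup and for each newly dlopen()ed object. Presumably it would also call PR_REMOVE_SFRAME on dlclose(), with whatever address sframe_remove_section() expects (likely the .sframe start, but that is an assumption).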
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (17 preceding siblings ...)
2026-01-27 15:05 ` [PATCH v13 18/18] unwind_user/sframe: Add prctl() interface for registering .sframe sections Jens Remus
@ 2026-02-05 18:26 ` Namhyung Kim
2026-02-05 18:54 ` Steven Rostedt
2026-02-11 1:47 ` Dylan Hatch
19 siblings, 1 reply; 25+ messages in thread
From: Namhyung Kim @ 2026-02-05 18:26 UTC
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Thomas Gleixner, Andrii Nakryiko,
Indu Bhagat, Jose E. Marchesi, Beau Belgrave, Linus Torvalds,
Andrew Morton, Florian Weimer, Kees Cook, Carlos O'Donell,
Sam James, Dylan Hatch, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
Hello,

On Tue, Jan 27, 2026 at 04:05:35PM +0100, Jens Remus wrote:
> This is the implementation of parsing the SFrame V3 stack trace information
> from an .sframe section in an ELF file. It's a continuation of Josh's and
> Steve's work that can be found here:
>
> https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
> https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
>
> Currently the only way to get a user space stack trace from a stack
> walk (and not just by copying a large amount of the user stack into the
> kernel ring buffer) is to use frame pointers. This has a few issues. The
> biggest one is that compiling frame pointers into every application and
> library has been shown to cause performance overhead.
>
> Another issue is that the format of the frames may not always be consistent
> between different compilers, and some architectures (s390) have no defined
> frame format that allows a reliable stack walk. The only way to perform
> user space profiling on these architectures is to copy the user stack into
> the kernel buffer.
>
> SFrame [1] is now supported in binutils (x86-64, ARM64, and s390). There
> are ongoing discussions about supporting SFrame in LLVM. SFrame works more
> like ORC and lives in the ELF executable file as its own section. Like
> ORC, it has two tables. The first table is sorted by instruction pointer
> (IP); looking up the current IP in it leads to an entry in the second
> table, which tells you where the return address of the current function
> is located. You can then look up that return address in the first table
> to find the return address of the calling function, and so on. This
> performs a user space stack walk.
>
> Because the .sframe section lives in the ELF file, it needs to be faulted
> into memory when it is used. This means that walking the user space stack
> requires being in a faultable context. As profilers like perf request a
> stack trace in interrupt or NMI context, the walk cannot be done at the
> time it is requested. Instead it must be deferred until it is safe to
> fault in user space. One place where this is known to be safe is when the
> task is about to return to user space.
>
> This series makes the deferred unwind user code implement SFrame format V3
> and enables it on x86-64.
>
> [1]: https://sourceware.org/binutils/wiki/sframe
>
>
> This series applies on top of the tip perf/core branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>
> The user space programs (and libraries) to be stack-traced need to be
> built with the recent SFrame stack trace information format V3, as
> generated by the upcoming binutils 2.46 with the assembler option --gsframe.
> binutils 2.46 can be built from source from the binutils-2_46-branch branch:
>
> git://sourceware.org/git/binutils-gdb.git binutils-2_46-branch
>
> Namhyung Kim's related perf tools deferred callchain support can be used
> for testing ("perf record --call-graph fp,defer" and "perf report/script").

Is it possible for users to choose the unwinder (frame pointer or
SFrame) at runtime? I feel like the option should be
"--call-graph sframe,defer", or just "--call-graph sframe" if it always
uses deferred unwinding.

Thanks,
Namhyung

>
>
> Changes since v12 (see patch notes for details):
> - Rebase on tip perf/core branch (d55c571e4333).
> - Add support for SFrame V3, including its new flexible FDEs. SFrame V2
> is not supported.
>
> Changes since v11 (see patch notes for details):
> - Rebase on tip master branch (f8fdee44bf2f) with Namhyung Kim's
> perf/defer-callchain-v4 branch merged on top.
> - Adjust to Peter's latest unwind user enhancements.
> - Simplify logic by using an internal SFrame FDE representation, whose
> FDE function start address field is an address instead of a PC-relative
> offset (from FDE).
> - Rename struct sframe_fre to sframe_fre_internal to align with
> struct sframe_fde_internal.
> - Remove unused pt_regs from unwind_user_next_common() and its
> callers. (Peter)
> - Simplify unwind_user_next_sframe(). (Peter)
> - Fix a few checkpatch errors and warnings.
> - Minor cleanups (e.g. move includes, fix indentation).
>
> Changes since v10:
> - Support for SFrame V2 PC-relative FDE function start address.
> - Support for SFrame V2 representing RA undefined as indication for
> outermost frames.
>
>
> Patches 1, 4, 11, and 17 have been updated to exclusively support the
> latest SFrame V3 stack trace information format, which is generated by
> the upcoming binutils 2.46 release. Old SFrame V2 sections are rejected
> with the dynamic debug message "bad/unsupported sframe header".
>
> Patches 7 and 8 add support to unwind user (sframe) for outermost frames.
>
> Patches 12-15 add support to unwind user (sframe) for the new SFrame V3
> flexible FDEs.
>
> Patch 16 improves the performance of searching the SFrame FRE for an IP.
>
> Regards,
> Jens
>
>
> Jens Remus (7):
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
> unwind_user: Enable archs that pass RA in a register
> unwind_user: Flexible FP/RA recovery rules
> unwind_user: Flexible CFA recovery rules
> unwind_user/sframe: Add support for SFrame V3 flexible FDEs
> unwind_user/sframe: Separate reading of FRE from reading of FRE data
> words
>
> Josh Poimboeuf (11):
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Wire up unwind_user to sframe
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user/sframe: Add prctl() interface for registering .sframe
> sections
>
> MAINTAINERS | 1 +
> arch/Kconfig | 23 +
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/mmu.h | 2 +-
> arch/x86/include/asm/uaccess.h | 39 +-
> arch/x86/include/asm/unwind_user.h | 69 +-
> arch/x86/include/asm/unwind_user_sframe.h | 12 +
> fs/binfmt_elf.c | 48 +-
> include/linux/mm_types.h | 3 +
> include/linux/sframe.h | 60 ++
> include/linux/unwind_user.h | 18 +
> include/linux/unwind_user_types.h | 46 +-
> include/uapi/linux/elf.h | 1 +
> include/uapi/linux/prctl.h | 6 +-
> kernel/fork.c | 10 +
> kernel/sys.c | 9 +
> kernel/unwind/Makefile | 3 +-
> kernel/unwind/sframe.c | 840 ++++++++++++++++++++++
> kernel/unwind/sframe.h | 87 +++
> kernel/unwind/sframe_debug.h | 68 ++
> kernel/unwind/user.c | 105 ++-
> mm/init-mm.c | 2 +
> 22 files changed, 1414 insertions(+), 39 deletions(-)
> create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
> create mode 100644 include/linux/sframe.h
> create mode 100644 kernel/unwind/sframe.c
> create mode 100644 kernel/unwind/sframe.h
> create mode 100644 kernel/unwind/sframe_debug.h
>
> --
> 2.51.0
>
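The two-table lookup described in the cover letter can be sketched in user space C as follows. This is a deliberately simplified model, not the SFrame V3 on-disk format or the series' kernel/unwind/sframe.c. The hypothetical struct fde and struct fre hold already-decoded values. func_start is an absolute address, as in the series' internal FDE representation. The CFA is always computed from SP, and the return address is read from a mock stack array instead of user memory. A per-FRE flag models the "RA undefined" outermost-frame indication mentioned in the changelog. Both lookups use binary search because both tables are sorted by address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded FRE: rules valid from start_ip up to the next FRE. */
struct fre {
	uint64_t start_ip;	/* absolute address where these rules start */
	int32_t cfa_sp_off;	/* CFA = SP + cfa_sp_off */
	int32_t ra_cfa_off;	/* RA is saved at CFA + ra_cfa_off */
	int ra_undefined;	/* nonzero: outermost frame, stop unwinding */
};

/* Hypothetical decoded FDE: one function and its FREs, sorted by start_ip. */
struct fde {
	uint64_t func_start;	/* absolute address */
	uint32_t func_size;
	const struct fre *fres;
	size_t num_fres;
};

/* Binary search: last FDE with func_start <= ip, if ip lies inside it. */
static const struct fde *find_fde(const struct fde *fdes, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fdes[mid].func_start <= ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;
	const struct fde *f = &fdes[lo - 1];
	return ip < f->func_start + f->func_size ? f : NULL;
}

/* Binary search: last FRE of @f whose start_ip <= ip. */
static const struct fre *find_fre(const struct fde *f, uint64_t ip)
{
	size_t lo = 0, hi = f->num_fres;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (f->fres[mid].start_ip <= ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &f->fres[lo - 1] : NULL;
}

/* Mock user stack: words[i] lives at address base + 8 * i. */
struct mock_stack {
	uint64_t base;
	const uint64_t *words;
	size_t num_words;
};

static int read_word(const struct mock_stack *s, uint64_t addr, uint64_t *val)
{
	if (addr < s->base || (addr - s->base) % 8 ||
	    (addr - s->base) / 8 >= s->num_words)
		return -1;
	*val = s->words[(addr - s->base) / 8];
	return 0;
}

/* One step: 1 = moved to caller, 0 = outermost frame, -1 = error. */
static int unwind_step(const struct fde *fdes, size_t n,
		       const struct mock_stack *s, uint64_t *ip, uint64_t *sp)
{
	const struct fde *f = find_fde(fdes, n, *ip);
	const struct fre *r = f ? find_fre(f, *ip) : NULL;
	uint64_t cfa, ra;

	if (!r)
		return -1;
	if (r->ra_undefined)
		return 0;
	cfa = *sp + r->cfa_sp_off;
	if (read_word(s, cfa + r->ra_cfa_off, &ra))
		return -1;
	*ip = ra;
	*sp = cfa;	/* on x86-64 the caller's SP after return equals the CFA */
	return 1;
}
```

Repeating unwind_step() until it returns 0 or -1 yields the call chain. The real format also supports FP-based CFA rules and saved-FP recovery, which this sketch omits.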
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-02-05 18:26 ` [PATCH v13 00/18] unwind_deferred: Implement sframe handling Namhyung Kim
@ 2026-02-05 18:54 ` Steven Rostedt
2026-02-09 17:07 ` Jens Remus
0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2026-02-05 18:54 UTC
To: Namhyung Kim
Cc: Jens Remus, linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
On Thu, 5 Feb 2026 10:26:10 -0800
Namhyung Kim <namhyung@kernel.org> wrote:
> > Namhyung Kim's related perf tools deferred callchain support can be used
> > for testing ("perf record --call-graph fp,defer" and "perf report/script").
>
> Is it possible for users to choose the unwinder - frame pointer or
> SFrame at runtime? I feel like the option should be
> "--call-graph sframe,defer" or just "--call-graph sframe" if it always
> uses deferred unwinding.

Currently no, and I'm not sure we want that, do we? The idea is to use the
best option that is available. Why use frame pointers if sframe is
available and it's being called with defer?

If there's no defer, then sframes are not available, so it defaults to the
best option available (which will likely be frame pointers).

-- Steve
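The selection policy Steve describes can be summarised as a small decision function. This is a hypothetical sketch of the behaviour as stated in the reply, not code from the series. The enum and function names are invented, and the real kernel logic may weigh additional conditions.

```c
#include <stdbool.h>

/* Hypothetical names for the user space unwinders discussed here. */
enum user_unwinder { UNWINDER_NONE, UNWINDER_FP, UNWINDER_SFRAME };

/*
 * Pick the best available unwinder. SFrame data may have to be faulted
 * in, so it is only usable when the unwind is deferred to a faultable
 * context (e.g. return to user space).
 */
static enum user_unwinder pick_unwinder(bool deferred, bool have_sframe,
					bool have_fp)
{
	if (deferred && have_sframe)
		return UNWINDER_SFRAME;
	if (have_fp)
		return UNWINDER_FP;
	return UNWINDER_NONE;
}
```

The requested method is not an input to this function. That mirrors the concern raised later in the thread: under such a policy, "--call-graph fp,defer" cannot force frame pointers when SFrame data is available.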
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-02-05 18:54 ` Steven Rostedt
@ 2026-02-09 17:07 ` Jens Remus
2026-02-11 0:17 ` Namhyung Kim
0 siblings, 1 reply; 25+ messages in thread
From: Jens Remus @ 2026-02-09 17:07 UTC
To: Steven Rostedt, Namhyung Kim
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
On 2/5/2026 7:54 PM, Steven Rostedt wrote:
> On Thu, 5 Feb 2026 10:26:10 -0800
> Namhyung Kim <namhyung@kernel.org> wrote:
>
>>> Namhyung Kim's related perf tools deferred callchain support can be used
>>> for testing ("perf record --call-graph fp,defer" and "perf report/script").
>>
>> Is it possible for users to choose the unwinder - frame pointer or
>> SFrame at runtime? I feel like the option should be
>> "--call-graph sframe,defer" or just "--call-graph sframe" if it always
>> uses deferred unwinding.
>
> Currently no, and I'm not sure we want that do we? The idea is to use the
> best option that is available. Why use frame pointers if sframe is
> available and it's being called with defer?
>
> If there's no defer, then sframes are not available, so it defaults to the
> best option available (which will likely be frame pointers).

Maybe it would make sense not to "overload" the perf record option
"--call-graph fp,defer" by using it for all deferred unwinding methods.
What about "--call-graph defer", "--call-graph any,defer", or
"--call-graph *,defer"?

Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-02-09 17:07 ` Jens Remus
@ 2026-02-11 0:17 ` Namhyung Kim
0 siblings, 0 replies; 25+ messages in thread
From: Namhyung Kim @ 2026-02-11 0:17 UTC
To: Jens Remus
Cc: Steven Rostedt, linux-kernel, linux-trace-kernel, bpf, x86,
linux-mm, Josh Poimboeuf, Masami Hiramatsu, Mathieu Desnoyers,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Thomas Gleixner, Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi,
Beau Belgrave, Linus Torvalds, Andrew Morton, Florian Weimer,
Kees Cook, Carlos O'Donell, Sam James, Dylan Hatch,
Borislav Petkov, Dave Hansen, David Hildenbrand, H. Peter Anvin,
Liam R. Howlett, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Heiko Carstens,
Vasily Gorbik
Hello,

On Mon, Feb 09, 2026 at 06:07:59PM +0100, Jens Remus wrote:
> On 2/5/2026 7:54 PM, Steven Rostedt wrote:
> > On Thu, 5 Feb 2026 10:26:10 -0800
> > Namhyung Kim <namhyung@kernel.org> wrote:
> >
> >>> Namhyung Kim's related perf tools deferred callchain support can be used
> >>> for testing ("perf record --call-graph fp,defer" and "perf report/script").
> >>
> >> Is it possible for users to choose the unwinder - frame pointer or
> >> SFrame at runtime? I feel like the option should be
> >> "--call-graph sframe,defer" or just "--call-graph sframe" if it always
> >> uses deferred unwinding.
> >
> > Currently no, and I'm not sure we want that do we? The idea is to use the
> > best option that is available. Why use frame pointers if sframe is
> > available and it's being called with defer?
> >
> > If there's no defer, then sframes are not available, so it defaults to the
> > best option available (which will likely be frame pointers).

Users (me, at least) may want to compare stack traces from FP and SFrame?

>
> Maybe it would make sense not to "overload" the perf record option
> "--call-graph fp,defer" and use it for all deferred unwinding methods.
>
> What about "--call-graph defer", "--call-graph any,defer", or
> "--call-graph *,defer"?

Sounds better. But I think it cannot force "--call-graph fp,defer" to
use frame pointers when SFrame is available. Hmm.

Thanks,
Namhyung
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-01-27 15:05 [PATCH v13 00/18] unwind_deferred: Implement sframe handling Jens Remus
` (18 preceding siblings ...)
2026-02-05 18:26 ` [PATCH v13 00/18] unwind_deferred: Implement sframe handling Namhyung Kim
@ 2026-02-11 1:47 ` Dylan Hatch
2026-02-11 16:15 ` Jens Remus
19 siblings, 1 reply; 25+ messages in thread
From: Dylan Hatch @ 2026-02-11 1:47 UTC
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On Tue, Jan 27, 2026 at 7:06 AM Jens Remus <jremus@linux.ibm.com> wrote:
>
> This is the implementation of parsing the SFrame V3 stack trace information
> from an .sframe section in an ELF file. [...]

Hi Jens,

Do you by chance have this work uploaded to a public branch somewhere?
I'd like to get a new version of the arm64 patch series for SFrame-based
reliable stacktrace [1] working with SFrame V3, ideally using the SFrame
library from your patch series here.

[1]: https://lore.kernel.org/lkml/20250904223850.884188-1-dylanbhatch@google.com/

Thanks,
Dylan
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
2026-02-11 1:47 ` Dylan Hatch
@ 2026-02-11 16:15 ` Jens Remus
0 siblings, 0 replies; 25+ messages in thread
From: Jens Remus @ 2026-02-11 16:15 UTC
To: Dylan Hatch
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Indu Bhagat, Jose E. Marchesi, Beau Belgrave,
Linus Torvalds, Andrew Morton, Florian Weimer, Kees Cook,
Carlos O'Donell, Sam James, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik
On 2/11/2026 2:47 AM, Dylan Hatch wrote:
> On Tue, Jan 27, 2026 at 7:06 AM Jens Remus <jremus@linux.ibm.com> wrote:
>>
>> [...]
>>
>> This series applies on top of the tip perf/core branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>>
>> The user space programs (and libraries) to be stack-traced need to be
>> built with the recent SFrame stack trace information format V3, as
>> generated by the upcoming binutils 2.46 with the assembler option --gsframe.
>> binutils 2.46 can be built from source from the binutils-2_46-branch branch:
>>
>> git://sourceware.org/git/binutils-gdb.git binutils-2_46-branch
> Do you by chance have this work uploaded in a public branch somewhere?
> I'd like to get a new version of the SFrame for reliable stacktrace on
> arm64 patch series [1] working for SFrame V3, ideally with the SFrame
> library in your patch series here.
>
> https://lore.kernel.org/lkml/20250904223850.884188-1-dylanbhatch@google.com/

No, I don't. The following shows how you can easily get to tip:perf/core
with this work applied, using b4:

$ git checkout -b sframe v6.18
$ git pull --no-rebase git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core # pre-req for [PATCH v13 00/18] unwind_deferred: Implement sframe handling
$ b4 shazam -T 20260127150554.2760964-1-jremus@linux.ibm.com # [PATCH v13 00/18] unwind_deferred: Implement sframe handling

The following shows how you can get to torvalds:master with all of the
latest sframe-related series on top, using b4:

$ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master:sframe
$ git checkout sframe
$ b4 shazam -T 20260127150554.2760964-1-jremus@linux.ibm.com # [PATCH v13 00/18] unwind_deferred: Implement sframe handling
$ git am -3 # resolve "unwind_user/sframe: Store .sframe section data in per-mm maple tree"
$ git am -3 # partially resolve "unwind_user/sframe: Add prctl() interface for registering .sframe sections"
$ git mergetool # manually resolve prctl.h and kernel/sys.c conflicts; bump PR_ADD_SFRAME and PR_REMOVE_SFRAME; each case must end with break;
$ git am --continue
$ b4 shazam -T 20260211141357.271402-1-jremus@linux.ibm.com # optional [PATCH v9 0/6] x86/vdso: VDSO updates and fixes for sframes
$ b4 shazam -T 20260127151926.2805123-1-jremus@linux.ibm.com # optional [PATCH v4 00/12] s390: SFrame user space unwinding
$ git am -3 # partially resolve "s390/vdso: Enable SFrame V3 generation in vDSO"
$ git mergetool # manually resolve arch/s390/kernel/vdso/Makefile conflict
$ git am --continue
$ git am -3 # resolve "s390/ptrace: Convert function macros to inline functions"

Regards,
Jens