* [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE
@ 2024-08-02 6:13 Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions Aruna Ramakrishna
` (4 more replies)
0 siblings, 5 replies; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
v8 updates:
- Edited changelog for better readability
v7 updates:
- Rebased patchset to v6.11.0-rc1
- Removed ia32 and x32 function interface changes as PKRU is only
supported in 64-bit mode, as suggested by Jeff Xu
v6 updates:
- Rebased patchset to v6.10.0-rc5
- Changed sig_prepare_pkru() back to enabling all pkeys, based on
discussion with Jeff Xu
v5 updates:
- No major changes, mostly a resend of v4 - except for updating the
commit description for patch 5/5
v4 updates (based on review feedback from Thomas Gleixner):
- Simplified update_pkru_in_sigframe()
- Changed sigpkru to enable minimally required keys (init_pkru and
current pkru)
- Modified pkey_sighandler_tests.c to use kselfttest framework
- Fixed commit descriptions
- Fixed sigreturn use case (pointed out by Jeff Xu)
- Added a new sigreturn test case
v3 updates (based on review feedback from Ingo Molnar and Dave Hansen):
- Split the original patch into 3:
- function interface changes
- helper functions
- functional change to write pkru on sigframe
- Enabled all pkeys before XSAVE - i.e. wrpkru(0), rather than assuming
that the alt sig stack is always protected by pkey 0.
- Added a few test cases in pkey_sighandler_tests.c.
I had some trouble adding these tests to
tools/testing/selftests/mm/protection_keys.c, so they're in a separate
file.
Aruna Ramakrishna (4):
x86/pkeys: Add PKRU as a parameter in signal handling functions
x86/pkeys: Add helper functions to update PKRU on the sigframe
x86/pkeys: Update PKRU to enable all pkeys before XSAVE
x86/pkeys: Restore altstack access in sigreturn()
Keith Lucas (1):
selftests/mm: Add new testcases for pkeys
arch/x86/include/asm/fpu/signal.h | 2 +-
arch/x86/kernel/fpu/signal.c | 27 +-
arch/x86/kernel/fpu/xstate.c | 13 +
arch/x86/kernel/fpu/xstate.h | 2 +
arch/x86/kernel/signal.c | 29 +-
arch/x86/kernel/signal_64.c | 6 +-
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/pkey-helpers.h | 11 +-
.../selftests/mm/pkey_sighandler_tests.c | 479 ++++++++++++++++++
tools/testing/selftests/mm/protection_keys.c | 10 -
10 files changed, 560 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/mm/pkey_sighandler_tests.c
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
prerequisite-patch-id: 7b2144970da943b19b53baab525a0fc653fcd7b8
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
@ 2024-08-02 6:13 ` Aruna Ramakrishna
2024-08-09 17:16 ` Jeff Xu
2024-08-02 6:13 ` [PATCH v8 2/5] x86/pkeys: Add helper functions to update PKRU on the sigframe Aruna Ramakrishna
` (3 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
Problem description:
Let's assume there's a multithreaded application that runs untrusted
user code. Each thread has its stack/code protected by a non-zero pkey,
and the PKRU register is set up such that only that particular non-zero
pkey is enabled. Each thread also sets up an alternate signal stack to
handle signals, which is protected by pkey zero. The pkeys man page
documents that the PKRU will be reset to init_pkru when the signal
handler is invoked, which means that pkey zero access will be enabled.
But this reset happens after the kernel attempts to push fpu state
to the alternate stack, which is not (yet) accessible by the kernel,
which leads to a new SIGSEGV being sent to the application, terminating
it.
Enabling both the non-zero pkey (for the thread) and pkey zero in
userspace will not work for this use case. We cannot have the alt stack
writeable by all - the rationale here is that the code running in that
thread (using a non-zero pkey) is untrusted and should not have access
to the alternate signal stack (that uses pkey zero), to prevent the
return address of a function from being changed. The expectation is that
kernel should be able to set up the alternate signal stack and deliver
the signal to the application even if pkey zero is explicitly disabled
by the application. The signal handler accessibility should not be
dictated by whatever PKRU value the thread sets up.
Solution:
The PKRU register is managed by XSAVE, which means the sigframe contents
must match the register contents - which is not the case here. We want
the sigframe to contain the user-defined PKRU value (so that it is
restored correctly from sigcontext) but the actual register must be
reset to init_pkru so that the alt stack is accessible and the signal
can be delivered to the application. It seems that the proper fix here
would be to remove PKRU from the XSAVE framework and manage it
separately, which is quite complicated. As a workaround, do this:
orig_pkru = rdpkru();
wrpkru(orig_pkru & init_pkru_value);
xsave_to_user_sigframe();
put_user(pkru_sigframe_addr, orig_pkru)
This change is split over multiple patches.
In preparation for writing PKRU to sigframe in a later patch, pass in
PKRU as an additional parameter down the chain from get_sigframe():
get_sigframe()
copy_fpstate_to_sigframe()
copy_fpregs_to_sigframe()
There are no functional changes in this patch.
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
---
arch/x86/include/asm/fpu/signal.h | 2 +-
arch/x86/kernel/fpu/signal.c | 6 +++---
arch/x86/kernel/signal.c | 3 ++-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
index 611fa41711af..eccc75bc9c4f 100644
--- a/arch/x86/include/asm/fpu/signal.h
+++ b/arch/x86/include/asm/fpu/signal.h
@@ -29,7 +29,7 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long fpu__get_fpstate_size(void);
-extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
+extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size, u32 pkru);
extern void fpu__clear_user_states(struct fpu *fpu);
extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..2b3b9e140dd4 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -156,7 +156,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
return !err;
}
-static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
+static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
{
if (use_xsave())
return xsave_to_user_sigframe(buf);
@@ -185,7 +185,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
* For [f]xsave state, update the SW reserved fields in the [f]xsave frame
* indicating the absence/presence of the extended state to the user.
*/
-bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
+bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size, u32 pkru)
{
struct task_struct *tsk = current;
struct fpstate *fpstate = tsk->thread.fpu.fpstate;
@@ -228,7 +228,7 @@ bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
fpregs_restore_userregs();
pagefault_disable();
- ret = copy_fpregs_to_sigframe(buf_fx);
+ ret = copy_fpregs_to_sigframe(buf_fx, pkru);
pagefault_enable();
fpregs_unlock();
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..1f1e8e0ac5a3 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -84,6 +84,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
unsigned long math_size = 0;
unsigned long sp = regs->sp;
unsigned long buf_fx = 0;
+ u32 pkru = read_pkru();
/* redzone */
if (!ia32_frame)
@@ -139,7 +140,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
}
/* save i387 and extended state */
- if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size))
+ if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
return (void __user *)-1L;
return (void __user *)sp;
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 2/5] x86/pkeys: Add helper functions to update PKRU on the sigframe
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions Aruna Ramakrishna
@ 2024-08-02 6:13 ` Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
` (2 subsequent siblings)
4 siblings, 0 replies; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
In the case where a user thread sets up an alternate signal stack
protected by the default pkey (i.e. pkey 0), while the thread's stack
is protected by a non-zero pkey, both these pkeys have to be enabled in
the PKRU register for the signal to be delivered to the application
correctly. However, the PKRU value restored after handling the signal
must not enable this extra pkey (i.e. pkey 0) - i.e., the PKRU value on
the on the sigframe should be overwritten with the user-defined value.
Add helper functions that will update PKRU value on the sigframe after
XSAVE. These functions will be called in a later patch; this patch does
not change any behavior as yet.
Note that sig_prepare_pkru() makes no assumption about what pkey could
be used to protect the altstack (i.e. it may not be part of init_pkru),
and so enables all pkeys.
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
---
arch/x86/kernel/fpu/signal.c | 10 ++++++++++
arch/x86/kernel/fpu/xstate.c | 13 +++++++++++++
arch/x86/kernel/fpu/xstate.h | 2 ++
arch/x86/kernel/signal.c | 18 ++++++++++++++++++
4 files changed, 43 insertions(+)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 2b3b9e140dd4..931c5469d7f3 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -63,6 +63,16 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
return true;
}
+/*
+ * Update the value of PKRU register that was already pushed onto the signal frame.
+ */
+static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
+{
+ if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
+ return 0;
+ return __put_user(pkru, (unsigned int __user *)get_xsave_addr_user(buf, XFEATURE_PKRU));
+}
+
/*
* Signal frame handlers.
*/
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee5e0..fa7628bb541b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -993,6 +993,19 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
}
EXPORT_SYMBOL_GPL(get_xsave_addr);
+/*
+ * Given an xstate feature nr, calculate where in the xsave buffer the state is.
+ * The xsave buffer should be in standard format, not compacted (e.g. user mode
+ * signal frames).
+ */
+void __user *get_xsave_addr_user(struct xregs_state __user *xsave, int xfeature_nr)
+{
+ if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr)))
+ return NULL;
+
+ return (void __user *)xsave + xstate_offsets[xfeature_nr];
+}
+
#ifdef CONFIG_ARCH_HAS_PKEYS
/*
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 2ee0b9c53dcc..5f057e50df81 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -54,6 +54,8 @@ extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system_xstate(unsigned int legacy_size);
+extern void __user *get_xsave_addr_user(struct xregs_state __user *xsave, int xfeature_nr);
+
static inline u64 xfeatures_mask_supervisor(void)
{
return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 1f1e8e0ac5a3..9dc77ad03a0e 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -60,6 +60,24 @@ static inline int is_x32_frame(struct ksignal *ksig)
ksig->ka.sa.sa_flags & SA_X32_ABI;
}
+/*
+ * Enable all pkeys temporarily, so as to ensure that both the current
+ * execution stack as well as the alternate signal stack are writeable.
+ * The application can use any of the available pkeys to protect the
+ * alternate signal stack, and we don't know which one it is, so enable
+ * all. The PKRU register will be reset to init_pkru later in the flow,
+ * in fpu__clear_user_states(), and it is the application's responsibility
+ * to enable the appropriate pkey as the first step in the signal handler
+ * so that the handler does not segfault.
+ */
+static inline u32 sig_prepare_pkru(void)
+{
+ u32 orig_pkru = read_pkru();
+
+ write_pkru(0);
+ return orig_pkru;
+}
+
/*
* Set up a signal frame.
*/
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 2/5] x86/pkeys: Add helper functions to update PKRU on the sigframe Aruna Ramakrishna
@ 2024-08-02 6:13 ` Aruna Ramakrishna
2024-08-09 17:30 ` Jeff Xu
2025-02-04 10:01 ` Dmitry Vyukov
2024-08-02 6:13 ` [PATCH v8 4/5] x86/pkeys: Restore altstack access in sigreturn() Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
4 siblings, 2 replies; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
If the alternate signal stack is protected by a different pkey than the
current execution stack, copying xsave data to the sigaltstack will fail
if its pkey is not enabled in the PKRU register.
We do not know which pkey was used by the application for the altstack,
so enable all pkeys before xsave.
But this updated PKRU value is also pushed onto the sigframe, which
means the register value restored from sigcontext will be different from
the user-defined one, which is unexpected. Fix that by overwriting the
PKRU value on the sigframe with the original, user-defined PKRU.
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
---
arch/x86/kernel/fpu/signal.c | 11 +++++++++--
arch/x86/kernel/signal.c | 12 ++++++++++--
2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 931c5469d7f3..1065ab995305 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -168,8 +168,15 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
{
- if (use_xsave())
- return xsave_to_user_sigframe(buf);
+ int err = 0;
+
+ if (use_xsave()) {
+ err = xsave_to_user_sigframe(buf);
+ if (!err)
+ err = update_pkru_in_sigframe(buf, pkru);
+ return err;
+ }
+
if (use_fxsr())
return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
else
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 9dc77ad03a0e..5f441039b572 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -102,7 +102,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
unsigned long math_size = 0;
unsigned long sp = regs->sp;
unsigned long buf_fx = 0;
- u32 pkru = read_pkru();
+ u32 pkru;
/* redzone */
if (!ia32_frame)
@@ -157,9 +157,17 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
return (void __user *)-1L;
}
+ /* Update PKRU to enable access to the alternate signal stack. */
+ pkru = sig_prepare_pkru();
/* save i387 and extended state */
- if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
+ if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru)) {
+ /*
+ * Restore PKRU to the original, user-defined value; disable
+ * extra pkeys enabled for the alternate signal stack, if any.
+ */
+ write_pkru(pkru);
return (void __user *)-1L;
+ }
return (void __user *)sp;
}
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 4/5] x86/pkeys: Restore altstack access in sigreturn()
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
` (2 preceding siblings ...)
2024-08-02 6:13 ` [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
@ 2024-08-02 6:13 ` Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
4 siblings, 0 replies; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
A process can disable access to the alternate signal stack by not
enabling the altstack's pkey in the PKRU register. Nevertheless, the
kernel updates the PKRU temporarily for signal handling. However, in
sigreturn(), restore_sigcontext() will restore the PKRU to the
user-defined PKRU value. This will cause restore_altstack() to fail with
a SIGSEGV as it needs read access to the altstack which is prohibited
by the user-defined PKRU value.
Fix this by restoring altstack before restoring PKRU.
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
---
arch/x86/kernel/signal_64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index 8a94053c5444..ee9453891901 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -260,13 +260,13 @@ SYSCALL_DEFINE0(rt_sigreturn)
set_current_blocked(&set);
- if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
+ if (restore_altstack(&frame->uc.uc_stack))
goto badframe;
- if (restore_signal_shadow_stack())
+ if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
goto badframe;
- if (restore_altstack(&frame->uc.uc_stack))
+ if (restore_signal_shadow_stack())
goto badframe;
return regs->ax;
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
` (3 preceding siblings ...)
2024-08-02 6:13 ` [PATCH v8 4/5] x86/pkeys: Restore altstack access in sigreturn() Aruna Ramakrishna
@ 2024-08-02 6:13 ` Aruna Ramakrishna
2024-08-02 8:16 ` Thomas Gleixner
` (2 more replies)
4 siblings, 3 replies; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-08-02 6:13 UTC (permalink / raw)
To: linux-kernel
Cc: x86, dave.hansen, tglx, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
From: Keith Lucas <keith.lucas@oracle.com>
Add a few new tests to exercise the signal handler flow,
especially with pkey 0 disabled.
There are 5 new tests added:
- test_sigsegv_handler_with_pkey0_disabled
- test_sigsegv_handler_cannot_access_stack
- test_sigsegv_handler_with_different_pkey_for_stack
- test_pkru_preserved_after_sigusr1
- test_pkru_sigreturn
[ Aruna: Adapted to upstream ]
Signed-off-by: Keith Lucas <keith.lucas@oracle.com>
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
---
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/pkey-helpers.h | 11 +-
.../selftests/mm/pkey_sighandler_tests.c | 479 ++++++++++++++++++
tools/testing/selftests/mm/protection_keys.c | 10 -
4 files changed, 491 insertions(+), 11 deletions(-)
create mode 100644 tools/testing/selftests/mm/pkey_sighandler_tests.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 901e0d07765b..0123a3a0bb17 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -88,6 +88,7 @@ CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_pr
CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
VMTARGETS := protection_keys
+VMTARGETS := pkey_sighandler_tests
BINARIES_32 := $(VMTARGETS:%=%_32)
BINARIES_64 := $(VMTARGETS:%=%_64)
@@ -106,6 +107,7 @@ else
ifneq (,$(findstring $(ARCH),powerpc))
TEST_GEN_FILES += protection_keys
+TEST_GEN_FILES += pkey_sighandler_tests
endif
endif
diff --git a/tools/testing/selftests/mm/pkey-helpers.h b/tools/testing/selftests/mm/pkey-helpers.h
index 1af3156a9db8..2b1189c27167 100644
--- a/tools/testing/selftests/mm/pkey-helpers.h
+++ b/tools/testing/selftests/mm/pkey-helpers.h
@@ -12,6 +12,7 @@
#include <stdlib.h>
#include <ucontext.h>
#include <sys/mman.h>
+#include <linux/compiler.h>
#include "../kselftest.h"
@@ -79,7 +80,15 @@ extern void abort_hooks(void);
} \
} while (0)
-__attribute__((noinline)) int read_ptr(int *ptr);
+noinline int read_ptr(int *ptr)
+{
+ /*
+ * Keep GCC from optimizing this away somehow
+ */
+ barrier();
+ return *ptr;
+}
+
void expected_pkey_fault(int pkey);
int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
int sys_pkey_free(unsigned long pkey);
diff --git a/tools/testing/selftests/mm/pkey_sighandler_tests.c b/tools/testing/selftests/mm/pkey_sighandler_tests.c
new file mode 100644
index 000000000000..c43030c7056d
--- /dev/null
+++ b/tools/testing/selftests/mm/pkey_sighandler_tests.c
@@ -0,0 +1,479 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests Memory Protection Keys (see Documentation/core-api/protection-keys.rst)
+ *
+ * The testcases in this file exercise various flows related to signal handling,
+ * using an alternate signal stack, with the default pkey (pkey 0) disabled.
+ *
+ * Compile with:
+ * gcc -mxsave -o pkey_sighandler_tests -O2 -g -std=gnu99 -pthread -Wall pkey_sighandler_tests.c -I../../../../tools/include -lrt -ldl -lm
+ * gcc -mxsave -m32 -o pkey_sighandler_tests -O2 -g -std=gnu99 -pthread -Wall pkey_sighandler_tests.c -I../../../../tools/include -lrt -ldl -lm
+ */
+#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__
+#include <errno.h>
+#include <sys/syscall.h>
+#include <string.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <limits.h>
+
+#include "pkey-helpers.h"
+
+#define STACK_SIZE PTHREAD_STACK_MIN
+
+void expected_pkey_fault(int pkey) {}
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
+siginfo_t siginfo = {0};
+
+/*
+ * We need to use inline assembly instead of glibc's syscall because glibc's
+ * syscall will attempt to access the PLT in order to call a library function
+ * which is protected by MPK 0 which we don't have access to.
+ */
+static inline __always_inline
+long syscall_raw(long n, long a1, long a2, long a3, long a4, long a5, long a6)
+{
+ unsigned long ret;
+#ifdef __x86_64__
+ register long r10 asm("r10") = a4;
+ register long r8 asm("r8") = a5;
+ register long r9 asm("r9") = a6;
+ asm volatile ("syscall"
+ : "=a"(ret)
+ : "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10), "r"(r8), "r"(r9)
+ : "rcx", "r11", "memory");
+#elif defined __i386__
+ asm volatile ("int $0x80"
+ : "=a"(ret)
+ : "a"(n), "b"(a1), "c"(a2), "d"(a3), "S"(a4), "D"(a5)
+ : "memory");
+#endif
+ return ret;
+}
+
+static void sigsegv_handler(int signo, siginfo_t *info, void *ucontext)
+{
+ pthread_mutex_lock(&mutex);
+
+ memcpy(&siginfo, info, sizeof(siginfo_t));
+
+ pthread_cond_signal(&cond);
+ pthread_mutex_unlock(&mutex);
+
+ syscall_raw(SYS_exit, 0, 0, 0, 0, 0, 0);
+}
+
+static void sigusr1_handler(int signo, siginfo_t *info, void *ucontext)
+{
+ pthread_mutex_lock(&mutex);
+
+ memcpy(&siginfo, info, sizeof(siginfo_t));
+
+ pthread_cond_signal(&cond);
+ pthread_mutex_unlock(&mutex);
+}
+
+static void sigusr2_handler(int signo, siginfo_t *info, void *ucontext)
+{
+ /*
+ * pkru should be the init_pkru value which enabled MPK 0 so
+ * we can use library functions.
+ */
+ printf("%s invoked.\n", __func__);
+}
+
+static void raise_sigusr2(void)
+{
+ pid_t tid = 0;
+
+ tid = syscall_raw(SYS_gettid, 0, 0, 0, 0, 0, 0);
+
+ syscall_raw(SYS_tkill, tid, SIGUSR2, 0, 0, 0, 0);
+
+ /*
+ * We should return from the signal handler here and be able to
+ * return to the interrupted thread.
+ */
+}
+
+static void *thread_segv_with_pkey0_disabled(void *ptr)
+{
+ /* Disable MPK 0 (and all others too) */
+ __write_pkey_reg(0x55555555);
+
+ /* Segfault (with SEGV_MAPERR) */
+ *(int *) (0x1) = 1;
+ return NULL;
+}
+
+static void *thread_segv_pkuerr_stack(void *ptr)
+{
+ /* Disable MPK 0 (and all others too) */
+ __write_pkey_reg(0x55555555);
+
+ /* After we disable MPK 0, we can't access the stack to return */
+ return NULL;
+}
+
+static void *thread_segv_maperr_ptr(void *ptr)
+{
+ stack_t *stack = ptr;
+ int *bad = (int *)1;
+
+ /*
+ * Setup alternate signal stack, which should be pkey_mprotect()ed by
+ * MPK 0. The thread's stack cannot be used for signals because it is
+ * not accessible by the default init_pkru value of 0x55555554.
+ */
+ syscall_raw(SYS_sigaltstack, (long)stack, 0, 0, 0, 0, 0);
+
+ /* Disable MPK 0. Only MPK 1 is enabled. */
+ __write_pkey_reg(0x55555551);
+
+ /* Segfault */
+ *bad = 1;
+ syscall_raw(SYS_exit, 0, 0, 0, 0, 0, 0);
+ return NULL;
+}
+
+/*
+ * Verify that the sigsegv handler is invoked when pkey 0 is disabled.
+ * Note that the new thread stack and the alternate signal stack is
+ * protected by MPK 0.
+ */
+static void test_sigsegv_handler_with_pkey0_disabled(void)
+{
+ struct sigaction sa;
+ pthread_attr_t attr;
+ pthread_t thr;
+
+ sa.sa_flags = SA_SIGINFO;
+
+ sa.sa_sigaction = sigsegv_handler;
+ sigemptyset(&sa.sa_mask);
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ memset(&siginfo, 0, sizeof(siginfo));
+
+ pthread_attr_init(&attr);
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+
+ pthread_create(&thr, &attr, thread_segv_with_pkey0_disabled, NULL);
+
+ pthread_mutex_lock(&mutex);
+ while (siginfo.si_signo == 0)
+ pthread_cond_wait(&cond, &mutex);
+ pthread_mutex_unlock(&mutex);
+
+ ksft_test_result(siginfo.si_signo == SIGSEGV &&
+ siginfo.si_code == SEGV_MAPERR &&
+ siginfo.si_addr == (void *)1,
+ "%s\n", __func__);
+}
+
+/*
+ * Verify that the sigsegv handler is invoked when pkey 0 is disabled.
+ * Note that the new thread stack and the alternate signal stack is
+ * protected by MPK 0, which renders them inaccessible when MPK 0
+ * is disabled. So just the return from the thread should cause a
+ * segfault with SEGV_PKUERR.
+ */
+static void test_sigsegv_handler_cannot_access_stack(void)
+{
+ struct sigaction sa;
+ pthread_attr_t attr;
+ pthread_t thr;
+
+ sa.sa_flags = SA_SIGINFO;
+
+ sa.sa_sigaction = sigsegv_handler;
+ sigemptyset(&sa.sa_mask);
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ memset(&siginfo, 0, sizeof(siginfo));
+
+ pthread_attr_init(&attr);
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+
+ pthread_create(&thr, &attr, thread_segv_pkuerr_stack, NULL);
+
+ pthread_mutex_lock(&mutex);
+ while (siginfo.si_signo == 0)
+ pthread_cond_wait(&cond, &mutex);
+ pthread_mutex_unlock(&mutex);
+
+ ksft_test_result(siginfo.si_signo == SIGSEGV &&
+ siginfo.si_code == SEGV_PKUERR,
+ "%s\n", __func__);
+}
+
+/*
+ * Verify that the sigsegv handler that uses an alternate signal stack
+ * is correctly invoked for a thread which uses a non-zero MPK to protect
+ * its own stack, and disables all other MPKs (including 0).
+ */
+static void test_sigsegv_handler_with_different_pkey_for_stack(void)
+{
+ struct sigaction sa;
+ static stack_t sigstack;
+ void *stack;
+ int pkey;
+ int parent_pid = 0;
+ int child_pid = 0;
+
+ sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+
+ sa.sa_sigaction = sigsegv_handler;
+
+ sigemptyset(&sa.sa_mask);
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ stack = mmap(0, STACK_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+ assert(stack != MAP_FAILED);
+
+ /* Allow access to MPK 0 and MPK 1 */
+ __write_pkey_reg(0x55555550);
+
+ /* Protect the new stack with MPK 1 */
+ pkey = pkey_alloc(0, 0);
+ pkey_mprotect(stack, STACK_SIZE, PROT_READ | PROT_WRITE, pkey);
+
+ /* Set up alternate signal stack that will use the default MPK */
+ sigstack.ss_sp = mmap(0, STACK_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ sigstack.ss_flags = 0;
+ sigstack.ss_size = STACK_SIZE;
+
+ memset(&siginfo, 0, sizeof(siginfo));
+
+ /* Use clone to avoid newer glibcs using rseq on new threads */
+ long ret = syscall_raw(SYS_clone,
+ CLONE_VM | CLONE_FS | CLONE_FILES |
+ CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM |
+ CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID |
+ CLONE_DETACHED,
+ (long) ((char *)(stack) + STACK_SIZE),
+ (long) &parent_pid,
+ (long) &child_pid, 0, 0);
+
+ if (ret < 0) {
+ errno = -ret;
+ perror("clone");
+ } else if (ret == 0) {
+ thread_segv_maperr_ptr(&sigstack);
+ syscall_raw(SYS_exit, 0, 0, 0, 0, 0, 0);
+ }
+
+ pthread_mutex_lock(&mutex);
+ while (siginfo.si_signo == 0)
+ pthread_cond_wait(&cond, &mutex);
+ pthread_mutex_unlock(&mutex);
+
+ ksft_test_result(siginfo.si_signo == SIGSEGV &&
+ siginfo.si_code == SEGV_MAPERR &&
+ siginfo.si_addr == (void *)1,
+ "%s\n", __func__);
+}
+
+/*
+ * Verify that the PKRU value set by the application is correctly
+ * restored upon return from signal handling.
+ */
+static void test_pkru_preserved_after_sigusr1(void)
+{
+ struct sigaction sa;
+ unsigned long pkru = 0x45454544;
+
+ sa.sa_flags = SA_SIGINFO;
+
+ sa.sa_sigaction = sigusr1_handler;
+ sigemptyset(&sa.sa_mask);
+ if (sigaction(SIGUSR1, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ memset(&siginfo, 0, sizeof(siginfo));
+
+ __write_pkey_reg(pkru);
+
+ raise(SIGUSR1);
+
+ pthread_mutex_lock(&mutex);
+ while (siginfo.si_signo == 0)
+ pthread_cond_wait(&cond, &mutex);
+ pthread_mutex_unlock(&mutex);
+
+ /* Ensure the pkru value is the same after returning from signal. */
+ ksft_test_result(pkru == __read_pkey_reg() &&
+ siginfo.si_signo == SIGUSR1,
+ "%s\n", __func__);
+}
+
+static noinline void *thread_sigusr2_self(void *ptr)
+{
+ /*
+ * A const char array like "Resuming after SIGUSR2" won't be stored on
+ * the stack and the code could access it via an offset from the program
+ * counter. This makes sure it's on the function's stack frame.
+ */
+ char str[] = {'R', 'e', 's', 'u', 'm', 'i', 'n', 'g', ' ',
+ 'a', 'f', 't', 'e', 'r', ' ',
+ 'S', 'I', 'G', 'U', 'S', 'R', '2',
+ '.', '.', '.', '\n', '\0'};
+ stack_t *stack = ptr;
+
+ /*
+ * Setup alternate signal stack, which should be pkey_mprotect()ed by
+ * MPK 0. The thread's stack cannot be used for signals because it is
+ * not accessible by the default init_pkru value of 0x55555554.
+ */
+ syscall(SYS_sigaltstack, (long)stack, 0, 0, 0, 0, 0);
+
+ /* Disable MPK 0. Only MPK 2 is enabled. */
+ __write_pkey_reg(0x55555545);
+
+ raise_sigusr2();
+
+ /* Do something, to show the thread resumed execution after the signal */
+ syscall_raw(SYS_write, 1, (long) str, sizeof(str) - 1, 0, 0, 0);
+
+ /*
+ * We can't return to test_pkru_sigreturn because it
+ * will attempt to use a %rbp value which is on the stack
+ * of the main thread.
+ */
+ syscall_raw(SYS_exit, 0, 0, 0, 0, 0, 0);
+ return NULL;
+}
+
+/*
+ * Verify that sigreturn is able to restore altstack even if the thread had
+ * disabled pkey 0.
+ */
+static void test_pkru_sigreturn(void)
+{
+ struct sigaction sa = {0};
+ static stack_t sigstack;
+ void *stack;
+ int pkey;
+ int parent_pid = 0;
+ int child_pid = 0;
+
+ sa.sa_handler = SIG_DFL;
+ sa.sa_flags = 0;
+ sigemptyset(&sa.sa_mask);
+
+ /*
+ * For this testcase, we do not want to handle SIGSEGV. Reset handler
+ * to default so that the application can crash if it receives SIGSEGV.
+ */
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+ sa.sa_sigaction = sigusr2_handler;
+ sigemptyset(&sa.sa_mask);
+
+ if (sigaction(SIGUSR2, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+
+ stack = mmap(0, STACK_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+ assert(stack != MAP_FAILED);
+
+ /*
+ * Allow access to MPK 0 and MPK 2. The child thread (to be created
+ * later in this flow) will have its stack protected by MPK 2, whereas
+ * the current thread's stack is protected by the default MPK 0. Hence
+ * both need to be enabled.
+ */
+ __write_pkey_reg(0x55555544);
+
+ /* Protect the stack with MPK 2 */
+ pkey = pkey_alloc(0, 0);
+ pkey_mprotect(stack, STACK_SIZE, PROT_READ | PROT_WRITE, pkey);
+
+ /* Set up alternate signal stack that will use the default MPK */
+ sigstack.ss_sp = mmap(0, STACK_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ sigstack.ss_flags = 0;
+ sigstack.ss_size = STACK_SIZE;
+
+ /* Use clone to avoid newer glibcs using rseq on new threads */
+ long ret = syscall_raw(SYS_clone,
+ CLONE_VM | CLONE_FS | CLONE_FILES |
+ CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM |
+ CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID |
+ CLONE_DETACHED,
+ (long) ((char *)(stack) + STACK_SIZE),
+ (long) &parent_pid,
+ (long) &child_pid, 0, 0);
+
+ if (ret < 0) {
+ errno = -ret;
+ perror("clone");
+ } else if (ret == 0) {
+ thread_sigusr2_self(&sigstack);
+ syscall_raw(SYS_exit, 0, 0, 0, 0, 0, 0);
+ }
+
+ child_pid = ret;
+ /* Check that thread exited */
+ do {
+ sched_yield();
+ ret = syscall_raw(SYS_tkill, child_pid, 0, 0, 0, 0, 0);
+ } while (ret != -ESRCH && ret != -EINVAL);
+
+ ksft_test_result_pass("%s\n", __func__);
+}
+
+void (*pkey_tests[])(void) = {
+ test_sigsegv_handler_with_pkey0_disabled,
+ test_sigsegv_handler_cannot_access_stack,
+ test_sigsegv_handler_with_different_pkey_for_stack,
+ test_pkru_preserved_after_sigusr1,
+ test_pkru_sigreturn
+};
+
+int main(int argc, char *argv[])
+{
+ int i;
+
+ ksft_print_header();
+ ksft_set_plan(ARRAY_SIZE(pkey_tests));
+
+ for (i = 0; i < ARRAY_SIZE(pkey_tests); i++)
+ (*pkey_tests[i])();
+
+ ksft_finished();
+ return 0;
+}
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index eaa6d1fc5328..cc6de1644360 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -950,16 +950,6 @@ void close_test_fds(void)
nr_test_fds = 0;
}
-#define barrier() __asm__ __volatile__("": : :"memory")
-__attribute__((noinline)) int read_ptr(int *ptr)
-{
- /*
- * Keep GCC from optimizing this away somehow
- */
- barrier();
- return *ptr;
-}
-
void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey)
{
int i, err;
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
@ 2024-08-02 8:16 ` Thomas Gleixner
2024-08-02 8:22 ` Thomas Gleixner
2024-08-02 12:04 ` Thomas Gleixner
2 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2024-08-02 8:16 UTC (permalink / raw)
To: Aruna Ramakrishna, linux-kernel
Cc: x86, dave.hansen, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
On Fri, Aug 02 2024 at 06:13, Aruna Ramakrishna wrote:
>
> ifneq (,$(findstring $(ARCH),powerpc))
> TEST_GEN_FILES += protection_keys
> +TEST_GEN_FILES += pkey_sighandler_tests
That can't work because the raw_syscall() inline won't do anything on
non x86, right?
No need to resend. I removed it already and added a build error into
raw_syscall() just in case that someone tries to enable it.
Thanks,
tglx
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
2024-08-02 8:16 ` Thomas Gleixner
@ 2024-08-02 8:22 ` Thomas Gleixner
2024-08-02 12:04 ` Thomas Gleixner
2 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2024-08-02 8:22 UTC (permalink / raw)
To: Aruna Ramakrishna, linux-kernel
Cc: x86, dave.hansen, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
On Fri, Aug 02 2024 at 06:13, Aruna Ramakrishna wrote:
>
> -__attribute__((noinline)) int read_ptr(int *ptr);
> +noinline int read_ptr(int *ptr)
tools/include only defines noinline for gcc, so a clang build won't have
it ...
Sigh...
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
2024-08-02 8:16 ` Thomas Gleixner
2024-08-02 8:22 ` Thomas Gleixner
@ 2024-08-02 12:04 ` Thomas Gleixner
2 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2024-08-02 12:04 UTC (permalink / raw)
To: Aruna Ramakrishna, linux-kernel
Cc: x86, dave.hansen, mingo, linux-mm, keith.lucas, jeffxu,
rick.p.edgecombe, jorgelo, keescook, sroettger, jannh,
aruna.ramakrishna
On Fri, Aug 02 2024 at 06:13, Aruna Ramakrishna wrote:
>
> VMTARGETS := protection_keys
> +VMTARGETS := pkey_sighandler_tests
What builds protection_keys_32/64 now?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions
2024-08-02 6:13 ` [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions Aruna Ramakrishna
@ 2024-08-09 17:16 ` Jeff Xu
0 siblings, 0 replies; 20+ messages in thread
From: Jeff Xu @ 2024-08-09 17:16 UTC (permalink / raw)
To: Aruna Ramakrishna
Cc: linux-kernel, x86, dave.hansen, tglx, mingo, linux-mm,
keith.lucas, rick.p.edgecombe, jorgelo, keescook, sroettger,
jannh
Hi Aruna,
On Thu, Aug 1, 2024 at 11:13 PM Aruna Ramakrishna
<aruna.ramakrishna@oracle.com> wrote:
>
> Problem description:
> Let's assume there's a multithreaded application that runs untrusted
> user code. Each thread has its stack/code protected by a non-zero pkey,
> and the PKRU register is set up such that only that particular non-zero
> pkey is enabled. Each thread also sets up an alternate signal stack to
> handle signals, which is protected by pkey zero. The pkeys man page
> documents that the PKRU will be reset to init_pkru when the signal
> handler is invoked, which means that pkey zero access will be enabled.
> But this reset happens after the kernel attempts to push fpu state
> to the alternate stack, which is not (yet) accessible by the kernel,
> which leads to a new SIGSEGV being sent to the application, terminating
> it.
>
> Enabling both the non-zero pkey (for the thread) and pkey zero in
> userspace will not work for this use case. We cannot have the alt stack
> writeable by all - the rationale here is that the code running in that
> thread (using a non-zero pkey) is untrusted and should not have access
> to the alternate signal stack (that uses pkey zero), to prevent the
> return address of a function from being changed. The expectation is that
> kernel should be able to set up the alternate signal stack and deliver
> the signal to the application even if pkey zero is explicitly disabled
> by the application. The signal handler accessibility should not be
> dictated by whatever PKRU value the thread sets up.
>
While the above use case is correct, there are also other use cases
that can benefit from this patch:
Setup:
The thread's normal operation has RW permission to PKEY 0, and RO or
None to PKEY 1.
The Thread uses sigaltstack() to register a mapping which has PKEY1 assigned.
Before this patch:
When the kernel is dispatching the signal, the thread will get SIGSEGV
inside get_sigframe, because the thread either doesn't have read
access or write access to the mapping which is protected by PKEY1.
Because of SIGSEGV, the signal is not delivered to userspace.
After this patch:
There won't be SIGSEGV and the signal is delivered to userspace.
The scenario is useful to protect signal stack: Jann Horn had this
idea originally, and Chrome is planning to use it in V8 [1]. There
were discussions in the past to enable this feature [2]
Is that possible to include above in the cover letter to support this
patch series for upstream?
(upstream does like to see there are multiple user cases)
Thanks
-Jeff
[1] https://docs.google.com/document/d/1DjPhBq-5gRKtTeaknQDTWfvqRBCONmYqkU1I6k-3Ai8/edit?resourcekey=0-GGQta3_yhKqK7xV5SxIrVQ&tab=t.0
[2] https://lore.kernel.org/lkml/202208221331.71C50A6F@keescook/
> Solution:
> The PKRU register is managed by XSAVE, which means the sigframe contents
> must match the register contents - which is not the case here. We want
> the sigframe to contain the user-defined PKRU value (so that it is
> restored correctly from sigcontext) but the actual register must be
> reset to init_pkru so that the alt stack is accessible and the signal
> can be delivered to the application. It seems that the proper fix here
> would be to remove PKRU from the XSAVE framework and manage it
> separately, which is quite complicated. As a workaround, do this:
>
> orig_pkru = rdpkru();
> wrpkru(orig_pkru & init_pkru_value);
> xsave_to_user_sigframe();
> put_user(pkru_sigframe_addr, orig_pkru)
>
> This change is split over multiple patches.
>
> In preparation for writing PKRU to sigframe in a later patch, pass in
> PKRU as an additional parameter down the chain from get_sigframe():
> get_sigframe()
> copy_fpstate_to_sigframe()
> copy_fpregs_to_sigframe()
>
> There are no functional changes in this patch.
>
> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> ---
> arch/x86/include/asm/fpu/signal.h | 2 +-
> arch/x86/kernel/fpu/signal.c | 6 +++---
> arch/x86/kernel/signal.c | 3 ++-
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/signal.h b/arch/x86/include/asm/fpu/signal.h
> index 611fa41711af..eccc75bc9c4f 100644
> --- a/arch/x86/include/asm/fpu/signal.h
> +++ b/arch/x86/include/asm/fpu/signal.h
> @@ -29,7 +29,7 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
>
> unsigned long fpu__get_fpstate_size(void);
>
> -extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
> +extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size, u32 pkru);
> extern void fpu__clear_user_states(struct fpu *fpu);
> extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
>
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 247f2225aa9f..2b3b9e140dd4 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -156,7 +156,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
> return !err;
> }
>
> -static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> +static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
> {
> if (use_xsave())
> return xsave_to_user_sigframe(buf);
> @@ -185,7 +185,7 @@ static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> * For [f]xsave state, update the SW reserved fields in the [f]xsave frame
> * indicating the absence/presence of the extended state to the user.
> */
> -bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> +bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size, u32 pkru)
> {
> struct task_struct *tsk = current;
> struct fpstate *fpstate = tsk->thread.fpu.fpstate;
> @@ -228,7 +228,7 @@ bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> fpregs_restore_userregs();
>
> pagefault_disable();
> - ret = copy_fpregs_to_sigframe(buf_fx);
> + ret = copy_fpregs_to_sigframe(buf_fx, pkru);
> pagefault_enable();
> fpregs_unlock();
>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 31b6f5dddfc2..1f1e8e0ac5a3 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -84,6 +84,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> unsigned long math_size = 0;
> unsigned long sp = regs->sp;
> unsigned long buf_fx = 0;
> + u32 pkru = read_pkru();
>
> /* redzone */
> if (!ia32_frame)
> @@ -139,7 +140,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> }
>
> /* save i387 and extended state */
> - if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size))
> + if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
> return (void __user *)-1L;
>
> return (void __user *)sp;
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-08-02 6:13 ` [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
@ 2024-08-09 17:30 ` Jeff Xu
2024-08-27 0:53 ` Jeff Xu
2025-02-04 10:01 ` Dmitry Vyukov
1 sibling, 1 reply; 20+ messages in thread
From: Jeff Xu @ 2024-08-09 17:30 UTC (permalink / raw)
To: Aruna Ramakrishna
Cc: linux-kernel, x86, dave.hansen, tglx, mingo, linux-mm,
keith.lucas, rick.p.edgecombe, jorgelo, keescook, sroettger,
jannh
On Thu, Aug 1, 2024 at 11:13 PM Aruna Ramakrishna
<aruna.ramakrishna@oracle.com> wrote:
>
> If the alternate signal stack is protected by a different pkey than the
> current execution stack, copying xsave data to the sigaltstack will fail
> if its pkey is not enabled in the PKRU register.
>
> We do not know which pkey was used by the application for the altstack,
> so enable all pkeys before xsave.
>
> But this updated PKRU value is also pushed onto the sigframe, which
> means the register value restored from sigcontext will be different from
> the user-defined one, which is unexpected. Fix that by overwriting the
> PKRU value on the sigframe with the original, user-defined PKRU.
>
> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> ---
> arch/x86/kernel/fpu/signal.c | 11 +++++++++--
> arch/x86/kernel/signal.c | 12 ++++++++++--
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 931c5469d7f3..1065ab995305 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -168,8 +168,15 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
>
> static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
> {
> - if (use_xsave())
> - return xsave_to_user_sigframe(buf);
> + int err = 0;
> +
> + if (use_xsave()) {
> + err = xsave_to_user_sigframe(buf);
> + if (!err)
> + err = update_pkru_in_sigframe(buf, pkru);
> + return err;
> + }
> +
> if (use_fxsr())
> return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
> else
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 9dc77ad03a0e..5f441039b572 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -102,7 +102,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> unsigned long math_size = 0;
> unsigned long sp = regs->sp;
> unsigned long buf_fx = 0;
> - u32 pkru = read_pkru();
> + u32 pkru;
>
> /* redzone */
> if (!ia32_frame)
> @@ -157,9 +157,17 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> return (void __user *)-1L;
> }
>
> + /* Update PKRU to enable access to the alternate signal stack. */
> + pkru = sig_prepare_pkru();
I think, the better place to call sig_prepare_pkru() at begging of the
get_sigframe:
get_sigframe()
{
/* ... */
if (ka->sa.sa_flags & SA_ONSTACK) {
if (sas_ss_flags(sp) == 0) {
// set pkru = 0 <--- set pkru = 0 here.
entering_altstack = true;
}
}
For two reasons:
- We probably don't want all signaling handling going through "PKRU=0"
, this includes the regular stack and nested signaling handling. The
best place is when "entering the altstack". IIUC, this feature only
enabled when sigaltstack() is used.
- The thread might not have read-access to the altstack, so we will
want to make sure that pkru=0 is set before any read to the altstack.
And IIRC, fpu__alloc_mathframe needs read-access to the altstack.
To help with testing, I will send a test case to demo the usage.
(please give me sometime to organize the test code, I'm hoping to send
out before the end of next week)
Also, could you please consider adding a new flag SS_PKEYALTSTACK (see
SS_AUTODISARM for example). Most existing apps that use sigaltstack()
don't use PKEY and don't care about setting PKRU=0, and don't need to
overwrite the sig frame after XSAVE. The flag will limit the impact
of this patch.
Thanks
Best regards,
-Jeff
> /* save i387 and extended state */
> - if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
> + if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru)) {
> + /*
> + * Restore PKRU to the original, user-defined value; disable
> + * extra pkeys enabled for the alternate signal stack, if any.
> + */
> + write_pkru(pkru);
> return (void __user *)-1L;
> + }
>
> return (void __user *)sp;
> }
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-08-09 17:30 ` Jeff Xu
@ 2024-08-27 0:53 ` Jeff Xu
2024-10-03 23:29 ` Aruna Ramakrishna
0 siblings, 1 reply; 20+ messages in thread
From: Jeff Xu @ 2024-08-27 0:53 UTC (permalink / raw)
To: Aruna Ramakrishna
Cc: linux-kernel, x86, dave.hansen, tglx, mingo, linux-mm,
keith.lucas, rick.p.edgecombe, jorgelo, keescook, sroettger,
jannh
Hi Aruna
On Fri, Aug 9, 2024 at 10:30 AM Jeff Xu <jeffxu@chromium.org> wrote:
>
> On Thu, Aug 1, 2024 at 11:13 PM Aruna Ramakrishna
> <aruna.ramakrishna@oracle.com> wrote:
> >
> > If the alternate signal stack is protected by a different pkey than the
> > current execution stack, copying xsave data to the sigaltstack will fail
> > if its pkey is not enabled in the PKRU register.
> >
> > We do not know which pkey was used by the application for the altstack,
> > so enable all pkeys before xsave.
> >
> > But this updated PKRU value is also pushed onto the sigframe, which
> > means the register value restored from sigcontext will be different from
> > the user-defined one, which is unexpected. Fix that by overwriting the
> > PKRU value on the sigframe with the original, user-defined PKRU.
> >
> > Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > ---
> > arch/x86/kernel/fpu/signal.c | 11 +++++++++--
> > arch/x86/kernel/signal.c | 12 ++++++++++--
> > 2 files changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> > index 931c5469d7f3..1065ab995305 100644
> > --- a/arch/x86/kernel/fpu/signal.c
> > +++ b/arch/x86/kernel/fpu/signal.c
> > @@ -168,8 +168,15 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
> >
> > static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
> > {
> > - if (use_xsave())
> > - return xsave_to_user_sigframe(buf);
> > + int err = 0;
> > +
> > + if (use_xsave()) {
> > + err = xsave_to_user_sigframe(buf);
> > + if (!err)
> > + err = update_pkru_in_sigframe(buf, pkru);
> > + return err;
> > + }
> > +
> > if (use_fxsr())
> > return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
> > else
> > diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> > index 9dc77ad03a0e..5f441039b572 100644
> > --- a/arch/x86/kernel/signal.c
> > +++ b/arch/x86/kernel/signal.c
> > @@ -102,7 +102,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> > unsigned long math_size = 0;
> > unsigned long sp = regs->sp;
> > unsigned long buf_fx = 0;
> > - u32 pkru = read_pkru();
> > + u32 pkru;
> >
> > /* redzone */
> > if (!ia32_frame)
> > @@ -157,9 +157,17 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> > return (void __user *)-1L;
> > }
> >
> > + /* Update PKRU to enable access to the alternate signal stack. */
> > + pkru = sig_prepare_pkru();
> I think, the better place to call sig_prepare_pkru() at begging of the
> get_sigframe:
>
> get_sigframe()
> {
> /* ... */
> if (ka->sa.sa_flags & SA_ONSTACK) {
> if (sas_ss_flags(sp) == 0) {
> // set pkru = 0 <--- set pkru = 0 here.
> entering_altstack = true;
> }
> }
> For two reasons:
> - We probably don't want all signaling handling going through "PKRU=0"
> , this includes the regular stack and nested signaling handling. The
> best place is when "entering the altstack". IIUC, this feature only
> enabled when sigaltstack() is used.
> - The thread might not have read-access to the altstack, so we will
> want to make sure that pkru=0 is set before any read to the altstack.
> And IIRC, fpu__alloc_mathframe needs read-access to the altstack.
> To help with testing, I will send a test case to demo the usage.
Apologies, I remembered it wrong, the fpu__alloc_mathframe doesn't
need read-access to altstack.
But I think the best place to set pkru=0 is still when the stack
entering altstack, as shown above. the reason is: The signal stack can
be nested, i.e. trigger another signaling inside signal handler and we
don't need to set pkru=0 multiple times (only the first time enter
sigaltstack)
> (please give me sometime to organize the test code, I'm hoping to send
> out before the end of next week)
>
the test code is placed at [1]
[1] https://github.com/peaktocreek/pkeytest
> Also, could you please consider adding a new flag SS_PKEYALTSTACK (see
> SS_AUTODISARM for example). Most existing apps that use sigaltstack()
> don't use PKEY and don't care about setting PKRU=0, and don't need to
> overwrite the sig frame after XSAVE. The flag will limit the impact
> of this patch.
>
Thanks
-Jeff
> Thanks
> Best regards,
> -Jeff
>
>
>
>
>
> > /* save i387 and extended state */
> > - if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
> > + if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru)) {
> > + /*
> > + * Restore PKRU to the original, user-defined value; disable
> > + * extra pkeys enabled for the alternate signal stack, if any.
> > + */
> > + write_pkru(pkru);
> > return (void __user *)-1L;
> > + }
> >
> > return (void __user *)sp;
> > }
> > --
> > 2.39.3
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-08-27 0:53 ` Jeff Xu
@ 2024-10-03 23:29 ` Aruna Ramakrishna
2024-10-04 4:20 ` Jeff Xu
0 siblings, 1 reply; 20+ messages in thread
From: Aruna Ramakrishna @ 2024-10-03 23:29 UTC (permalink / raw)
To: Jeff Xu
Cc: linux-kernel, x86, dave.hansen, Thomas Gleixner, Ingo Molnar,
linux-mm, Keith Lucas, rick.p.edgecombe, Jorge Lucangeli Obes,
Kees Cook, Stephen Röttger, Jann Horn
> On Aug 26, 2024, at 5:53 PM, Jeff Xu <jeffxu@chromium.org> wrote:
>
> Hi Aruna
>
> On Fri, Aug 9, 2024 at 10:30 AM Jeff Xu <jeffxu@chromium.org> wrote:
>>
>> On Thu, Aug 1, 2024 at 11:13 PM Aruna Ramakrishna
>> <aruna.ramakrishna@oracle.com> wrote:
>>>
>>> If the alternate signal stack is protected by a different pkey than the
>>> current execution stack, copying xsave data to the sigaltstack will fail
>>> if its pkey is not enabled in the PKRU register.
>>>
>>> We do not know which pkey was used by the application for the altstack,
>>> so enable all pkeys before xsave.
>>>
>>> But this updated PKRU value is also pushed onto the sigframe, which
>>> means the register value restored from sigcontext will be different from
>>> the user-defined one, which is unexpected. Fix that by overwriting the
>>> PKRU value on the sigframe with the original, user-defined PKRU.
>>>
>>> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>> ---
>>> arch/x86/kernel/fpu/signal.c | 11 +++++++++--
>>> arch/x86/kernel/signal.c | 12 ++++++++++--
>>> 2 files changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
>>> index 931c5469d7f3..1065ab995305 100644
>>> --- a/arch/x86/kernel/fpu/signal.c
>>> +++ b/arch/x86/kernel/fpu/signal.c
>>> @@ -168,8 +168,15 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
>>>
>>> static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
>>> {
>>> - if (use_xsave())
>>> - return xsave_to_user_sigframe(buf);
>>> + int err = 0;
>>> +
>>> + if (use_xsave()) {
>>> + err = xsave_to_user_sigframe(buf);
>>> + if (!err)
>>> + err = update_pkru_in_sigframe(buf, pkru);
>>> + return err;
>>> + }
>>> +
>>> if (use_fxsr())
>>> return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
>>> else
>>> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
>>> index 9dc77ad03a0e..5f441039b572 100644
>>> --- a/arch/x86/kernel/signal.c
>>> +++ b/arch/x86/kernel/signal.c
>>> @@ -102,7 +102,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
>>> unsigned long math_size = 0;
>>> unsigned long sp = regs->sp;
>>> unsigned long buf_fx = 0;
>>> - u32 pkru = read_pkru();
>>> + u32 pkru;
>>>
>>> /* redzone */
>>> if (!ia32_frame)
>>> @@ -157,9 +157,17 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
>>> return (void __user *)-1L;
>>> }
>>>
>>> + /* Update PKRU to enable access to the alternate signal stack. */
>>> + pkru = sig_prepare_pkru();
>> I think, the better place to call sig_prepare_pkru() at begging of the
>> get_sigframe:
>>
>> get_sigframe()
>> {
>> /* ... */
>> if (ka->sa.sa_flags & SA_ONSTACK) {
>> if (sas_ss_flags(sp) == 0) {
>> // set pkru = 0 <--- set pkru = 0 here.
>> entering_altstack = true;
>> }
>> }
>> For two reasons:
>> - We probably don't want all signaling handling going through "PKRU=0"
>> , this includes the regular stack and nested signaling handling. The
>> best place is when "entering the altstack". IIUC, this feature only
>> enabled when sigaltstack() is used.
>> - The thread might not have read-access to the altstack, so we will
>> want to make sure that pkru=0 is set before any read to the altstack.
>> And IIRC, fpu__alloc_mathframe needs read-access to the altstack.
>> To help with testing, I will send a test case to demo the usage.
> Apologies, I remembered it wrong, the fpu__alloc_mathframe doesn't
> need read-access to altstack.
>
> But I think the best place to set pkru=0 is still when the stack
> entering altstack, as shown above. the reason is: The signal stack can
> be nested, i.e. trigger another signaling inside signal handler and we
> don't need to set pkru=0 multiple times (only the first time enter
> sigaltstack)
>
>> (please give me sometime to organize the test code, I'm hoping to send
>> out before the end of next week)
>>
> the test code is placed at [1]
> [1] https://github.com/peaktocreek/pkeytest
>
>> Also, could you please consider adding a new flag SS_PKEYALTSTACK (see
>> SS_AUTODISARM for example). Most existing apps that use sigaltstack()
>> don't use PKEY and don't care about setting PKRU=0, and don't need to
>> overwrite the sig frame after XSAVE. The flag will limit the impact
>> of this patch.
>>
> Thanks
> -Jeff
>
Hi Jeff,
I apologize for the delay in my response. I haven’t had a chance to optimize
this patchset or try new test cases.
I agree with your first suggestion that we can set pkru to 0 only when
entering_altstack = true, as that is the intention of this flow anyway.
I’m not so sure if SS_PKEYALTSTACK is really necessary - theoretically it seems
logical to not do this for applications that do not use pkeys but use altstack, but
it adds extra code/checks for possibly insignificant performance gain. I have not
tested this yet.
I’ll try to retest and send out a new patchset on top of the previous ones that have
been merged to 6.12.
Thank you for your feedback, and your attention to detail - I appreciate it.
Thanks,
Aruna
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-10-03 23:29 ` Aruna Ramakrishna
@ 2024-10-04 4:20 ` Jeff Xu
0 siblings, 0 replies; 20+ messages in thread
From: Jeff Xu @ 2024-10-04 4:20 UTC (permalink / raw)
To: Aruna Ramakrishna
Cc: linux-kernel, x86, dave.hansen, Thomas Gleixner, Ingo Molnar,
linux-mm, Keith Lucas, rick.p.edgecombe, Jorge Lucangeli Obes,
Kees Cook, Stephen Röttger, Jann Horn
Hi Aruna
On Thu, Oct 3, 2024 at 4:29 PM Aruna Ramakrishna
<aruna.ramakrishna@oracle.com> wrote:
>
>
>
> > On Aug 26, 2024, at 5:53 PM, Jeff Xu <jeffxu@chromium.org> wrote:
> >
> > Hi Aruna
> >
> > On Fri, Aug 9, 2024 at 10:30 AM Jeff Xu <jeffxu@chromium.org> wrote:
> >>
> >> On Thu, Aug 1, 2024 at 11:13 PM Aruna Ramakrishna
> >> <aruna.ramakrishna@oracle.com> wrote:
> >>>
> >>> If the alternate signal stack is protected by a different pkey than the
> >>> current execution stack, copying xsave data to the sigaltstack will fail
> >>> if its pkey is not enabled in the PKRU register.
> >>>
> >>> We do not know which pkey was used by the application for the altstack,
> >>> so enable all pkeys before xsave.
> >>>
> >>> But this updated PKRU value is also pushed onto the sigframe, which
> >>> means the register value restored from sigcontext will be different from
> >>> the user-defined one, which is unexpected. Fix that by overwriting the
> >>> PKRU value on the sigframe with the original, user-defined PKRU.
> >>>
> >>> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>> ---
> >>> arch/x86/kernel/fpu/signal.c | 11 +++++++++--
> >>> arch/x86/kernel/signal.c | 12 ++++++++++--
> >>> 2 files changed, 19 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> >>> index 931c5469d7f3..1065ab995305 100644
> >>> --- a/arch/x86/kernel/fpu/signal.c
> >>> +++ b/arch/x86/kernel/fpu/signal.c
> >>> @@ -168,8 +168,15 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
> >>>
> >>> static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
> >>> {
> >>> - if (use_xsave())
> >>> - return xsave_to_user_sigframe(buf);
> >>> + int err = 0;
> >>> +
> >>> + if (use_xsave()) {
> >>> + err = xsave_to_user_sigframe(buf);
> >>> + if (!err)
> >>> + err = update_pkru_in_sigframe(buf, pkru);
> >>> + return err;
> >>> + }
> >>> +
> >>> if (use_fxsr())
> >>> return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
> >>> else
> >>> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> >>> index 9dc77ad03a0e..5f441039b572 100644
> >>> --- a/arch/x86/kernel/signal.c
> >>> +++ b/arch/x86/kernel/signal.c
> >>> @@ -102,7 +102,7 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> >>> unsigned long math_size = 0;
> >>> unsigned long sp = regs->sp;
> >>> unsigned long buf_fx = 0;
> >>> - u32 pkru = read_pkru();
> >>> + u32 pkru;
> >>>
> >>> /* redzone */
> >>> if (!ia32_frame)
> >>> @@ -157,9 +157,17 @@ get_sigframe(struct ksignal *ksig, struct pt_regs *regs, size_t frame_size,
> >>> return (void __user *)-1L;
> >>> }
> >>>
> >>> + /* Update PKRU to enable access to the alternate signal stack. */
> >>> + pkru = sig_prepare_pkru();
> >> I think, the better place to call sig_prepare_pkru() at begging of the
> >> get_sigframe:
> >>
> >> get_sigframe()
> >> {
> >> /* ... */
> >> if (ka->sa.sa_flags & SA_ONSTACK) {
> >> if (sas_ss_flags(sp) == 0) {
> >> // set pkru = 0 <--- set pkru = 0 here.
> >> entering_altstack = true;
> >> }
> >> }
> >> For two reasons:
> >> - We probably don't want all signaling handling going through "PKRU=0"
> >> , this includes the regular stack and nested signaling handling. The
> >> best place is when "entering the altstack". IIUC, this feature only
> >> enabled when sigaltstack() is used.
> >> - The thread might not have read-access to the altstack, so we will
> >> want to make sure that pkru=0 is set before any read to the altstack.
> >> And IIRC, fpu__alloc_mathframe needs read-access to the altstack.
> >> To help with testing, I will send a test case to demo the usage.
> > Apologies, I remembered it wrong, the fpu__alloc_mathframe doesn't
> > need read-access to altstack.
> >
> > But I think the best place to set pkru=0 is still when the stack
> > entering altstack, as shown above. the reason is: The signal stack can
> > be nested, i.e. trigger another signaling inside signal handler and we
> > don't need to set pkru=0 multiple times (only the first time enter
> > sigaltstack)
> >
> >> (please give me sometime to organize the test code, I'm hoping to send
> >> out before the end of next week)
> >>
> > the test code is placed at [1]
> > [1] https://github.com/peaktocreek/pkeytest
> >
> >> Also, could you please consider adding a new flag SS_PKEYALTSTACK (see
> >> SS_AUTODISARM for example). Most existing apps that use sigaltstack()
> >> don't use PKEY and don't care about setting PKRU=0, and don't need to
> >> overwrite the sig frame after XSAVE. The flag will limit the impact
> >> of this patch.
> >>
> > Thanks
> > -Jeff
> >
>
> Hi Jeff,
>
> I apologize for the delay in my response. I haven’t had a chance to optimize
> this patchset or try new test cases.
>
Not a problem, I understand.
> I agree with your first suggestion that we can set pkru to 0 only when
> entering_altstack = true, as that is the intention of this flow anyway.
>
> I’m not so sure if SS_PKEYALTSTACK is really necessary - theoretically it seems
> logical to not do this for applications that do not use pkeys but use altstack, but
> it adds extra code/checks for possibly insignificant performance gain. I have not
> tested this yet.
>
One of the reasons that I'm thinking about is that signaling handling
has real-time requirements for some real-time systems, and applying
PKRU=0 without checking will increase cost to those systems and might
be viewed as a regression.
Thanks
-Jeff
> I’ll try to retest and send out a new patchset on top of the previous ones that have
> been merged to 6.12.
>
> Thank you for your feedback, and your attention to detail - I appreciate it.
>
> Thanks,
> Aruna
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2024-08-02 6:13 ` [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
2024-08-09 17:30 ` Jeff Xu
@ 2025-02-04 10:01 ` Dmitry Vyukov
2025-02-06 18:06 ` Dmitry Vyukov
2025-02-06 18:21 ` Dave Hansen
1 sibling, 2 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2025-02-04 10:01 UTC (permalink / raw)
To: aruna.ramakrishna, mathieu.desnoyers, peterz, paulmck, boqun.feng
Cc: dave.hansen, jannh, jeffxu, jorgelo, keescook, keith.lucas,
linux-kernel, linux-mm, mingo, rick.p.edgecombe, sroettger, tglx,
x86
Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
> If the alternate signal stack is protected by a different pkey than the
> current execution stack, copying xsave data to the sigaltstack will fail
> if its pkey is not enabled in the PKRU register.
>
> We do not know which pkey was used by the application for the altstack,
> so enable all pkeys before xsave.
>
> But this updated PKRU value is also pushed onto the sigframe, which
> means the register value restored from sigcontext will be different from
> the user-defined one, which is unexpected. Fix that by overwriting the
> PKRU value on the sigframe with the original, user-defined PKRU.
Hi,
This unfortunatly seems to be broken for rseq user-space writes.
If the signal is caused by rseq struct being inaccessible due to PKEYs,
we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
which happens _before_ sig_prepare_pkru and won't succeed
(PKEY is still inaccessible, hard kills the process).
Any PKEY sandbox would want to restict untrusted access to rseq
as well (otherwise allows easy sandbox escapes).
If we do sig_prepare_pkru before rseq_signal_deliver (and generally
before any copy_to_userpace), then user-space handler gets SIGSEGV
and could unregister rseq and retry.
However, I am not sure if it's the best solution performance-
and complexity-wise (for user-space). A better solution may be to
change __rseq_handle_notify_resume to temporary switch to default
PKEY if user accesses fail.
Rseq is similar to signals in this respect. Since rseq updates
happen asynchronously with respect to user-space control flow,
if a program uses rseq and ever makes rseq inaccessible with PKEYs,
it's in trouble and will be randomly killed.
Since rseq updates are asynchronous as signals, they shouldn't
assume PKEY is set to default value that allows access
to rseq descriptor.
Thoughts?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2025-02-04 10:01 ` Dmitry Vyukov
@ 2025-02-06 18:06 ` Dmitry Vyukov
2025-02-10 22:46 ` Jeff Xu
2025-02-06 18:21 ` Dave Hansen
1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2025-02-06 18:06 UTC (permalink / raw)
To: aruna.ramakrishna, mathieu.desnoyers, peterz, paulmck, boqun.feng
Cc: dave.hansen, jannh, jeffxu, jorgelo, keescook, keith.lucas,
linux-kernel, linux-mm, mingo, rick.p.edgecombe, sroettger, tglx,
x86
On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
>
> > If the alternate signal stack is protected by a different pkey than the
> > current execution stack, copying xsave data to the sigaltstack will fail
> > if its pkey is not enabled in the PKRU register.
> >
> > We do not know which pkey was used by the application for the altstack,
> > so enable all pkeys before xsave.
> >
> > But this updated PKRU value is also pushed onto the sigframe, which
> > means the register value restored from sigcontext will be different from
> > the user-defined one, which is unexpected. Fix that by overwriting the
> > PKRU value on the sigframe with the original, user-defined PKRU.
>
> Hi,
>
> This unfortunatly seems to be broken for rseq user-space writes.
> If the signal is caused by rseq struct being inaccessible due to PKEYs,
> we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> which happens _before_ sig_prepare_pkru and won't succeed
> (PKEY is still inaccessible, hard kills the process).
> Any PKEY sandbox would want to restict untrusted access to rseq
> as well (otherwise allows easy sandbox escapes).
>
> If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> before any copy_to_userpace), then user-space handler gets SIGSEGV
> and could unregister rseq and retry.
>
> However, I am not sure if it's the best solution performance-
> and complexity-wise (for user-space). A better solution may be to
> change __rseq_handle_notify_resume to temporary switch to default
> PKEY if user accesses fail.
> Rseq is similar to signals in this respect. Since rseq updates
> happen asynchronously with respect to user-space control flow,
> if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> it's in trouble and will be randomly killed.
> Since rseq updates are asynchronous as signals, they shouldn't
> assume PKEY is set to default value that allows access
> to rseq descriptor.
>
> Thoughts?
Another question about switching to pkey 0 and not switching back on all errors.
Can it create security problems by allowing sandboxed code to escape?
Namely, here:
+ /* Update PKRU to enable access to the alternate signal stack. */
+ pkru = sig_prepare_pkru();
/* save i387 and extended state */
- if (!copy_fpstate_to_sigframe(*fpstate, (void __user
*)buf_fx, math_size, pkru))
+ if (!copy_fpstate_to_sigframe(*fpstate, (void __user
*)buf_fx, math_size, pkru)) {
+ /*
+ * Restore PKRU to the original, user-defined value; disable
+ * extra pkeys enabled for the alternate signal stack, if any.
+ */
+ write_pkru(pkru);
return (void __user *)-1L;
+ }
we restore to the original pkru on this error, but there are other
failure paths later, e.g.:
https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
on these errors paths we will eventually get here to force_sig(SIGSEGV):
https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
which just sends SIGSEGV and is not fatal.
So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
will result in resetting PKRU to 0 without restoring it back.
Or sandboxed code somehow arranges for the first signal setup for other reasons.
This is, of course, a tricky attack vector, and the program must
resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
something lazily and retrying), but with security you never know how
creative an attacker can get and what you are missing that they are
not missing. So it looks safer to restore to the original PKRU on all
errors.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2025-02-04 10:01 ` Dmitry Vyukov
2025-02-06 18:06 ` Dmitry Vyukov
@ 2025-02-06 18:21 ` Dave Hansen
2025-02-06 18:35 ` Dmitry Vyukov
1 sibling, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2025-02-06 18:21 UTC (permalink / raw)
To: Dmitry Vyukov, aruna.ramakrishna, mathieu.desnoyers, peterz,
paulmck, boqun.feng
Cc: dave.hansen, jannh, jeffxu, jorgelo, keescook, keith.lucas,
linux-kernel, linux-mm, mingo, rick.p.edgecombe, sroettger, tglx,
x86
On 2/4/25 02:01, Dmitry Vyukov wrote:
>> But this updated PKRU value is also pushed onto the sigframe, which
>> means the register value restored from sigcontext will be different from
>> the user-defined one, which is unexpected. Fix that by overwriting the
>> PKRU value on the sigframe with the original, user-defined PKRU.
>
> This unfortunatly seems to be broken for rseq user-space writes.
Hi Dmitry,
Are you saying that Aruna's patch caused a regression in the rseq code?
Or that the rseq code has a separate but similar issue to the one that
Aruna's patch fixed?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2025-02-06 18:21 ` Dave Hansen
@ 2025-02-06 18:35 ` Dmitry Vyukov
0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2025-02-06 18:35 UTC (permalink / raw)
To: Dave Hansen
Cc: aruna.ramakrishna, mathieu.desnoyers, peterz, paulmck,
boqun.feng, dave.hansen, jannh, jeffxu, jorgelo, keescook,
keith.lucas, linux-kernel, linux-mm, mingo, rick.p.edgecombe,
sroettger, tglx, x86
On Thu, 6 Feb 2025 at 19:21, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/4/25 02:01, Dmitry Vyukov wrote:
> >> But this updated PKRU value is also pushed onto the sigframe, which
> >> means the register value restored from sigcontext will be different from
> >> the user-defined one, which is unexpected. Fix that by overwriting the
> >> PKRU value on the sigframe with the original, user-defined PKRU.
> >
> > This unfortunatly seems to be broken for rseq user-space writes.
>
> Hi Dmitry,
>
> Are you saying that Aruna's patch caused a regression in the rseq code?
> Or that the rseq code has a separate but similar issue to the one that
> Aruna's patch fixed?
No, no regression. Just another issue for real PKEY uses.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2025-02-06 18:06 ` Dmitry Vyukov
@ 2025-02-10 22:46 ` Jeff Xu
2025-02-11 6:47 ` Dmitry Vyukov
0 siblings, 1 reply; 20+ messages in thread
From: Jeff Xu @ 2025-02-10 22:46 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: aruna.ramakrishna, mathieu.desnoyers, peterz, paulmck,
boqun.feng, dave.hansen, jannh, jorgelo, keescook, keith.lucas,
linux-kernel, linux-mm, mingo, rick.p.edgecombe, sroettger, tglx,
x86
Hi Dmitry
On Thu, Feb 6, 2025 at 10:06 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> > https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
> >
> > > If the alternate signal stack is protected by a different pkey than the
> > > current execution stack, copying xsave data to the sigaltstack will fail
> > > if its pkey is not enabled in the PKRU register.
> > >
> > > We do not know which pkey was used by the application for the altstack,
> > > so enable all pkeys before xsave.
> > >
> > > But this updated PKRU value is also pushed onto the sigframe, which
> > > means the register value restored from sigcontext will be different from
> > > the user-defined one, which is unexpected. Fix that by overwriting the
> > > PKRU value on the sigframe with the original, user-defined PKRU.
> >
> > Hi,
> >
> > This unfortunatly seems to be broken for rseq user-space writes.
> > If the signal is caused by rseq struct being inaccessible due to PKEYs,
> > we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> > which happens _before_ sig_prepare_pkru and won't succeed
> > (PKEY is still inaccessible, hard kills the process).
> > Any PKEY sandbox would want to restict untrusted access to rseq
> > as well (otherwise allows easy sandbox escapes).
> >
> > If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> > before any copy_to_userpace), then user-space handler gets SIGSEGV
> > and could unregister rseq and retry.
> >
> > However, I am not sure if it's the best solution performance-
> > and complexity-wise (for user-space). A better solution may be to
> > change __rseq_handle_notify_resume to temporary switch to default
> > PKEY if user accesses fail.
> > Rseq is similar to signals in this respect. Since rseq updates
> > happen asynchronously with respect to user-space control flow,
> > if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> > it's in trouble and will be randomly killed.
> > Since rseq updates are asynchronous as signals, they shouldn't
> > assume PKEY is set to default value that allows access
> > to rseq descriptor.
> >
> > Thoughts?
>
> Another question about switching to pkey 0 and not switching back on all errors.
> Can it create security problems by allowing sandboxed code to escape?
>
Sandbox escape would be bad , we wouldn't want the calling thread to
get PKRU = 0 in any error path.
> Namely, here:
>
> + /* Update PKRU to enable access to the alternate signal stack. */
> + pkru = sig_prepare_pkru();
> /* save i387 and extended state */
> - if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> *)buf_fx, math_size, pkru))
> + if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> *)buf_fx, math_size, pkru)) {
> + /*
> + * Restore PKRU to the original, user-defined value; disable
> + * extra pkeys enabled for the alternate signal stack, if any.
> + */
> + write_pkru(pkru);
> return (void __user *)-1L;
> + }
>
> we restore to the original pkru on this error, but there are other
> failure paths later, e.g.:
> https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
>
> on these errors paths we will eventually get here to force_sig(SIGSEGV):
> https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
> which just sends SIGSEGV and is not fatal.
>
> So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
> which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
> will result in resetting PKRU to 0 without restoring it back.
> Or sandboxed code somehow arranges for the first signal setup for other reasons.
>
Can you walk me through the setup and steps that led to this situation?
> This is, of course, a tricky attack vector, and the program must
> resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
> something lazily and retrying), but with security you never know how
> creative an attacker can get and what you are missing that they are
> not missing. So it looks safer to restore to the original PKRU on all
> errors.
Thanks
-Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
2025-02-10 22:46 ` Jeff Xu
@ 2025-02-11 6:47 ` Dmitry Vyukov
0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2025-02-11 6:47 UTC (permalink / raw)
To: Jeff Xu
Cc: aruna.ramakrishna, mathieu.desnoyers, peterz, paulmck,
boqun.feng, dave.hansen, jannh, jorgelo, keescook, keith.lucas,
linux-kernel, linux-mm, mingo, rick.p.edgecombe, sroettger, tglx,
x86
On Mon, 10 Feb 2025 at 23:46, Jeff Xu <jeffxu@chromium.org> wrote:
>
> Hi Dmitry
>
> On Thu, Feb 6, 2025 at 10:06 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> > > https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
> > >
> > > > If the alternate signal stack is protected by a different pkey than the
> > > > current execution stack, copying xsave data to the sigaltstack will fail
> > > > if its pkey is not enabled in the PKRU register.
> > > >
> > > > We do not know which pkey was used by the application for the altstack,
> > > > so enable all pkeys before xsave.
> > > >
> > > > But this updated PKRU value is also pushed onto the sigframe, which
> > > > means the register value restored from sigcontext will be different from
> > > > the user-defined one, which is unexpected. Fix that by overwriting the
> > > > PKRU value on the sigframe with the original, user-defined PKRU.
> > >
> > > Hi,
> > >
> > > This unfortunatly seems to be broken for rseq user-space writes.
> > > If the signal is caused by rseq struct being inaccessible due to PKEYs,
> > > we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> > > which happens _before_ sig_prepare_pkru and won't succeed
> > > (PKEY is still inaccessible, hard kills the process).
> > > Any PKEY sandbox would want to restict untrusted access to rseq
> > > as well (otherwise allows easy sandbox escapes).
> > >
> > > If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> > > before any copy_to_userpace), then user-space handler gets SIGSEGV
> > > and could unregister rseq and retry.
> > >
> > > However, I am not sure if it's the best solution performance-
> > > and complexity-wise (for user-space). A better solution may be to
> > > change __rseq_handle_notify_resume to temporary switch to default
> > > PKEY if user accesses fail.
> > > Rseq is similar to signals in this respect. Since rseq updates
> > > happen asynchronously with respect to user-space control flow,
> > > if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> > > it's in trouble and will be randomly killed.
> > > Since rseq updates are asynchronous as signals, they shouldn't
> > > assume PKEY is set to default value that allows access
> > > to rseq descriptor.
> > >
> > > Thoughts?
> >
> > Another question about switching to pkey 0 and not switching back on all errors.
> > Can it create security problems by allowing sandboxed code to escape?
> >
> Sandbox escape would be bad , we wouldn't want the calling thread to
> get PKRU = 0 in any error path.
>
> > Namely, here:
> >
> > + /* Update PKRU to enable access to the alternate signal stack. */
> > + pkru = sig_prepare_pkru();
> > /* save i387 and extended state */
> > - if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> > *)buf_fx, math_size, pkru))
> > + if (!copy_fpstate_to_sigframe(*fpstate, (void __user
> > *)buf_fx, math_size, pkru)) {
> > + /*
> > + * Restore PKRU to the original, user-defined value; disable
> > + * extra pkeys enabled for the alternate signal stack, if any.
> > + */
> > + write_pkru(pkru);
> > return (void __user *)-1L;
> > + }
> >
> > we restore to the original pkru on this error, but there are other
> > failure paths later, e.g.:
> > https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
> >
> > on these errors paths we will eventually get here to force_sig(SIGSEGV):
> > https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
> > which just sends SIGSEGV and is not fatal.
> >
> > So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
> > which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
> > will result in resetting PKRU to 0 without restoring it back.
> > Or sandboxed code somehow arranges for the first signal setup for other reasons.
> >
> Can you walk me through the setup and steps that led to this situation?
I don't anything more concrete steps, just the observation that PKRU
is restored only on 1 out of N error paths.
> > This is, of course, a tricky attack vector, and the program must
> > resume after SIGSEGV somehow (there are some such cases, e.g. mmaping
> > something lazily and retrying), but with security you never know how
> > creative an attacker can get and what you are missing that they are
> > not missing. So it looks safer to restore to the original PKRU on all
> > errors.
>
> Thanks
> -Jeff
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-02-11 6:47 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-02 6:13 [PATCH v8 0/5] x86/pkeys: update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 1/5] x86/pkeys: Add PKRU as a parameter in signal handling functions Aruna Ramakrishna
2024-08-09 17:16 ` Jeff Xu
2024-08-02 6:13 ` [PATCH v8 2/5] x86/pkeys: Add helper functions to update PKRU on the sigframe Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE Aruna Ramakrishna
2024-08-09 17:30 ` Jeff Xu
2024-08-27 0:53 ` Jeff Xu
2024-10-03 23:29 ` Aruna Ramakrishna
2024-10-04 4:20 ` Jeff Xu
2025-02-04 10:01 ` Dmitry Vyukov
2025-02-06 18:06 ` Dmitry Vyukov
2025-02-10 22:46 ` Jeff Xu
2025-02-11 6:47 ` Dmitry Vyukov
2025-02-06 18:21 ` Dave Hansen
2025-02-06 18:35 ` Dmitry Vyukov
2024-08-02 6:13 ` [PATCH v8 4/5] x86/pkeys: Restore altstack access in sigreturn() Aruna Ramakrishna
2024-08-02 6:13 ` [PATCH v8 5/5] selftests/mm: Add new testcases for pkeys Aruna Ramakrishna
2024-08-02 8:16 ` Thomas Gleixner
2024-08-02 8:22 ` Thomas Gleixner
2024-08-02 12:04 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox