* [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 3:31 ` Randy Dunlap
2025-02-12 15:05 ` Liam R. Howlett
2025-02-12 3:21 ` [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
` (6 subsequent siblings)
7 siblings, 2 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Provide infrastructure to mseal system mappings. Establish
two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_HAS_MSEAL_SYSTEM_MAPPINGS) and a header file (userprocess.h)
for future patches.
As discussed during mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as the read/write (RW) and no-execute (NX) bits. For complete
descriptions of memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect
them from ever changing to writable or unmmap/remapped as different
attributes.
System mappings such as vdso, vvar, and sigpage (arm), vectors (arm)
are created by the kernel during program initialization, and could
be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the mm managed range. This means mprotect, munmap, and
mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it is probably only for decorative purpose, i.e. showing
the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor are also known to change the vdso/vvar mappings. Consequently,
this feature cannot be universally enabled across all systems. As such,
CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS and update their special mappings
calls to pass mseal flag. Additionally, architectures must confirm they
do not unmap/remap system mappings during the process lifetime.
In this version, we've improved the handling of system mapping sealing from
previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Additionally, this patch introduces a new header,
include/linux/usrprocess.h, which contains the mseal_system_mappings()
function. This function helps the architecture determine if system
mapping is enabled within the current kernel configuration. It can be
extended in the future to handle opt-in/out prctl for enabling system
mapping sealing at the process level or a kernel cmdline feature.
A new header file was introduced because it was difficult to find the
best location for this function. Although include/linux/mm.h was
considered, this feature is more closely related to user processes
than core memory management. Additionally, future prctl or kernel
cmd-line implementations for this feature would not fit within the
scope of core memory management or mseal.c. This is relevant because
if we add unit-tests for mseal.c in the future, we would want to limit
mseal.c's dependencies to core memory management.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not length
information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker,
and there is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android,
and other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
In addition, Benjamin Berg tested this on UML.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
include/linux/userprocess.h | 18 ++++++++++++++++++
init/Kconfig | 18 ++++++++++++++++++
security/Kconfig | 18 ++++++++++++++++++
3 files changed, 54 insertions(+)
create mode 100644 include/linux/userprocess.h
diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
new file mode 100644
index 000000000000..bd11e2e972c5
--- /dev/null
+++ b/include/linux/userprocess.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_USER_PROCESS_H
+#define _LINUX_USER_PROCESS_H
+#include <linux/mm.h>
+
+/*
+ * mseal of userspace process's system mappings.
+ */
+static inline unsigned long mseal_system_mappings(void)
+{
+#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
+ return VM_SEALED;
+#else
+ return 0;
+#endif
+}
+
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..892d2bcdf397 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
config ARCH_HAS_MEMBARRIER_SYNC_CORE
bool
+config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
+ bool
+ help
+ Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
+
+ A 64-bit kernel is required for the memory sealing feature.
+ No specific hardware features from the CPU are needed.
+
+ To enable this feature, the architecture needs to update their
+ speical mappings calls to include the sealing flag and confirm
+ that it doesn't unmap/remap system mappings during the life
+ time of the process. After the architecture enables this, a
+ distribution can set CONFIG_MSEAL_SYSTEM_MAPPING to manage access
+ to the feature.
+
+ For complete descriptions of memory sealing, please see
+ Documentation/userspace-api/mseal.rst
+
config HAVE_PERF_EVENTS
bool
help
diff --git a/security/Kconfig b/security/Kconfig
index f10dbf15c294..bfb35fc7a3c6 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -51,6 +51,24 @@ config PROC_MEM_NO_FORCE
endchoice
+config MSEAL_SYSTEM_MAPPINGS
+ bool "mseal system mappings"
+ depends on 64BIT
+ depends on ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
+ depends on !CHECKPOINT_RESTORE
+ help
+ Seal system mappings such as vdso, vvar, sigpage, uprobes, etc.
+
+ A 64-bit kernel is required for the memory sealing feature.
+ No specific hardware features from the CPU are needed.
+
+ Note: CHECKPOINT_RESTORE, UML, gVisor are known to relocate or
+ unmap system mapping, therefore this config can't be enabled
+ universally.
+
+ For complete descriptions of memory sealing, please see
+ Documentation/userspace-api/mseal.rst
+
config SECURITY
bool "Enable different security models"
depends on SYSFS
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-12 3:21 ` [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change jeffxu
@ 2025-02-12 3:31 ` Randy Dunlap
2025-02-12 3:40 ` Jeff Xu
2025-02-12 15:05 ` Liam R. Howlett
1 sibling, 1 reply; 43+ messages in thread
From: Randy Dunlap @ 2025-02-12 3:31 UTC (permalink / raw)
To: jeffxu, akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On 2/11/25 7:21 PM, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
>
> ---
> include/linux/userprocess.h | 18 ++++++++++++++++++
> init/Kconfig | 18 ++++++++++++++++++
> security/Kconfig | 18 ++++++++++++++++++
> 3 files changed, 54 insertions(+)
> create mode 100644 include/linux/userprocess.h
>
> diff --git a/init/Kconfig b/init/Kconfig
> index d0d021b3fa3b..892d2bcdf397 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> config ARCH_HAS_MEMBARRIER_SYNC_CORE
> bool
>
> +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> + bool
> + help
> + Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
> +
> + A 64-bit kernel is required for the memory sealing feature.
> + No specific hardware features from the CPU are needed.
> +
> + To enable this feature, the architecture needs to update their
> + speical mappings calls to include the sealing flag and confirm
special
> + that it doesn't unmap/remap system mappings during the life
> + time of the process. After the architecture enables this, a
> + distribution can set CONFIG_MSEAL_SYSTEM_MAPPING to manage access
> + to the feature.
> +
> + For complete descriptions of memory sealing, please see
> + Documentation/userspace-api/mseal.rst
> +
> config HAVE_PERF_EVENTS
> bool
> help
--
~Randy
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-12 3:31 ` Randy Dunlap
@ 2025-02-12 3:40 ` Jeff Xu
0 siblings, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-12 3:40 UTC (permalink / raw)
To: Randy Dunlap
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Tue, Feb 11, 2025 at 7:31 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
>
>
> On 2/11/25 7:21 PM, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
>
> > ---
> > include/linux/userprocess.h | 18 ++++++++++++++++++
> > init/Kconfig | 18 ++++++++++++++++++
> > security/Kconfig | 18 ++++++++++++++++++
> > 3 files changed, 54 insertions(+)
> > create mode 100644 include/linux/userprocess.h
> >
>
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d0d021b3fa3b..892d2bcdf397 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > bool
> >
> > +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> > + bool
> > + help
> > + Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
> > +
> > + A 64-bit kernel is required for the memory sealing feature.
> > + No specific hardware features from the CPU are needed.
> > +
> > + To enable this feature, the architecture needs to update their
> > + speical mappings calls to include the sealing flag and confirm
>
> special
>
Ack, will fix.
Thanks !
-Jeff
> > + that it doesn't unmap/remap system mappings during the life
> > + time of the process. After the architecture enables this, a
> > + distribution can set CONFIG_MSEAL_SYSTEM_MAPPING to manage access
> > + to the feature.
> > +
> > + For complete descriptions of memory sealing, please see
> > + Documentation/userspace-api/mseal.rst
> > +
> > config HAVE_PERF_EVENTS
> > bool
> > help
>
>
> --
> ~Randy
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-12 3:21 ` [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change jeffxu
2025-02-12 3:31 ` Randy Dunlap
@ 2025-02-12 15:05 ` Liam R. Howlett
2025-02-13 17:15 ` Jeff Xu
1 sibling, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-12 15:05 UTC (permalink / raw)
To: jeffxu
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* jeffxu@chromium.org <jeffxu@chromium.org> [250211 22:22]:
> From: Jeff Xu <jeffxu@chromium.org>
>
> Provide infrastructure to mseal system mappings. Establish
> two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
> ARCH_HAS_MSEAL_SYSTEM_MAPPINGS) and a header file (userprocess.h)
> for future patches.
>
> As discussed during mseal() upstream process [1], mseal() protects
> the VMAs of a given virtual memory range against modifications, such
> as the read/write (RW) and no-execute (NX) bits. For complete
> descriptions of memory sealing, please see mseal.rst [2].
>
> The mseal() is useful to mitigate memory corruption issues where a
> corrupted pointer is passed to a memory management system. For
> example, such an attacker primitive can break control-flow integrity
> guarantees since read-only memory that is supposed to be trusted can
> become writable or .text pages can get remapped.
>
> The system mappings are readonly only, memory sealing can protect
> them from ever changing to writable or unmmap/remapped as different
> attributes.
>
> System mappings such as vdso, vvar, and sigpage (arm), vectors (arm)
> are created by the kernel during program initialization, and could
> be sealed after creation.
>
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [3]. It could be sealed from creation.
>
> The vsyscall on x86-64 uses a special address (0xffffffffff600000),
> which is outside the mm managed range. This means mprotect, munmap, and
> mremap won't work on the vsyscall. Since sealing doesn't enhance
> the vsyscall's security, it is skipped in this patch. If we ever seal
> the vsyscall, it is probably only for decorative purpose, i.e. showing
> the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
>
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the system mappings during restore operations. UML(User Mode Linux)
> and gVisor are also known to change the vdso/vvar mappings. Consequently,
> this feature cannot be universally enabled across all systems. As such,
> CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
>
> To support mseal of system mappings, architectures must define
> CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS and update their special mappings
> calls to pass mseal flag. Additionally, architectures must confirm they
> do not unmap/remap system mappings during the process lifetime.
>
> In this version, we've improved the handling of system mapping sealing from
> previous versions, instead of modifying the _install_special_mapping
> function itself, which would affect all architectures, we now call
> _install_special_mapping with a sealing flag only within the specific
> architecture that requires it. This targeted approach offers two key
> advantages: 1) It limits the code change's impact to the necessary
> architectures, and 2) It aligns with the software architecture by keeping
> the core memory management within the mm layer, while delegating the
> decision of sealing system mappings to the individual architecture, which
> is particularly relevant since 32-bit architectures never require sealing.
>
> Additionally, this patch introduces a new header,
> include/linux/usrprocess.h, which contains the mseal_system_mappings()
> function. This function helps the architecture determine if system
> mapping is enabled within the current kernel configuration. It can be
> extended in the future to handle opt-in/out prctl for enabling system
> mapping sealing at the process level or a kernel cmdline feature.
>
> A new header file was introduced because it was difficult to find the
> best location for this function. Although include/linux/mm.h was
> considered, this feature is more closely related to user processes
> than core memory management. Additionally, future prctl or kernel
> cmd-line implementations for this feature would not fit within the
> scope of core memory management or mseal.c. This is relevant because
> if we add unit-tests for mseal.c in the future, we would want to limit
> mseal.c's dependencies to core memory management.
>
> Prior to this patch series, we explored sealing special mappings from
> userspace using glibc's dynamic linker. This approach revealed several
> issues:
> - The PT_LOAD header may report an incorrect length for vdso, (smaller
> than its actual size). The dynamic linker, which relies on PT_LOAD
> information to determine mapping size, would then split and partially
> seal the vdso mapping. Since each architecture has its own vdso/vvar
> code, fixing this in the kernel would require going through each
> archiecture. Our initial goal was to enable sealing readonly mappings,
> e.g. .text, across all architectures, sealing vdso from kernel since
> creation appears to be simpler than sealing vdso at glibc.
> - The [vvar] mapping header only contains address information, not length
> information. Similar issues might exist for other special mappings.
> - Mappings like uprobe are not covered by the dynamic linker,
> and there is no effective solution for them.
>
> This feature's security enhancements will benefit ChromeOS, Android,
> and other high security systems.
>
> Testing:
> This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
> - Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
> i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
> - Passing various automation tests (e.g. pre-checkin) on ChromeOS and
> Android to ensure the sealing doesn't affect the functionality of
> Chromebook and Android phone.
>
> I also tested the feature on Ubuntu on x86-64:
> - With config disabled, vdso/vvar is not sealed,
> - with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
> normal operations such as browsing the web, open/edit doc are OK.
>
> In addition, Benjamin Berg tested this on UML.
>
> Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
> Link: Documentation/userspace-api/mseal.rst [2]
> Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
I found it odd that the history and V4 links were not here, but I
realised that was in 0/7 instead.
>
> Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> ---
> include/linux/userprocess.h | 18 ++++++++++++++++++
> init/Kconfig | 18 ++++++++++++++++++
> security/Kconfig | 18 ++++++++++++++++++
> 3 files changed, 54 insertions(+)
> create mode 100644 include/linux/userprocess.h
>
> diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
> new file mode 100644
> index 000000000000..bd11e2e972c5
> --- /dev/null
> +++ b/include/linux/userprocess.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_USER_PROCESS_H
> +#define _LINUX_USER_PROCESS_H
> +#include <linux/mm.h>
Why is this in a new file and not mm.h directly? Anyone who needs to
use this will pull in mm.h anyways, and probably already has mm.h
included.
> +
> +/*
> + * mseal of userspace process's system mappings.
> + */
> +static inline unsigned long mseal_system_mappings(void)
> +{
> +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> + return VM_SEALED;
> +#else
> + return 0;
> +#endif
> +}
> +
> +#endif
Looking in mm.h, there are examples of config checks which either set
the bit or set it to 0. This means you wouldn't need the function at
all and the precompiler would do all the work (although with a static
inline, I'm not sure it would make a big difference in instructions).
For instance, you could do:
#ifdef CONFIG_64BIT
/* VM is sealed, in vm_flags */
#define VM_SEALED _BITUL(63)
#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SYSTEM_SEAL VM_SEALED
else
#define VM_SYSTEM_SEAL VM_NONE
#endif
#else /* CONFIG_64BIT */
#define VM_SYSTEM_SEAL VM_NONE
#endif
Then you can use VM_SYSTEM_SEAL in the system mappings function in the
list of flags instead of having a variable + calling the static
function.
I didn't look close enough to see if the 32bit version is necessary.
> diff --git a/init/Kconfig b/init/Kconfig
> index d0d021b3fa3b..892d2bcdf397 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> config ARCH_HAS_MEMBARRIER_SYNC_CORE
> bool
>
> +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS is more clear. HAS may mean it's
supported or it could mean it's enabled. SUPPORTS is more clear that
this is set if an arch supports the feature. After all, I think HAS
here means it "has support for" mseal system mappings.
I see that both are used (HAS and SUPPORTS), but I'm still not sure what
HAS means in other contexts without digging.
> + bool
> + help
> + Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
> +
> + A 64-bit kernel is required for the memory sealing feature.
> + No specific hardware features from the CPU are needed.
> +
> + To enable this feature, the architecture needs to update their
> + speical mappings calls to include the sealing flag and confirm
> + that it doesn't unmap/remap system mappings during the life
> + time of the process. After the architecture enables this, a
> + distribution can set CONFIG_MSEAL_SYSTEM_MAPPING to manage access
> + to the feature.
> +
> + For complete descriptions of memory sealing, please see
> + Documentation/userspace-api/mseal.rst
> +
> config HAVE_PERF_EVENTS
> bool
> help
> diff --git a/security/Kconfig b/security/Kconfig
> index f10dbf15c294..bfb35fc7a3c6 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -51,6 +51,24 @@ config PROC_MEM_NO_FORCE
>
> endchoice
>
> +config MSEAL_SYSTEM_MAPPINGS
> + bool "mseal system mappings"
> + depends on 64BIT
> + depends on ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> + depends on !CHECKPOINT_RESTORE
> + help
> + Seal system mappings such as vdso, vvar, sigpage, uprobes, etc.
> +
> + A 64-bit kernel is required for the memory sealing feature.
> + No specific hardware features from the CPU are needed.
> +
> + Note: CHECKPOINT_RESTORE, UML, gVisor are known to relocate or
> + unmap system mapping, therefore this config can't be enabled
> + universally.
I guess add rr to the list, since that's also reported to have issues as
well.
> +
> + For complete descriptions of memory sealing, please see
> + Documentation/userspace-api/mseal.rst
> +
> config SECURITY
> bool "Enable different security models"
> depends on SYSFS
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-12 15:05 ` Liam R. Howlett
@ 2025-02-13 17:15 ` Jeff Xu
2025-02-13 18:29 ` Liam R. Howlett
0 siblings, 1 reply; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 17:15 UTC (permalink / raw)
To: Liam R. Howlett, jeffxu, akpm, keescook, jannh, torvalds, vbabka,
lorenzo.stoakes, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 7:05 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
...
> >
> > In this version, we've improved the handling of system mapping sealing from
> > previous versions, instead of modifying the _install_special_mapping
> > function itself, which would affect all architectures, we now call
> > _install_special_mapping with a sealing flag only within the specific
> > architecture that requires it. This targeted approach offers two key
> > advantages: 1) It limits the code change's impact to the necessary
> > architectures, and 2) It aligns with the software architecture by keeping
> > the core memory management within the mm layer, while delegating the
> > decision of sealing system mappings to the individual architecture, which
> > is particularly relevant since 32-bit architectures never require sealing.
> >
> > Additionally, this patch introduces a new header,
> > include/linux/usrprocess.h, which contains the mseal_system_mappings()
> > function. This function helps the architecture determine if system
> > mapping is enabled within the current kernel configuration. It can be
> > extended in the future to handle opt-in/out prctl for enabling system
> > mapping sealing at the process level or a kernel cmdline feature.
> >
> > A new header file was introduced because it was difficult to find the
> > best location for this function. Although include/linux/mm.h was
> > considered, this feature is more closely related to user processes
> > than core memory management. Additionally, future prctl or kernel
> > cmd-line implementations for this feature would not fit within the
> > scope of core memory management or mseal.c. This is relevant because
> > if we add unit-tests for mseal.c in the future, we would want to limit
> > mseal.c's dependencies to core memory management.
> >
...
> >
> > diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
> > new file mode 100644
> > index 000000000000..bd11e2e972c5
> > --- /dev/null
> > +++ b/include/linux/userprocess.h
> > @@ -0,0 +1,18 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_USER_PROCESS_H
> > +#define _LINUX_USER_PROCESS_H
> > +#include <linux/mm.h>
>
> Why is this in a new file and not mm.h directly? Anyone who needs to
> use this will pull in mm.h anyways, and probably already has mm.h
> included.
>
The commit message includes the reason. I've snipped and left the
relevant portion for easy reference, please see the beginning of this
email.
> > +
> > +/*
> > + * mseal of userspace process's system mappings.
> > + */
> > +static inline unsigned long mseal_system_mappings(void)
> > +{
> > +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > + return VM_SEALED;
> > +#else
> > + return 0;
> > +#endif
> > +}
> > +
> > +#endif
>
> Looking in mm.h, there are examples of config checks which either set
> the bit or set it to 0. This means you wouldn't need the function at
> all and the precompiler would do all the work (although with a static
> inline, I'm not sure it would make a big difference in instructions).
>
> For instance, you could do:
> #ifdef CONFIG_64BIT
> /* VM is sealed, in vm_flags */
> #define VM_SEALED _BITUL(63)
>
> #ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> #define VM_SYSTEM_SEAL VM_SEALED
> else
> #define VM_SYSTEM_SEAL VM_NONE
> #endif
>
> #else /* CONFIG_64BIT */
> #define VM_SYSTEM_SEAL VM_NONE
> #endif
>
> Then you can use VM_SYSTEM_SEAL in the system mappings function in the
> list of flags instead of having a variable + calling the static
> function.
>
> I didn't look close enough to see if the 32bit version is necessary.
>
I'm aware of the other pattern.
VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
the build. This is intentional. Any 32-bit code trying to use the
sealing function or the VM_SEALED flag will immediately fail
compilation. This makes it easier to identify incorrect usage.
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d0d021b3fa3b..892d2bcdf397 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > bool
> >
> > +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
>
> ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS is more clear. HAS may mean it's
> supported or it could mean it's enabled. SUPPORTS is more clear that
> this is set if an arch supports the feature. After all, I think HAS
> here means it "has support for" mseal system mappings.
>
> I see that both are used (HAS and SUPPORTS), but I'm still not sure what
> HAS means in other contexts without digging.
>
The existing pattern is to indicate arch support with
CONFIG_ARCH_HAS_N and security/KConfig to have CONFIG_N as the main
switch. For example, CONFIG_ARCH_HAS_FORTIFY_SOURCE and
CONFIG_FORTIFY_SOURCE. Since the MSEAL_SYSTEM_MAPPINGS is placed in
security/Kconfig, hence I'm following the existing pattern in
security/Kconfig.
> > diff --git a/security/Kconfig b/security/Kconfig
> > index f10dbf15c294..bfb35fc7a3c6 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -51,6 +51,24 @@ config PROC_MEM_NO_FORCE
> >
> > endchoice
> >
> > +config MSEAL_SYSTEM_MAPPINGS
> > + bool "mseal system mappings"
> > + depends on 64BIT
> > + depends on ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> > + depends on !CHECKPOINT_RESTORE
> > + help
> > + Seal system mappings such as vdso, vvar, sigpage, uprobes, etc.
> > +
> > + A 64-bit kernel is required for the memory sealing feature.
> > + No specific hardware features from the CPU are needed.
> > +
> > + Note: CHECKPOINT_RESTORE, UML, gVisor are known to relocate or
> > + unmap system mapping, therefore this config can't be enabled
> > + universally.
>
> I guess add rr to the list, since that's also reported to have issues as
> well.
>
sure.
Thanks for reviewing.
-Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-13 17:15 ` Jeff Xu
@ 2025-02-13 18:29 ` Liam R. Howlett
2025-02-13 20:11 ` Kees Cook
0 siblings, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-13 18:29 UTC (permalink / raw)
To: Jeff Xu
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* Jeff Xu <jeffxu@chromium.org> [250213 12:17]:
> On Wed, Feb 12, 2025 at 7:05 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> ...
> > >
> > > In this version, we've improved the handling of system mapping sealing from
> > > previous versions, instead of modifying the _install_special_mapping
> > > function itself, which would affect all architectures, we now call
> > > _install_special_mapping with a sealing flag only within the specific
> > > architecture that requires it. This targeted approach offers two key
> > > advantages: 1) It limits the code change's impact to the necessary
> > > architectures, and 2) It aligns with the software architecture by keeping
> > > the core memory management within the mm layer, while delegating the
> > > decision of sealing system mappings to the individual architecture, which
> > > is particularly relevant since 32-bit architectures never require sealing.
> > >
> > > Additionally, this patch introduces a new header,
> > > include/linux/usrprocess.h, which contains the mseal_system_mappings()
> > > function. This function helps the architecture determine if system
> > > mapping is enabled within the current kernel configuration. It can be
> > > extended in the future to handle opt-in/out prctl for enabling system
> > > mapping sealing at the process level or a kernel cmdline feature.
> > >
> > > A new header file was introduced because it was difficult to find the
> > > best location for this function. Although include/linux/mm.h was
> > > considered, this feature is more closely related to user processes
> > > than core memory management. Additionally, future prctl or kernel
> > > cmd-line implementations for this feature would not fit within the
> > > scope of core memory management or mseal.c. This is relevant because
> > > if we add unit-tests for mseal.c in the future, we would want to limit
> > > mseal.c's dependencies to core memory management.
> > >
> ...
> > >
> > > diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
> > > new file mode 100644
> > > index 000000000000..bd11e2e972c5
> > > --- /dev/null
> > > +++ b/include/linux/userprocess.h
> > > @@ -0,0 +1,18 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef _LINUX_USER_PROCESS_H
> > > +#define _LINUX_USER_PROCESS_H
> > > +#include <linux/mm.h>
> >
> > Why is this in a new file and not mm.h directly? Anyone who needs to
> > use this will pull in mm.h anyways, and probably already has mm.h
> > included.
> >
> The commit message includes the reason. I've snipped and left the
> relevant portion for easy reference, please see the beginning of this
> email.
This is a _really_ good reason NOT to stack the entire patch set
description into a single patch description. Within the above 111 lines
of text, I missed that statement.
When Andrew takes the first patch, he'll put the patch description in
there, and then we can actually concentrate on the patch with the
limited context of what is going on.
If it doesn't go through Andrew's branch, then we can fall back to each
patch standing on its own with the change stating why things are done.
Also, I don't agree with your stated reason for a new header.
Please remove this header until it is needed. It can be added later if
it is needed. If we all had tiny headers because we might need them in
the future, we'd have serious issues finding anything and the list of
included headers would be huge.
You have added unnecessary changes and complications to this patch set
on v5.
>
> > > +
> > > +/*
> > > + * mseal of userspace process's system mappings.
> > > + */
> > > +static inline unsigned long mseal_system_mappings(void)
> > > +{
> > > +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > > + return VM_SEALED;
> > > +#else
> > > + return 0;
> > > +#endif
> > > +}
> > > +
> > > +#endif
> >
> > Looking in mm.h, there are examples of config checks which either set
> > the bit or set it to 0. This means you wouldn't need the function at
> > all and the precompiler would do all the work (although with a static
> > inline, I'm not sure it would make a big difference in instructions).
> >
> > For instance, you could do:
> > #ifdef CONFIG_64BIT
> > /* VM is sealed, in vm_flags */
> > #define VM_SEALED _BITUL(63)
> >
> > #ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > #define VM_SYSTEM_SEAL VM_SEALED
> > else
> > #define VM_SYSTEM_SEAL VM_NONE
> > #endif
> >
> > #else /* CONFIG_64BIT */
> > #define VM_SYSTEM_SEAL VM_NONE
> > #endif
> >
> > Then you can use VM_SYSTEM_SEAL in the system mappings function in the
> > list of flags instead of having a variable + calling the static
> > function.
> >
> > I didn't look close enough to see if the 32bit version is necessary.
> >
> I'm aware of the other pattern.
>
> VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> the build. This is intentional. Any 32-bit code trying to use the
> sealing function or the VM_SEALED flag will immediately fail
> compilation. This makes it easier to identify incorrect usage.
So you are against using the #define because the VM_SYSTEM_SEAL will be
defined, even though it will be VM_NONE? This is no worse than a
function that returns 0, and it aligns better with what we have today in
that it can be put in the list of other flags.
Also, my vote for where you should put this code is clear: it should
live with the mseal #define in mm.h. You're going to need mm.h anyways,
and that's a big hint as to where it should live.
>
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d0d021b3fa3b..892d2bcdf397 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > > bool
> > >
> > > +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> >
> > ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS is more clear. HAS may mean it's
> > supported or it could mean it's enabled. SUPPORTS is more clear that
> > this is set if an arch supports the feature. After all, I think HAS
> > here means it "has support for" mseal system mappings.
> >
> > I see that both are used (HAS and SUPPORTS), but I'm still not sure what
> > HAS means in other contexts without digging.
> >
> The existing pattern is to indicate arch support with
> CONFIG_ARCH_HAS_N and security/KConfig to have CONFIG_N as the main
> switch. For example, CONFIG_ARCH_HAS_FORTIFY_SOURCE and
> CONFIG_FORTIFY_SOURCE. Since the MSEAL_SYSTEM_MAPPINGS is placed in
> security/Kconfig, hence I'm following the existing pattern in
> security/Kconfig.
Okay, thanks. I really think SUPPORTS is more clear, but sticking with
whatever your area uses is also fine.
...
Thanks,
Liam
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-13 18:29 ` Liam R. Howlett
@ 2025-02-13 20:11 ` Kees Cook
2025-02-13 20:54 ` Liam R. Howlett
0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2025-02-13 20:11 UTC (permalink / raw)
To: Liam R. Howlett, Jeff Xu, akpm, jannh, torvalds, vbabka,
lorenzo.stoakes, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 01:29:46PM -0500, Liam R. Howlett wrote:
> * Jeff Xu <jeffxu@chromium.org> [250213 12:17]:
> > On Wed, Feb 12, 2025 at 7:05 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > >
> > ...
> > > >
> > > > In this version, we've improved the handling of system mapping sealing from
> > > > previous versions, instead of modifying the _install_special_mapping
> > > > function itself, which would affect all architectures, we now call
> > > > _install_special_mapping with a sealing flag only within the specific
> > > > architecture that requires it. This targeted approach offers two key
> > > > advantages: 1) It limits the code change's impact to the necessary
> > > > architectures, and 2) It aligns with the software architecture by keeping
> > > > the core memory management within the mm layer, while delegating the
> > > > decision of sealing system mappings to the individual architecture, which
> > > > is particularly relevant since 32-bit architectures never require sealing.
> > > >
> > > > Additionally, this patch introduces a new header,
> > > > include/linux/usrprocess.h, which contains the mseal_system_mappings()
> > > > function. This function helps the architecture determine if system
> > > > mapping is enabled within the current kernel configuration. It can be
> > > > extended in the future to handle opt-in/out prctl for enabling system
> > > > mapping sealing at the process level or a kernel cmdline feature.
> > > >
> > > > A new header file was introduced because it was difficult to find the
> > > > best location for this function. Although include/linux/mm.h was
> > > > considered, this feature is more closely related to user processes
> > > > than core memory management. Additionally, future prctl or kernel
> > > > cmd-line implementations for this feature would not fit within the
> > > > scope of core memory management or mseal.c. This is relevant because
> > > > if we add unit-tests for mseal.c in the future, we would want to limit
> > > > mseal.c's dependencies to core memory management.
> > > >
> > ...
> > > >
> > > > diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
> > > > new file mode 100644
> > > > index 000000000000..bd11e2e972c5
> > > > --- /dev/null
> > > > +++ b/include/linux/userprocess.h
> > > > @@ -0,0 +1,18 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#ifndef _LINUX_USER_PROCESS_H
> > > > +#define _LINUX_USER_PROCESS_H
> > > > +#include <linux/mm.h>
> [...]
>
> Also, I don't agree with your stated reason for a new header.
>
> Please remove this header until it is needed. It can be added later if
> it is needed. If we all had tiny headers because we might need them in
> the future, we'd have serious issues finding anything and the list of
> included headers would be huge.
>
> You have added unnecessary changes and complications to this patch set
> on v5.
In all fairness, I think v5 is significantly less complex overall. Jeff
used his best judgement with a new header, and I don't think it's
unreasonable. That it is unwanted is fine, mm.h it is, but I think it's
clear the intent was trying to make this as least burdensome for the mm
subsystem as possible.
> > > > +/*
> > > > + * mseal of userspace process's system mappings.
> > > > + */
> > > > +static inline unsigned long mseal_system_mappings(void)
> > > > +{
> > > > +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > > > + return VM_SEALED;
> > > > +#else
> > > > + return 0;
> > > > +#endif
> > > > +}
> > > > +
> > > > +#endif
> > >
> > > Looking in mm.h, there are examples of config checks which either set
> > > the bit or set it to 0. This means you wouldn't need the function at
> > > all and the precompiler would do all the work (although with a static
> > > inline, I'm not sure it would make a big difference in instructions).
> > >
> > > For instance, you could do:
> > > #ifdef CONFIG_64BIT
> > > /* VM is sealed, in vm_flags */
> > > #define VM_SEALED _BITUL(63)
> > >
> > > #ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > > #define VM_SYSTEM_SEAL VM_SEALED
> > > else
> > > #define VM_SYSTEM_SEAL VM_NONE
> > > #endif
> > >
> > > #else /* CONFIG_64BIT */
> > > #define VM_SYSTEM_SEAL VM_NONE
> > > #endif
> > >
> > > Then you can use VM_SYSTEM_SEAL in the system mappings function in the
> > > list of flags instead of having a variable + calling the static
> > > function.
> > >
> > > I didn't look close enough to see if the 32bit version is necessary.
> > >
> > I'm aware of the other pattern.
> >
> > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > the build. This is intentional. Any 32-bit code trying to use the
> > sealing function or the VM_SEALED flag will immediately fail
> > compilation. This makes it easier to identify incorrect usage.
>
> So you are against using the #define because the VM_SYSTEM_SEAL will be
> defined, even though it will be VM_NONE? This is no worse than a
> function that returns 0, and it aligns better with what we have today in
> that it can be put in the list of other flags.
When I was reading through all of this and considering the history of
its development goals, it strikes me that a function is easier for the
future if/when this can be made a boot-time setting.
If mm maintainers prefer a #define for now, that's fine of course. The
value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit.
> Also, my vote for where you should put this code is clear: it should
> live with the mseal #define in mm.h. You're going to need mm.h anyways,
> and that's a big hint as to where it should live.
Sounds good.
> > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > index d0d021b3fa3b..892d2bcdf397 100644
> > > > --- a/init/Kconfig
> > > > +++ b/init/Kconfig
> > > > @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > > > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > > > bool
> > > >
> > > > +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> > >
> > > ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS is more clear. HAS may mean it's
> > > supported or it could mean it's enabled. SUPPORTS is more clear that
> > > this is set if an arch supports the feature. After all, I think HAS
> > > here means it "has support for" mseal system mappings.
> > >
> > > I see that both are used (HAS and SUPPORTS), but I'm still not sure what
> > > HAS means in other contexts without digging.
> > >
> > The existing pattern is to indicate arch support with
> > CONFIG_ARCH_HAS_N and security/KConfig to have CONFIG_N as the main
> > switch. For example, CONFIG_ARCH_HAS_FORTIFY_SOURCE and
> > CONFIG_FORTIFY_SOURCE. Since the MSEAL_SYSTEM_MAPPINGS is placed in
> > security/Kconfig, hence I'm following the existing pattern in
> > security/Kconfig.
>
> Okay, thanks. I really think SUPPORTS is more clear, but sticking with
> whatever your area uses is also fine.
Yeah, I've really scratched my head over what the best practice is here.
There's a real mixture of "HAS" vs "SUPPORTS" across both hardware
architectural support, software features, interface implementations,
and compiler behavior that bridges between those. When I've looked in
the past, it seemed that "HAS" was in the majority, but I've never been
able to make sense of it.
Let me look again. Today, HAS shows:
$ git grep 'config .*_HAS_' | grep -v Documentation/ | \
awk '{print $2}' | cut -d_ -f1 | sort | uniq -c | sort -n
1 ARM
1 MAC80211
1 PGTABLE
1 PPC
1 RUSTC
1 USB
2 ARM64
2 SIBYTE
2 SOC
3 FS
3 PAHOLE
6 SGI
6 TOOLCHAIN
10 ARC
20 AS
32 SYS
34 CPU
38 CC
105 ARCH
Looking through them, it's tools (CC, AS, PAHOLE, etc), and
system/cpu/architecture prefixes, with ARCH being a clear winner.
SUPPORTS looks similar, but isn't as widely used:
$ git grep 'config .*_SUPPORTS_' | grep -v Documentation/ | \
awk '{print $2}' | cut -d_ -f1 | sort | uniq -c | sort -n
1 PPC64
1 X86
2 CLANG
2 GCC
2 RUSTC
8 CPU
38 SYS
71 ARCH
The mips arch is using SYS, and distinguishes between HAS and SUPPORTS
in the sense of "this kernel HAS support for CPU $foo, which SUPPORTS
features x, y, z".
Looking through ARCH_SUPPORTS, it's all software features. Looking
through ARCH_HAS, it looks like a mix of hardware features
and software features. Some software features are more about
implementation availability, though (so "HAS" makes sense,
but should maybe be "IMPLEMENTS"?) e.g. ARCH_HAS_SYSCALL_WRAPPER,
ARCH_HAS_STRNLEN_USER, ARCH_HAS_FORTIFY_SOURCE, ARCH_HAS_ELF_RANDOMIZE,
ARCH_HAS_STRICT_KERNEL_RWX, ARCH_HAS_STRICT_MODULE_RWX.
So, I think I'd agree: this is about a software features and not an API
implementation nor hardware feature, so let's use ARCH_SUPPORTS_.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-13 20:11 ` Kees Cook
@ 2025-02-13 20:54 ` Liam R. Howlett
2025-02-13 22:00 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-13 20:54 UTC (permalink / raw)
To: Kees Cook
Cc: Jeff Xu, akpm, jannh, torvalds, vbabka, lorenzo.stoakes,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* Kees Cook <kees@kernel.org> [250213 15:11]:
> On Thu, Feb 13, 2025 at 01:29:46PM -0500, Liam R. Howlett wrote:
> > * Jeff Xu <jeffxu@chromium.org> [250213 12:17]:
> > > On Wed, Feb 12, 2025 at 7:05 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > > >
> > > ...
> > > > >
> > > > > In this version, we've improved the handling of system mapping sealing from
> > > > > previous versions, instead of modifying the _install_special_mapping
> > > > > function itself, which would affect all architectures, we now call
> > > > > _install_special_mapping with a sealing flag only within the specific
> > > > > architecture that requires it. This targeted approach offers two key
> > > > > advantages: 1) It limits the code change's impact to the necessary
> > > > > architectures, and 2) It aligns with the software architecture by keeping
> > > > > the core memory management within the mm layer, while delegating the
> > > > > decision of sealing system mappings to the individual architecture, which
> > > > > is particularly relevant since 32-bit architectures never require sealing.
> > > > >
> > > > > Additionally, this patch introduces a new header,
> > > > > include/linux/usrprocess.h, which contains the mseal_system_mappings()
> > > > > function. This function helps the architecture determine if system
> > > > > mapping is enabled within the current kernel configuration. It can be
> > > > > extended in the future to handle opt-in/out prctl for enabling system
> > > > > mapping sealing at the process level or a kernel cmdline feature.
> > > > >
> > > > > A new header file was introduced because it was difficult to find the
> > > > > best location for this function. Although include/linux/mm.h was
> > > > > considered, this feature is more closely related to user processes
> > > > > than core memory management. Additionally, future prctl or kernel
> > > > > cmd-line implementations for this feature would not fit within the
> > > > > scope of core memory management or mseal.c. This is relevant because
> > > > > if we add unit-tests for mseal.c in the future, we would want to limit
> > > > > mseal.c's dependencies to core memory management.
> > > > >
> > > ...
> > > > >
> > > > > diff --git a/include/linux/userprocess.h b/include/linux/userprocess.h
> > > > > new file mode 100644
> > > > > index 000000000000..bd11e2e972c5
> > > > > --- /dev/null
> > > > > +++ b/include/linux/userprocess.h
> > > > > @@ -0,0 +1,18 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > +#ifndef _LINUX_USER_PROCESS_H
> > > > > +#define _LINUX_USER_PROCESS_H
> > > > > +#include <linux/mm.h>
> > [...]
> >
> > Also, I don't agree with your stated reason for a new header.
> >
> > Please remove this header until it is needed. It can be added later if
> > it is needed. If we all had tiny headers because we might need them in
> > the future, we'd have serious issues finding anything and the list of
> > included headers would be huge.
> >
> > You have added unnecessary changes and complications to this patch set
> > on v5.
>
> In all fairness, I think v5 is significantly less complex overall. Jeff
> used his best judgement with a new header, and I don't think it's
> unreasonable. That it is unwanted is fine, mm.h it is, but I think it's
> clear the intent was trying to make this as least burdensome for the mm
> subsystem as possible.
This version is certainly easier to follow, but it's still more
complicated than it needs to be. Adding custom headers seems like an
unnecessary addition to previous versions.
I didn't mean this to be a jab or upsetting - and sorry to both of you,
I can see how it could be taken this way.
I was trying to point out that this is the same sort of thing that got
Jeff into trouble with Linus about over-engineering the v8 of the mseal
patch set [1].
...
> > >
> > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > the build. This is intentional. Any 32-bit code trying to use the
> > > sealing function or the VM_SEALED flag will immediately fail
> > > compilation. This makes it easier to identify incorrect usage.
> >
> > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > defined, even though it will be VM_NONE? This is no worse than a
> > function that returns 0, and it aligns better with what we have today in
> > that it can be put in the list of other flags.
>
> When I was reading through all of this and considering the history of
> its development goals, it strikes me that a function is easier for the
> future if/when this can be made a boot-time setting.
>
Reworking this change to function as a boot-time parameter, or whatever,
would not be a significant amount of work, if/when the time comes.
Since we don't know what the future holds, it it unnecessary to engineer
in a potential change for a future version when the change is easy
enough to make later and keep the code cleaner now.
> If mm maintainers prefer a #define for now, that's fine of course. The
> value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit.
Thanks. I think having a flag with VM_NONE on 32-bit is just as sane as
a "flags |= system_seal()" call that unconditionally returns 0 on
32-bit.
>
> > Also, my vote for where you should put this code is clear: it should
> > live with the mseal #define in mm.h. You're going to need mm.h anyways,
> > and that's a big hint as to where it should live.
>
> Sounds good.
>
> > > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > > index d0d021b3fa3b..892d2bcdf397 100644
> > > > > --- a/init/Kconfig
> > > > > +++ b/init/Kconfig
> > > > > @@ -1882,6 +1882,24 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > > > > config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > > > > bool
> > > > >
> > > > > +config ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
> > > >
> > > > ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS is more clear. HAS may mean it's
> > > > supported or it could mean it's enabled. SUPPORTS is more clear that
> > > > this is set if an arch supports the feature. After all, I think HAS
> > > > here means it "has support for" mseal system mappings.
> > > >
> > > > I see that both are used (HAS and SUPPORTS), but I'm still not sure what
> > > > HAS means in other contexts without digging.
> > > >
> > > The existing pattern is to indicate arch support with
> > > CONFIG_ARCH_HAS_N and security/KConfig to have CONFIG_N as the main
> > > switch. For example, CONFIG_ARCH_HAS_FORTIFY_SOURCE and
> > > CONFIG_FORTIFY_SOURCE. Since the MSEAL_SYSTEM_MAPPINGS is placed in
> > > security/Kconfig, hence I'm following the existing pattern in
> > > security/Kconfig.
> >
> > Okay, thanks. I really think SUPPORTS is more clear, but sticking with
> > whatever your area uses is also fine.
>
> Yeah, I've really scratched my head over what the best practice is here.
> There's a real mixture of "HAS" vs "SUPPORTS" across both hardware
> architectural support, software features, interface implementations,
> and compiler behavior that bridges between those. When I've looked in
> the past, it seemed that "HAS" was in the majority, but I've never been
> able to make sense of it.
>
> Let me look again. Today, HAS shows:
>
> $ git grep 'config .*_HAS_' | grep -v Documentation/ | \
> awk '{print $2}' | cut -d_ -f1 | sort | uniq -c | sort -n
> 1 ARM
> 1 MAC80211
> 1 PGTABLE
> 1 PPC
> 1 RUSTC
> 1 USB
> 2 ARM64
> 2 SIBYTE
> 2 SOC
> 3 FS
> 3 PAHOLE
> 6 SGI
> 6 TOOLCHAIN
> 10 ARC
> 20 AS
> 32 SYS
> 34 CPU
> 38 CC
> 105 ARCH
>
> Looking through them, it's tools (CC, AS, PAHOLE, etc), and
> system/cpu/architecture prefixes, with ARCH being a clear winner.
>
> SUPPORTS looks similar, but isn't as widely used:
>
> $ git grep 'config .*_SUPPORTS_' | grep -v Documentation/ | \
> awk '{print $2}' | cut -d_ -f1 | sort | uniq -c | sort -n
> 1 PPC64
> 1 X86
> 2 CLANG
> 2 GCC
> 2 RUSTC
> 8 CPU
> 38 SYS
> 71 ARCH
>
> The mips arch is using SYS, and distinguishes between HAS and SUPPORTS
> in the sense of "this kernel HAS support for CPU $foo, which SUPPORTS
> features x, y, z".
>
> Looking through ARCH_SUPPORTS, it's all software features. Looking
> through ARCH_HAS, it looks like a mix of hardware features
> and software features. Some software features are more about
> implementation availability, though (so "HAS" makes sense,
> but should maybe be "IMPLEMENTS"?) e.g. ARCH_HAS_SYSCALL_WRAPPER,
> ARCH_HAS_STRNLEN_USER, ARCH_HAS_FORTIFY_SOURCE, ARCH_HAS_ELF_RANDOMIZE,
> ARCH_HAS_STRICT_KERNEL_RWX, ARCH_HAS_STRICT_MODULE_RWX.
>
> So, I think I'd agree: this is about a software features and not an API
> implementation nor hardware feature, so let's use ARCH_SUPPORTS_.
This makes sense. I grepped for both as well and found that HAS
outnumbers SUPPORTS, but there are a lot of each. I figured using
whatever the subsystem uses is probably the right answer - your answer
is better.
Seeing them in the same kconfig option together gave me pause as to why
they were different. Thanks for digging into this!
Regards,
Liam
[1] https://lore.kernel.org/linux-mm/CAHk-=wjGGgfAoiEdPqLdib7VvQgG7uVXpTPzJ9jTW0HesRpPwQ@mail.gmail.com/
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-13 20:54 ` Liam R. Howlett
@ 2025-02-13 22:00 ` Jeff Xu
2025-02-14 0:14 ` Liam R. Howlett
0 siblings, 1 reply; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 22:00 UTC (permalink / raw)
To: Liam R. Howlett, Kees Cook, Jeff Xu, akpm, jannh, torvalds,
vbabka, lorenzo.stoakes, adhemerval.zanella, oleg, avagin,
benjamin, linux-kernel, linux-hardening, linux-mm, jorgelo,
sroettger, hch, ojeda, thomas.weissschuh, adobriyan, johannes,
pedro.falcato, hca, willy, anna-maria, mark.rutland,
linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
> > > >
> > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > > the build. This is intentional. Any 32-bit code trying to use the
> > > > sealing function or the VM_SEALED flag will immediately fail
> > > > compilation. This makes it easier to identify incorrect usage.
> > >
> > > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > > defined, even though it will be VM_NONE? This is no worse than a
> > > function that returns 0, and it aligns better with what we have today in
> > > that it can be put in the list of other flags.
> >
> > When I was reading through all of this and considering the history of
> > its development goals, it strikes me that a function is easier for the
> > future if/when this can be made a boot-time setting.
> >
>
> Reworking this change to function as a boot-time parameter, or whatever,
> would not be a significant amount of work, if/when the time comes.
> Since we don't know what the future holds, it it unnecessary to engineer
> in a potential change for a future version when the change is easy
> enough to make later and keep the code cleaner now.
>
Sure, I will put the function in mm.h for this patch. We can find a
proper place when it is time to implement the boot-time parameter
change.
The call stack for sealing system mapping is something like below:
install_special_mapping (mm/mmap.c)
map_vdso (arch/x86/entry/vdso/vma.c)
load_elf_binary (fs/binfmt_elf.c)
load_misc_binary (fs/binfmt_misc.c)
bprm_execve (fs/exec.c)
do_execveat_common
__x64_sys_execve
do_syscall_64
IMO, there's a clear divide between the API implementer and the API user.
mm and mm.h are the providers, offering the core mm functionality
through APIs/data structures like install_special_mapping().
The exe layer (bprm_execve, map_vdso, etc) is the consumer of the
install_special_mapping.
The logic related to checking if sealing system mapping is enabled
belongs to the exe layer.
>
> > If mm maintainers prefer a #define for now, that's fine of course. The
> > value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit.
>
> Thanks. I think having a flag with VM_NONE on 32-bit is just as sane as
> a "flags |= system_seal()" call that unconditionally returns 0 on
> 32-bit.
>
Consider the case below in src/third_party/kernel/v6.6/fs/proc/task_mmu.c,
#ifdef CONFIG_64BIT
[ilog2(VM_SEALED)] = "sl",
#endif
If #ifdef CONFIG_64BIT is missing, it won't be detected during compile time.
Setting VM_SEALED to VM_NONE could simplify the coding in some cases
(get/set case), but it might make other cases error prone.
I would prefer to not have VM_SEALED for 32 bit.
-Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-13 22:00 ` Jeff Xu
@ 2025-02-14 0:14 ` Liam R. Howlett
2025-02-14 1:10 ` Liam R. Howlett
0 siblings, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-14 0:14 UTC (permalink / raw)
To: Jeff Xu
Cc: Kees Cook, akpm, jannh, torvalds, vbabka, lorenzo.stoakes,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* Jeff Xu <jeffxu@chromium.org> [250213 17:00]:
> On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett
> <Liam.Howlett@oracle.com> wrote:
>
> > > > >
> > > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > > > the build. This is intentional. Any 32-bit code trying to use the
> > > > > sealing function or the VM_SEALED flag will immediately fail
> > > > > compilation. This makes it easier to identify incorrect usage.
> > > >
> > > > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > > > defined, even though it will be VM_NONE? This is no worse than a
> > > > function that returns 0, and it aligns better with what we have today in
> > > > that it can be put in the list of other flags.
> > >
> > > When I was reading through all of this and considering the history of
> > > its development goals, it strikes me that a function is easier for the
> > > future if/when this can be made a boot-time setting.
> > >
> >
> > Reworking this change to function as a boot-time parameter, or whatever,
> > would not be a significant amount of work, if/when the time comes.
> > Since we don't know what the future holds, it it unnecessary to engineer
> > in a potential change for a future version when the change is easy
> > enough to make later and keep the code cleaner now.
> >
> Sure, I will put the function in mm.h for this patch. We can find a
> proper place when it is time to implement the boot-time parameter
> change.
>
> The call stack for sealing system mapping is something like below:
>
> install_special_mapping (mm/mmap.c)
> map_vdso (arch/x86/entry/vdso/vma.c)
> load_elf_binary (fs/binfmt_elf.c)
> load_misc_binary (fs/binfmt_misc.c)
> bprm_execve (fs/exec.c)
> do_execveat_common
> __x64_sys_execve
> do_syscall_64
>
> IMO, there's a clear divide between the API implementer and the API user.
> mm and mm.h are the providers, offering the core mm functionality
> through APIs/data structures like install_special_mapping().
>
> The exe layer (bprm_execve, map_vdso, etc) is the consumer of the
> install_special_mapping.
> The logic related to checking if sealing system mapping is enabled
> belongs to the exe layer.
Since this is an all or nothing enabling, there is no reason to have
each caller check the same thing and do the same action. You should put
the logic into the provider - they all end up doing the same thing.
Also, this is a compile time option so it doesn't even need to be
checked on execution - just apply it in the first place, at the source.
Your static inline function was already doing this...?
I'm confused as to what you are arguing here because it goes against
what you had and what I suggested. The alternative you are suggesting
is more code, more instructions, and the best outcome is the same
result.
>
> >
> > > If mm maintainers prefer a #define for now, that's fine of course. The
> > > value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit.
> >
> > Thanks. I think having a flag with VM_NONE on 32-bit is just as sane as
> > a "flags |= system_seal()" call that unconditionally returns 0 on
> > 32-bit.
> >
> Consider the case below in src/third_party/kernel/v6.6/fs/proc/task_mmu.c,
>
> #ifdef CONFIG_64BIT
> [ilog2(VM_SEALED)] = "sl",
> #endif
>
> If #ifdef CONFIG_64BIT is missing, it won't be detected during compile time.
>
> Setting VM_SEALED to VM_NONE could simplify the coding in some cases
> (get/set case), but it might make other cases error prone.
>
> I would prefer to not have VM_SEALED for 32 bit.
But what I posted leaves VM_SEALED undefined for 32 bit, it defines
VM_SYSTEM_SEALED which can be placed, for instance, into
_install_special_mapping() arguments directly. Reducing the change to
just adding a new flag to the list. And it's applied to all system
mappings based on if it's enabled on compile or not.
Also:
include/linux/mm.h:#define VM_NONE 0x00000000
so, I'm not sure what evaluation you are concerned about? It would be
defined as 0.
Thanks,
Liam
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-14 0:14 ` Liam R. Howlett
@ 2025-02-14 1:10 ` Liam R. Howlett
2025-02-14 14:39 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-14 1:10 UTC (permalink / raw)
To: Jeff Xu, Kees Cook, akpm, jannh, torvalds, vbabka,
lorenzo.stoakes, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport
* Liam R. Howlett <Liam.Howlett@oracle.com> [250213 19:14]:
> * Jeff Xu <jeffxu@chromium.org> [250213 17:00]:
> > On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett
> > <Liam.Howlett@oracle.com> wrote:
> >
> > > > > >
> > > > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > > > > the build. This is intentional. Any 32-bit code trying to use the
> > > > > > sealing function or the VM_SEALED flag will immediately fail
> > > > > > compilation. This makes it easier to identify incorrect usage.
> > > > >
> > > > > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > > > > defined, even though it will be VM_NONE? This is no worse than a
> > > > > function that returns 0, and it aligns better with what we have today in
> > > > > that it can be put in the list of other flags.
> > > >
> > > > When I was reading through all of this and considering the history of
> > > > its development goals, it strikes me that a function is easier for the
> > > > future if/when this can be made a boot-time setting.
> > > >
> > >
> > > Reworking this change to function as a boot-time parameter, or whatever,
> > > would not be a significant amount of work, if/when the time comes.
> > > Since we don't know what the future holds, it it unnecessary to engineer
> > > in a potential change for a future version when the change is easy
> > > enough to make later and keep the code cleaner now.
> > >
> > Sure, I will put the function in mm.h for this patch. We can find a
> > proper place when it is time to implement the boot-time parameter
> > change.
> >
> > The call stack for sealing system mapping is something like below:
> >
> > install_special_mapping (mm/mmap.c)
> > map_vdso (arch/x86/entry/vdso/vma.c)
> > load_elf_binary (fs/binfmt_elf.c)
> > load_misc_binary (fs/binfmt_misc.c)
> > bprm_execve (fs/exec.c)
> > do_execveat_common
> > __x64_sys_execve
> > do_syscall_64
> >
> > IMO, there's a clear divide between the API implementer and the API user.
> > mm and mm.h are the providers, offering the core mm functionality
> > through APIs/data structures like install_special_mapping().
> >
> > The exe layer (bprm_execve, map_vdso, etc) is the consumer of the
> > install_special_mapping.
> > The logic related to checking if sealing system mapping is enabled
> > belongs to the exe layer.
>
> Since this is an all or nothing enabling, there is no reason to have
> each caller check the same thing and do the same action. You should put
> the logic into the provider - they all end up doing the same thing.
>
> Also, this is a compile time option so it doesn't even need to be
> checked on execution - just apply it in the first place, at the source.
> Your static inline function was already doing this...?
>
> I'm confused as to what you are arguing here because it goes against
> what you had and what I suggested. The alternative you are suggesting
> is more code, more instructions, and the best outcome is the same
> result.
I think I understand what you are saying now: the interface to
install_special_mapping() needs to take the vma flag, as it does today.
What I don't understand is that what you proposed and what I proposed
both do that.
What I'm saying is that, since system mappings are enabled, we can
already know implicitly by having VM_SYSTEM_SEAL either VM_NONE or
VM_SEAL.
Turning this:
@@ -264,11 +266,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
/*
* MAYWRITE to allow gdb to COW and set breakpoints
*/
+ vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm,
text_start,
image->size,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vm_flags,
&vdso_mapping);
to this:
/*
* MAYWRITE to allow gdb to COW and set breakpoints
*/
vma = _install_special_mapping(mm,
text_start,
image->size,
VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+ VM_SYSTEM_SEAL,
&vdso_mapping);
No unsigned long vm_flags needed. It's easier to read and I don't think
it's any more hidden than the vm_flags |= function call option.
>
> >
> > >
> > > > If mm maintainers prefer a #define for now, that's fine of course. The
> > > > value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit.
> > >
> > > Thanks. I think having a flag with VM_NONE on 32-bit is just as sane as
> > > a "flags |= system_seal()" call that unconditionally returns 0 on
> > > 32-bit.
> > >
> > Consider the case below in src/third_party/kernel/v6.6/fs/proc/task_mmu.c,
> >
> > #ifdef CONFIG_64BIT
> > [ilog2(VM_SEALED)] = "sl",
> > #endif
> >
> > If #ifdef CONFIG_64BIT is missing, it won't be detected during compile time.
> >
> > Setting VM_SEALED to VM_NONE could simplify the coding in some cases
> > (get/set case), but it might make other cases error prone.
> >
> > I would prefer to not have VM_SEALED for 32 bit.
>
> But what I posted leaves VM_SEALED undefined for 32 bit, it defines
> VM_SYSTEM_SEALED which can be placed, for instance, into
> _install_special_mapping() arguments directly. Reducing the change to
> just adding a new flag to the list. And it's applied to all system
> mappings based on if it's enabled on compile or not.
>
> Also:
> include/linux/mm.h:#define VM_NONE 0x00000000
> so, I'm not sure what evaluation you are concerned about? It would be
> defined as 0.
>
> Thanks,
> Liam
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-14 1:10 ` Liam R. Howlett
@ 2025-02-14 14:39 ` Jeff Xu
2025-02-14 14:59 ` Lorenzo Stoakes
0 siblings, 1 reply; 43+ messages in thread
From: Jeff Xu @ 2025-02-14 14:39 UTC (permalink / raw)
To: Liam R. Howlett, Jeff Xu, Kees Cook, akpm, jannh, torvalds,
vbabka, lorenzo.stoakes, adhemerval.zanella, oleg, avagin,
benjamin, linux-kernel, linux-hardening, linux-mm, jorgelo,
sroettger, hch, ojeda, thomas.weissschuh, adobriyan, johannes,
pedro.falcato, hca, willy, anna-maria, mark.rutland,
linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 5:10 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Liam R. Howlett <Liam.Howlett@oracle.com> [250213 19:14]:
> > * Jeff Xu <jeffxu@chromium.org> [250213 17:00]:
> > > On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett
> > > <Liam.Howlett@oracle.com> wrote:
> > >
> > > > > > >
> > > > > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > > > > > the build. This is intentional. Any 32-bit code trying to use the
> > > > > > > sealing function or the VM_SEALED flag will immediately fail
> > > > > > > compilation. This makes it easier to identify incorrect usage.
> > > > > >
> > > > > > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > > > > > defined, even though it will be VM_NONE? This is no worse than a
> > > > > > function that returns 0, and it aligns better with what we have today in
> > > > > > that it can be put in the list of other flags.
> > > > >
> > > > > When I was reading through all of this and considering the history of
> > > > > its development goals, it strikes me that a function is easier for the
> > > > > future if/when this can be made a boot-time setting.
> > > > >
> > > >
> > > > Reworking this change to function as a boot-time parameter, or whatever,
> > > > would not be a significant amount of work, if/when the time comes.
> > > > Since we don't know what the future holds, it it unnecessary to engineer
> > > > in a potential change for a future version when the change is easy
> > > > enough to make later and keep the code cleaner now.
> > > >
> > > Sure, I will put the function in mm.h for this patch. We can find a
> > > proper place when it is time to implement the boot-time parameter
> > > change.
> > >
> > > The call stack for sealing system mapping is something like below:
> > >
> > > install_special_mapping (mm/mmap.c)
> > > map_vdso (arch/x86/entry/vdso/vma.c)
> > > load_elf_binary (fs/binfmt_elf.c)
> > > load_misc_binary (fs/binfmt_misc.c)
> > > bprm_execve (fs/exec.c)
> > > do_execveat_common
> > > __x64_sys_execve
> > > do_syscall_64
> > >
> > > IMO, there's a clear divide between the API implementer and the API user.
> > > mm and mm.h are the providers, offering the core mm functionality
> > > through APIs/data structures like install_special_mapping().
> > >
> > > The exe layer (bprm_execve, map_vdso, etc) is the consumer of the
> > > install_special_mapping.
> > > The logic related to checking if sealing system mapping is enabled
> > > belongs to the exe layer.
> >
> > Since this is an all or nothing enabling, there is no reason to have
> > each caller check the same thing and do the same action. You should put
> > the logic into the provider - they all end up doing the same thing.
> >
> > Also, this is a compile time option so it doesn't even need to be
> > checked on execution - just apply it in the first place, at the source.
> > Your static inline function was already doing this...?
> >
> > I'm confused as to what you are arguing here because it goes against
> > what you had and what I suggested. The alternative you are suggesting
> > is more code, more instructions, and the best outcome is the same
> > result.
>
> I think I understand what you are saying now: the interface to
> install_special_mapping() needs to take the vma flag, as it does today.
> What I don't understand is that what you proposed and what I proposed
> both do that.
>
> What I'm saying is that, since system mappings are enabled, we can
> already know implicitly by having VM_SYSTEM_SEAL either VM_NONE or
> VM_SEAL.
>
> Turning this:
>
> @@ -264,11 +266,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
> /*
> * MAYWRITE to allow gdb to COW and set breakpoints
> */
> + vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
> + vm_flags |= mseal_system_mappings();
> vma = _install_special_mapping(mm,
> text_start,
> image->size,
> - VM_READ|VM_EXEC|
> - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> + vm_flags,
> &vdso_mapping);
>
> to this:
>
> /*
> * MAYWRITE to allow gdb to COW and set breakpoints
> */
> vma = _install_special_mapping(mm,
> text_start,
> image->size,
> VM_READ|VM_EXEC|
> - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
> + VM_SYSTEM_SEAL,
> &vdso_mapping);
>
> No unsigned long vm_flags needed. It's easier to read and I don't think
> it's any more hidden than the vm_flags |= function call option.
>
The arch code needs a mseal_system_mappings() function. Otherwise,
I'll have to change this line (in arch) again when I implement the kernel
command line or pre-process opt-in/opt-out, which requires a function
call. This isn't overthinking; based on our discussion so far, there
are clear needs for a subsequent patch series. This patch is just the
first step.
For software layering, I'd like to see a clear separation between
layers. mm implements _install_special_mapping, which accepts any
combination of vm_flags as input. Then I'd like the caller (in arch
code) to have all the necessary code to compose the vm_flags in one
place. This helps readability. In the past, we discussed merging the
vdso/vvar code across all architectures. When that happens, the new
code in arch will likely have its own .c and .h files, but it will still sit
above mm. That means mm won't need to change, and the
_install_special_mapping API from mm will remain unchanged.
The mseal_system_mappings() function can already return VM_SEALED,
and future patches will add more logic into mseal_system_mappings(), e.g.
check for kernel cmdline or opt-in/opt-out. we don't need a separate
VM_SYSTEM_SEAL, which is a *** redirect macro ***, I prefer to keep
all the important code logic relevant to configuration of enabling system
mapping sealing in one place, for easy reading.
mseal_system_mappings() can be placed in mm.h in this patch, as you
suggested. But in the near future, it will be moved out of mm.h and find
a right header. The functionality belongs to exe namespace, because of
the reasons I put in the commit message and discussions.
Thanks
-Jeff
-Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-14 14:39 ` Jeff Xu
@ 2025-02-14 14:59 ` Lorenzo Stoakes
2025-02-14 15:18 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-02-14 14:59 UTC (permalink / raw)
To: Jeff Xu
Cc: Liam R. Howlett, Kees Cook, akpm, jannh, torvalds, vbabka,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Fri, Feb 14, 2025 at 06:39:48AM -0800, Jeff Xu wrote:
> mseal_system_mappings() can be placed in mm.h in this patch, as you
> suggested. But in the near future, it will be moved out of mm.h and find
> a right header. The functionality belongs to exe namespace, because of
> the reasons I put in the commit message and discussions.
With respect Jeff - and I feel that this might simply be a miscommunciation
here - but this doesn't read wonderfully. 'can be placed' 'it will be moved
out', etc.
Please try to be respectful of experienced maintainers who are taking their
time to review your code, and respond politely and respectfully. I think
what you meant to say is something more like:
"I'm more than happy to do that, but I feel that it would be more suited in
a separate header, as this strictly belongs to the kernel functionality
surrounding the execution of code. However we can revisit this at a later
time!"
My feeling is that this is exactly what you mean, but you are just
essentially abbreviating this. However it reads rather rudely, which I'm
sure you don't intend.
Ultimately the fact of the matter is that your series will be merged when
it reaches the standards required of you by the relevant maintainers, as is
the case will all code submitted to the kernel when we reach consensus.
In this series you have addressed a great number of concerns which has
brought the merging of it very much closer, so I hope we can continue in
the same vein and reach this consensus soon.
Let's try to avoid any miscommunication which might delay us reaching this
aim!
Thanks, Lorenzo
>
> Thanks
> -Jeff
>
> -Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change
2025-02-14 14:59 ` Lorenzo Stoakes
@ 2025-02-14 15:18 ` Jeff Xu
0 siblings, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-14 15:18 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Liam R. Howlett, Kees Cook, akpm, jannh, torvalds, vbabka,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Fri, Feb 14, 2025 at 7:00 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Feb 14, 2025 at 06:39:48AM -0800, Jeff Xu wrote:
> > mseal_system_mappings() can be placed in mm.h in this patch, as you
> > suggested. But in the near future, it will be moved out of mm.h and find
> > a right header. The functionality belongs to exe namespace, because of
> > the reasons I put in the commit message and discussions.
>
> With respect Jeff - and I feel that this might simply be a miscommunciation
> here - but this doesn't read wonderfully. 'can be placed' 'it will be moved
> out', etc.
>
> Please try to be respectful of experienced maintainers who are taking their
> time to review your code, and respond politely and respectfully. I think
> what you meant to say is something more like:
>
> "I'm more than happy to do that, but I feel that it would be more suited in
> a separate header, as this strictly belongs to the kernel functionality
> surrounding the execution of code. However we can revisit this at a later
> time!"
>
Thanks!
I apologize if my expression appears to be rude. Your revision
reflects what I'm trying to say, in better english.
> My feeling is that this is exactly what you mean, but you are just
> essentially abbreviating this. However it reads rather rudely, which I'm
> sure you don't intend.
>
> Ultimately the fact of the matter is that your series will be merged when
> it reaches the standards required of you by the relevant maintainers, as is
> the case will all code submitted to the kernel when we reach consensus.
>
> In this series you have addressed a great number of concerns which has
> brought the merging of it very much closer, so I hope we can continue in
> the same vein and reach this consensus soon.
>
> Let's try to avoid any miscommunication which might delay us reaching this
> aim!
>
Thanks !
-Jeff
> Thanks, Lorenzo
>
> >
> > Thanks
> > -Jeff
> >
> > -Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread
* [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 13:03 ` Thomas Weißschuh
2025-02-12 3:21 ` [RFC PATCH v5 3/7] mseal, system mappings: enable x86-64 jeffxu
` (5 subsequent siblings)
7 siblings, 1 reply; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Add code to detect if the vdso is memory sealed, skip the test
if it is.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
.../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
index d53959e03593..c68077c56b22 100644
--- a/tools/testing/selftests/x86/test_mremap_vdso.c
+++ b/tools/testing/selftests/x86/test_mremap_vdso.c
@@ -14,6 +14,7 @@
#include <errno.h>
#include <unistd.h>
#include <string.h>
+#include <stdbool.h>
#include <sys/mman.h>
#include <sys/auxv.h>
@@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
}
+#define VDSO_NAME "[vdso]"
+#define VMFLAGS "VmFlags:"
+#define MSEAL_FLAGS "sl"
+#define MAX_LINE_LEN 512
+
+bool vdso_sealed(FILE *maps)
+{
+ char line[MAX_LINE_LEN];
+ bool has_vdso = false;
+
+ while (fgets(line, sizeof(line), maps)) {
+ if (strstr(line, VDSO_NAME))
+ has_vdso = true;
+
+ if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
+ if (strstr(line, MSEAL_FLAGS))
+ return true;
+
+ return false;
+ }
+ }
+
+ return false;
+}
+
int main(int argc, char **argv, char **envp)
{
pid_t child;
+ FILE *maps;
ksft_print_header();
ksft_set_plan(1);
+ maps = fopen("/proc/self/smaps", "r");
+ if (!maps) {
+ ksft_test_result_skip("Could not open /proc/self/smaps\n");
+ return 0;
+ }
+
+ if (vdso_sealed(maps)) {
+ ksft_test_result_skip("vdso is sealed\n");
+ return 0;
+ }
+
child = fork();
if (child == -1)
ksft_exit_fail_msg("failed to fork (%d): %m\n", errno);
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-12 3:21 ` [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
@ 2025-02-12 13:03 ` Thomas Weißschuh
2025-02-13 14:14 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Thomas Weißschuh @ 2025-02-12 13:03 UTC (permalink / raw)
To: jeffxu
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, adobriyan, johannes, pedro.falcato, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
>
> Add code to detect if the vdso is memory sealed, skip the test
> if it is.
>
> Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> ---
> .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
> 1 file changed, 38 insertions(+)
>
> diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
> index d53959e03593..c68077c56b22 100644
> --- a/tools/testing/selftests/x86/test_mremap_vdso.c
> +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
> @@ -14,6 +14,7 @@
> #include <errno.h>
> #include <unistd.h>
> #include <string.h>
> +#include <stdbool.h>
>
> #include <sys/mman.h>
> #include <sys/auxv.h>
> @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
>
> }
>
> +#define VDSO_NAME "[vdso]"
> +#define VMFLAGS "VmFlags:"
> +#define MSEAL_FLAGS "sl"
> +#define MAX_LINE_LEN 512
> +
> +bool vdso_sealed(FILE *maps)
> +{
> + char line[MAX_LINE_LEN];
> + bool has_vdso = false;
> +
> + while (fgets(line, sizeof(line), maps)) {
> + if (strstr(line, VDSO_NAME))
> + has_vdso = true;
> +
> + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
> + if (strstr(line, MSEAL_FLAGS))
> + return true;
> +
> + return false;
This only tests that any mapping after the vdso is sealed.
There is a real parser for /proc/self/smaps in
tools/testing/selftests/mm/vm_util.[ch], see check_vmflag_io().
> + }
> + }
> +
> + return false;
> +}
<snip>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-12 13:03 ` Thomas Weißschuh
@ 2025-02-13 14:14 ` Jeff Xu
2025-02-13 19:28 ` Kees Cook
0 siblings, 1 reply; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 14:14 UTC (permalink / raw)
To: Thomas Weißschuh
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, adobriyan, johannes, pedro.falcato, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 5:04 AM Thomas Weißschuh
<thomas.weissschuh@linutronix.de> wrote:
>
> On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > Add code to detect if the vdso is memory sealed, skip the test
> > if it is.
> >
> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> > ---
> > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
> > 1 file changed, 38 insertions(+)
> >
> > diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
> > index d53959e03593..c68077c56b22 100644
> > --- a/tools/testing/selftests/x86/test_mremap_vdso.c
> > +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
> > @@ -14,6 +14,7 @@
> > #include <errno.h>
> > #include <unistd.h>
> > #include <string.h>
> > +#include <stdbool.h>
> >
> > #include <sys/mman.h>
> > #include <sys/auxv.h>
> > @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
> >
> > }
> >
> > +#define VDSO_NAME "[vdso]"
> > +#define VMFLAGS "VmFlags:"
> > +#define MSEAL_FLAGS "sl"
> > +#define MAX_LINE_LEN 512
> > +
> > +bool vdso_sealed(FILE *maps)
> > +{
> > + char line[MAX_LINE_LEN];
> > + bool has_vdso = false;
> > +
> > + while (fgets(line, sizeof(line), maps)) {
> > + if (strstr(line, VDSO_NAME))
> > + has_vdso = true;
> > +
> > + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
> > + if (strstr(line, MSEAL_FLAGS))
> > + return true;
> > +
> > + return false;
>
> This only tests that any mapping after the vdso is sealed.
The code above begins by searching for the "[vdso]" string, then looks
for the first line that starts with "VmFlags:", and looks for the "sl"
substring within that line. We're assuming the format of smaps won't
change from release to release.
> There is a real parser for /proc/self/smaps in
> tools/testing/selftests/mm/vm_util.[ch], see check_vmflag_io().
>
This test is in selftest/x86, which makes it hard to include the
vm_util from selftest/mm, if that's what you're asking.
If you are asking reusing the code logic from vm_util, the
check_vmflag_io() calls __get_smap_entry(addr, "VmFlags:", ...), which
begins by searching for address, then looks for the first line that
started with "VmFlags:", then check for the "io" within that line.
This is the same logic as my code, if I read the code correctly.
Thanks
-Jeff
> > + }
> > + }
> > +
> > + return false;
> > +}
>
> <snip>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-13 14:14 ` Jeff Xu
@ 2025-02-13 19:28 ` Kees Cook
2025-02-13 22:20 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2025-02-13 19:28 UTC (permalink / raw)
To: Jeff Xu, Thomas Weißschuh
Cc: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, adobriyan, johannes, pedro.falcato, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On February 13, 2025 6:14:00 AM PST, Jeff Xu <jeffxu@chromium.org> wrote:
>On Wed, Feb 12, 2025 at 5:04 AM Thomas Weißschuh
><thomas.weissschuh@linutronix.de> wrote:
>>
>> On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
>> > From: Jeff Xu <jeffxu@chromium.org>
>> >
>> > Add code to detect if the vdso is memory sealed, skip the test
>> > if it is.
>> >
>> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
>> > ---
>> > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
>> > 1 file changed, 38 insertions(+)
>> >
>> > diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
>> > index d53959e03593..c68077c56b22 100644
>> > --- a/tools/testing/selftests/x86/test_mremap_vdso.c
>> > +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
>> > @@ -14,6 +14,7 @@
>> > #include <errno.h>
>> > #include <unistd.h>
>> > #include <string.h>
>> > +#include <stdbool.h>
>> >
>> > #include <sys/mman.h>
>> > #include <sys/auxv.h>
>> > @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
>> >
>> > }
>> >
>> > +#define VDSO_NAME "[vdso]"
>> > +#define VMFLAGS "VmFlags:"
>> > +#define MSEAL_FLAGS "sl"
>> > +#define MAX_LINE_LEN 512
>> > +
>> > +bool vdso_sealed(FILE *maps)
>> > +{
>> > + char line[MAX_LINE_LEN];
>> > + bool has_vdso = false;
>> > +
>> > + while (fgets(line, sizeof(line), maps)) {
>> > + if (strstr(line, VDSO_NAME))
>> > + has_vdso = true;
>> > +
>> > + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
>> > + if (strstr(line, MSEAL_FLAGS))
>> > + return true;
>> > +
>> > + return false;
>>
>> This only tests that any mapping after the vdso is sealed.
>
>The code above begins by searching for the "[vdso]" string, then looks
>for the first line that starts with "VmFlags:", and looks for the "sl"
>substring within that line. We're assuming the format of smaps won't
>change from release to release.
Oh, I missed this in my reviews: nothing _resets_ has_vdso to false, so if any other mapping follows vdso that happens to be sealed, this will return true...
>
>> There is a real parser for /proc/self/smaps in
>> tools/testing/selftests/mm/vm_util.[ch], see check_vmflag_io().
>>
>This test is in selftest/x86, which makes it hard to include the
>vm_util from selftest/mm, if that's what you're asking.
Hm, we have done these kinds of inter-tests dependencies before. (E.g. seccomp includes stuff from the clone tests.) I think it should be possible if it can just be a header include. Linking across tests would be more frustrating.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-13 19:28 ` Kees Cook
@ 2025-02-13 22:20 ` Jeff Xu
2025-02-14 2:52 ` Kees Cook
0 siblings, 1 reply; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 22:20 UTC (permalink / raw)
To: Kees Cook
Cc: Thomas Weißschuh, akpm, keescook, jannh, torvalds, vbabka,
lorenzo.stoakes, Liam.Howlett, adhemerval.zanella, oleg, avagin,
benjamin, linux-kernel, linux-hardening, linux-mm, jorgelo,
sroettger, hch, ojeda, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 11:28 AM Kees Cook <kees@kernel.org> wrote:
>
>
>
> On February 13, 2025 6:14:00 AM PST, Jeff Xu <jeffxu@chromium.org> wrote:
> >On Wed, Feb 12, 2025 at 5:04 AM Thomas Weißschuh
> ><thomas.weissschuh@linutronix.de> wrote:
> >>
> >> On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
> >> > From: Jeff Xu <jeffxu@chromium.org>
> >> >
> >> > Add code to detect if the vdso is memory sealed, skip the test
> >> > if it is.
> >> >
> >> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> >> > ---
> >> > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
> >> > 1 file changed, 38 insertions(+)
> >> >
> >> > diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
> >> > index d53959e03593..c68077c56b22 100644
> >> > --- a/tools/testing/selftests/x86/test_mremap_vdso.c
> >> > +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
> >> > @@ -14,6 +14,7 @@
> >> > #include <errno.h>
> >> > #include <unistd.h>
> >> > #include <string.h>
> >> > +#include <stdbool.h>
> >> >
> >> > #include <sys/mman.h>
> >> > #include <sys/auxv.h>
> >> > @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
> >> >
> >> > }
> >> >
> >> > +#define VDSO_NAME "[vdso]"
> >> > +#define VMFLAGS "VmFlags:"
> >> > +#define MSEAL_FLAGS "sl"
> >> > +#define MAX_LINE_LEN 512
> >> > +
> >> > +bool vdso_sealed(FILE *maps)
> >> > +{
> >> > + char line[MAX_LINE_LEN];
> >> > + bool has_vdso = false;
> >> > +
> >> > + while (fgets(line, sizeof(line), maps)) {
> >> > + if (strstr(line, VDSO_NAME))
> >> > + has_vdso = true;
> >> > +
> >> > + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
> >> > + if (strstr(line, MSEAL_FLAGS))
> >> > + return true;
> >> > +
> >> > + return false;
> >>
> >> This only tests that any mapping after the vdso is sealed.
> >
> >The code above begins by searching for the "[vdso]" string, then looks
> >for the first line that starts with "VmFlags:", and looks for the "sl"
> >substring within that line. We're assuming the format of smaps won't
> >change from release to release.
>
> Oh, I missed this in my reviews: nothing _resets_ has_vdso to false, so if any other mapping follows vdso that happens to be sealed, this will return true...
>
It won't return the next mapping's sealing flag.
After finding the "[vdso]", if the next line that contains VMFLAGS
doesn't have the "sl" flag, the function returns false immediately.
> >
> >> There is a real parser for /proc/self/smaps in
> >> tools/testing/selftests/mm/vm_util.[ch], see check_vmflag_io().
> >>
> >This test is in selftest/x86, which makes it hard to include the
> >vm_util from selftest/mm, if that's what you're asking.
>
> Hm, we have done these kinds of inter-tests dependencies before. (E.g. seccomp includes stuff from the clone tests.) I think it should be possible if it can just be a header include. Linking across tests would be more frustrating.
>
I can switch to vm_util, I will need to add a new parsing function in
vm_util, the existing __get_smap_entry() only searches for vm's
starting address, not name.
Thanks
-Jeff
> -Kees
>
> --
> Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-13 22:20 ` Jeff Xu
@ 2025-02-14 2:52 ` Kees Cook
2025-02-14 14:15 ` Jeff Xu
0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2025-02-14 2:52 UTC (permalink / raw)
To: Jeff Xu
Cc: Thomas Weißschuh, akpm, keescook, jannh, torvalds, vbabka,
lorenzo.stoakes, Liam.Howlett, adhemerval.zanella, oleg, avagin,
benjamin, linux-kernel, linux-hardening, linux-mm, jorgelo,
sroettger, hch, ojeda, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On February 13, 2025 2:20:01 PM PST, Jeff Xu <jeffxu@chromium.org> wrote:
>On Thu, Feb 13, 2025 at 11:28 AM Kees Cook <kees@kernel.org> wrote:
>>
>>
>>
>> On February 13, 2025 6:14:00 AM PST, Jeff Xu <jeffxu@chromium.org> wrote:
>> >On Wed, Feb 12, 2025 at 5:04 AM Thomas Weißschuh
>> ><thomas.weissschuh@linutronix.de> wrote:
>> >>
>> >> On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
>> >> > From: Jeff Xu <jeffxu@chromium.org>
>> >> >
>> >> > Add code to detect if the vdso is memory sealed, skip the test
>> >> > if it is.
>> >> >
>> >> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
>> >> > ---
>> >> > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
>> >> > 1 file changed, 38 insertions(+)
>> >> >
>> >> > diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
>> >> > index d53959e03593..c68077c56b22 100644
>> >> > --- a/tools/testing/selftests/x86/test_mremap_vdso.c
>> >> > +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
>> >> > @@ -14,6 +14,7 @@
>> >> > #include <errno.h>
>> >> > #include <unistd.h>
>> >> > #include <string.h>
>> >> > +#include <stdbool.h>
>> >> >
>> >> > #include <sys/mman.h>
>> >> > #include <sys/auxv.h>
>> >> > @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
>> >> >
>> >> > }
>> >> >
>> >> > +#define VDSO_NAME "[vdso]"
>> >> > +#define VMFLAGS "VmFlags:"
>> >> > +#define MSEAL_FLAGS "sl"
>> >> > +#define MAX_LINE_LEN 512
>> >> > +
>> >> > +bool vdso_sealed(FILE *maps)
>> >> > +{
>> >> > + char line[MAX_LINE_LEN];
>> >> > + bool has_vdso = false;
>> >> > +
>> >> > + while (fgets(line, sizeof(line), maps)) {
>> >> > + if (strstr(line, VDSO_NAME))
>> >> > + has_vdso = true;
>> >> > +
>> >> > + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
>> >> > + if (strstr(line, MSEAL_FLAGS))
>> >> > + return true;
>> >> > +
>> >> > + return false;
>> >>
>> >> This only tests that any mapping after the vdso is sealed.
>> >
>> >The code above begins by searching for the "[vdso]" string, then looks
>> >for the first line that starts with "VmFlags:", and looks for the "sl"
>> >substring within that line. We're assuming the format of smaps won't
>> >change from release to release.
>>
>> Oh, I missed this in my reviews: nothing _resets_ has_vdso to false, so if any other mapping follows vdso that happens to be sealed, this will return true...
>>
>It won't return the next mapping's sealing flag.
>After finding the "[vdso]", if the next line that contains VMFLAGS
>doesn't have the "sl" flag, the function returns false immediately.
Oh! Agh, yes. You are right, this is all fine.
>I can switch to vm_util, I will need to add a new parsing function in
>vm_util, the existing __get_smap_entry() only searches for vm's
>starting address, not name.
Unless someone feels strongly about this, my instinct is to avoid the higher complexity of a cross-test thing.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
2025-02-14 2:52 ` Kees Cook
@ 2025-02-14 14:15 ` Jeff Xu
0 siblings, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-14 14:15 UTC (permalink / raw)
To: Kees Cook
Cc: Thomas Weißschuh, akpm, keescook, jannh, torvalds, vbabka,
lorenzo.stoakes, Liam.Howlett, adhemerval.zanella, oleg, avagin,
benjamin, linux-kernel, linux-hardening, linux-mm, jorgelo,
sroettger, hch, ojeda, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 6:52 PM Kees Cook <kees@kernel.org> wrote:
>
>
>
> On February 13, 2025 2:20:01 PM PST, Jeff Xu <jeffxu@chromium.org> wrote:
> >On Thu, Feb 13, 2025 at 11:28 AM Kees Cook <kees@kernel.org> wrote:
> >>
> >>
> >>
> >> On February 13, 2025 6:14:00 AM PST, Jeff Xu <jeffxu@chromium.org> wrote:
> >> >On Wed, Feb 12, 2025 at 5:04 AM Thomas Weißschuh
> >> ><thomas.weissschuh@linutronix.de> wrote:
> >> >>
> >> >> On Wed, Feb 12, 2025 at 03:21:50AM +0000, jeffxu@chromium.org wrote:
> >> >> > From: Jeff Xu <jeffxu@chromium.org>
> >> >> >
> >> >> > Add code to detect if the vdso is memory sealed, skip the test
> >> >> > if it is.
> >> >> >
> >> >> > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> >> >> > ---
> >> >> > .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
> >> >> > 1 file changed, 38 insertions(+)
> >> >> >
> >> >> > diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
> >> >> > index d53959e03593..c68077c56b22 100644
> >> >> > --- a/tools/testing/selftests/x86/test_mremap_vdso.c
> >> >> > +++ b/tools/testing/selftests/x86/test_mremap_vdso.c
> >> >> > @@ -14,6 +14,7 @@
> >> >> > #include <errno.h>
> >> >> > #include <unistd.h>
> >> >> > #include <string.h>
> >> >> > +#include <stdbool.h>
> >> >> >
> >> >> > #include <sys/mman.h>
> >> >> > #include <sys/auxv.h>
> >> >> > @@ -55,13 +56,50 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
> >> >> >
> >> >> > }
> >> >> >
> >> >> > +#define VDSO_NAME "[vdso]"
> >> >> > +#define VMFLAGS "VmFlags:"
> >> >> > +#define MSEAL_FLAGS "sl"
> >> >> > +#define MAX_LINE_LEN 512
> >> >> > +
> >> >> > +bool vdso_sealed(FILE *maps)
> >> >> > +{
> >> >> > + char line[MAX_LINE_LEN];
> >> >> > + bool has_vdso = false;
> >> >> > +
> >> >> > + while (fgets(line, sizeof(line), maps)) {
> >> >> > + if (strstr(line, VDSO_NAME))
> >> >> > + has_vdso = true;
> >> >> > +
> >> >> > + if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
> >> >> > + if (strstr(line, MSEAL_FLAGS))
> >> >> > + return true;
> >> >> > +
> >> >> > + return false;
> >> >>
> >> >> This only tests that any mapping after the vdso is sealed.
> >> >
> >> >The code above begins by searching for the "[vdso]" string, then looks
> >> >for the first line that starts with "VmFlags:", and looks for the "sl"
> >> >substring within that line. We're assuming the format of smaps won't
> >> >change from release to release.
> >>
> >> Oh, I missed this in my reviews: nothing _resets_ has_vdso to false, so if any other mapping follows vdso that happens to be sealed, this will return true...
> >>
> >It won't return the next mapping's sealing flag.
> >After finding the "[vdso]", if the next line that contains VMFLAGS
> >doesn't have the "sl" flag, the function returns false immediately.
>
> Oh! Agh, yes. You are right, this is all fine.
>
> >I can switch to vm_util, I will need to add a new parsing function in
> >vm_util, the existing __get_smap_entry() only searches for vm's
> >starting address, not name.
>
> Unless someone feels strongly about this, my instinct is to avoid the higher complexity of a cross-test thing.
>
OK. I will keep the existing test.
If we decide to use vm_util, it would be best to refactor it
separately later on. The existing vm_util can't be used as is for my
needs, so some refactoring would be necessary.
Thanks
-Jeff
> -Kees
>
>
> --
> Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread
* [RFC PATCH v5 3/7] mseal, system mappings: enable x86-64
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 1/7] mseal, system mappings: kernel config and header change jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 4/7] mseal, system mappings: enable arm64 jeffxu
` (4 subsequent siblings)
7 siblings, 0 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64,
covering the vdso, vvar, vvar_vclock.
Production release testing passes on Android and Chrome OS.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 17 +++++++++++------
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 87198d957e2f..8fa17032ca46 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
depends on 64BIT
# Options that are inherently 64-bit kernel only:
select ARCH_HAS_GIGANTIC_PAGE
+ select ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 39e6efc1a9ca..b5273dadd64a 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -13,6 +13,7 @@
#include <linux/random.h>
#include <linux/elf.h>
#include <linux/cpu.h>
+#include <linux/userprocess.h>
#include <linux/ptrace.h>
#include <linux/time_namespace.h>
@@ -247,6 +248,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
unsigned long text_start;
+ unsigned long vm_flags;
int ret = 0;
if (mmap_write_lock_killable(mm))
@@ -264,11 +266,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
/*
* MAYWRITE to allow gdb to COW and set breakpoints
*/
+ vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm,
text_start,
image->size,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vm_flags,
&vdso_mapping);
if (IS_ERR(vma)) {
@@ -276,11 +279,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
goto up_fail;
}
+ vm_flags = VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|VM_PFNMAP;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm,
addr,
(__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
- VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
- VM_PFNMAP,
+ vm_flags,
&vvar_mapping);
if (IS_ERR(vma)) {
@@ -289,11 +293,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
goto up_fail;
}
+ vm_flags = VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|VM_PFNMAP;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm,
addr + (__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
VDSO_NR_VCLOCK_PAGES * PAGE_SIZE,
- VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
- VM_PFNMAP,
+ vm_flags,
&vvar_vclock_mapping);
if (IS_ERR(vma)) {
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [RFC PATCH v5 4/7] mseal, system mappings: enable arm64
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
` (2 preceding siblings ...)
2025-02-12 3:21 ` [RFC PATCH v5 3/7] mseal, system mappings: enable x86-64 jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 5/7] mseal, system mappings: enable uml architecture jeffxu
` (3 subsequent siblings)
7 siblings, 0 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on arm64, covering
the vdso, vvar, and compat-mode vectors and sigpage mappings.
Production release testing passes on Android and Chrome OS.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/vdso.c | 23 ++++++++++++++++-------
2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fcdd0ed3eca8..39202aa9a5af 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -38,6 +38,7 @@ config ARM64
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEM_ENCRYPT
+ select ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e8ed8e5b713b..cfe2f5b344c4 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -15,6 +15,7 @@
#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/userprocess.h>
#include <linux/sched.h>
#include <linux/signal.h>
#include <linux/slab.h>
@@ -183,6 +184,7 @@ static int __setup_additional_pages(enum vdso_abi abi,
{
unsigned long vdso_base, vdso_text_len, vdso_mapping_len;
unsigned long gp_flags = 0;
+ unsigned long vm_flags;
void *ret;
BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
@@ -197,8 +199,10 @@ static int __setup_additional_pages(enum vdso_abi abi,
goto up_fail;
}
+ vm_flags = VM_READ|VM_MAYREAD|VM_PFNMAP;
+ vm_flags |= mseal_system_mappings();
ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
- VM_READ|VM_MAYREAD|VM_PFNMAP,
+ vm_flags,
&vvar_map);
if (IS_ERR(ret))
goto up_fail;
@@ -208,9 +212,10 @@ static int __setup_additional_pages(enum vdso_abi abi,
vdso_base += VVAR_NR_PAGES * PAGE_SIZE;
mm->context.vdso = (void *)vdso_base;
+ vm_flags = VM_READ|VM_EXEC|gp_flags|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
- VM_READ|VM_EXEC|gp_flags|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vm_flags,
vdso_info[abi].cm);
if (IS_ERR(ret))
goto up_fail;
@@ -326,6 +331,7 @@ arch_initcall(aarch32_alloc_vdso_pages);
static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
{
void *ret;
+ unsigned long vm_flags;
if (!IS_ENABLED(CONFIG_KUSER_HELPERS))
return 0;
@@ -334,9 +340,10 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
* Avoid VM_MAYWRITE for compatibility with arch/arm/, where it's
* not safe to CoW the page containing the CPU exception vectors.
*/
+ vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
ret = _install_special_mapping(mm, AARCH32_VECTORS_BASE, PAGE_SIZE,
- VM_READ | VM_EXEC |
- VM_MAYREAD | VM_MAYEXEC,
+ vm_flags,
&aarch32_vdso_maps[AA32_MAP_VECTORS]);
return PTR_ERR_OR_ZERO(ret);
@@ -345,6 +352,7 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
static int aarch32_sigreturn_setup(struct mm_struct *mm)
{
unsigned long addr;
+ unsigned long vm_flags;
void *ret;
addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
@@ -357,9 +365,10 @@ static int aarch32_sigreturn_setup(struct mm_struct *mm)
* VM_MAYWRITE is required to allow gdb to Copy-on-Write and
* set breakpoints.
*/
+ vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
ret = _install_special_mapping(mm, addr, PAGE_SIZE,
- VM_READ | VM_EXEC | VM_MAYREAD |
- VM_MAYWRITE | VM_MAYEXEC,
+ vm_flags,
&aarch32_vdso_maps[AA32_MAP_SIGPAGE]);
if (IS_ERR(ret))
goto out;
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [RFC PATCH v5 5/7] mseal, system mappings: enable uml architecture
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
` (3 preceding siblings ...)
2025-02-12 3:21 ` [RFC PATCH v5 4/7] mseal, system mappings: enable arm64 jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 6/7] mseal, system mappings: uprobe mapping jeffxu
` (2 subsequent siblings)
7 siblings, 0 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu,
Benjamin Berg
From: Jeff Xu <jeffxu@chromium.org>
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on UML, covering
the vdso.
Testing passes on UML.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Tested-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/Kconfig | 1 +
arch/x86/um/vdso/vma.c | 7 +++++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 18051b1cfce0..eb2d439a5334 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -10,6 +10,7 @@ config UML
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOV
+ select ARCH_HAS_MSEAL_SYSTEM_MAPPINGS
select ARCH_HAS_STRNCPY_FROM_USER
select ARCH_HAS_STRNLEN_USER
select HAVE_ARCH_AUDITSYSCALL
diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index f238f7b33cdd..a68919db0ff7 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -6,6 +6,7 @@
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/mm.h>
+#include <linux/userprocess.h>
#include <asm/page.h>
#include <asm/elf.h>
#include <linux/init.h>
@@ -54,6 +55,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
+ unsigned long vm_flags;
static struct vm_special_mapping vdso_mapping = {
.name = "[vdso]",
};
@@ -65,9 +67,10 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return -EINTR;
vdso_mapping.pages = vdsop;
+ vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm, um_vdso_addr, PAGE_SIZE,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vm_flags,
&vdso_mapping);
mmap_write_unlock(mm);
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [RFC PATCH v5 6/7] mseal, system mappings: uprobe mapping
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
` (4 preceding siblings ...)
2025-02-12 3:21 ` [RFC PATCH v5 5/7] mseal, system mappings: enable uml architecture jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 3:21 ` [RFC PATCH v5 7/7] mseal, system mappings: update mseal.rst jeffxu
2025-02-12 11:24 ` [RFC PATCH v5 0/7] mseal system mappings Lorenzo Stoakes
7 siblings, 0 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Provide support to mseal the uprobe mapping.
Unlike other system mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime. It could be sealed from creation.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
kernel/events/uprobes.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2ca797cbe465..55e0fa21eee6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -22,6 +22,7 @@
#include <linux/ptrace.h> /* user_enable_single_step */
#include <linux/kdebug.h> /* notifier mechanism */
#include <linux/percpu-rwsem.h>
+#include <linux/userprocess.h>
#include <linux/task_work.h>
#include <linux/shmem_fs.h>
#include <linux/khugepaged.h>
@@ -1662,6 +1663,7 @@ static const struct vm_special_mapping xol_mapping = {
static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
{
struct vm_area_struct *vma;
+ unsigned long vm_flags;
int ret;
if (mmap_write_lock_killable(mm))
@@ -1682,8 +1684,10 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
}
}
+ vm_flags = VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO;
+ vm_flags |= mseal_system_mappings();
vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
- VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
+ vm_flags,
&xol_mapping);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [RFC PATCH v5 7/7] mseal, system mappings: update mseal.rst
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
` (5 preceding siblings ...)
2025-02-12 3:21 ` [RFC PATCH v5 6/7] mseal, system mappings: uprobe mapping jeffxu
@ 2025-02-12 3:21 ` jeffxu
2025-02-12 11:24 ` [RFC PATCH v5 0/7] mseal system mappings Lorenzo Stoakes
7 siblings, 0 replies; 43+ messages in thread
From: jeffxu @ 2025-02-12 3:21 UTC (permalink / raw)
To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
Cc: linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, pedro.falcato,
hca, willy, anna-maria, mark.rutland, linus.walleij, Jason,
deller, rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen,
mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes,
groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport, Jeff Xu
From: Jeff Xu <jeffxu@chromium.org>
Update memory sealing documentation to include details about system
mappings.
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
---
Documentation/userspace-api/mseal.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
index 41102f74c5e2..1e4c996dfb75 100644
--- a/Documentation/userspace-api/mseal.rst
+++ b/Documentation/userspace-api/mseal.rst
@@ -130,6 +130,11 @@ Use cases
- Chrome browser: protect some security sensitive data structures.
+- System mappings:
+ If supported by an architecture (via CONFIG_ARCH_HAS_MSEAL_SYSTEM_MAPPINGS),
+ the CONFIG_MSEAL_SYSTEM_MAPPINGS seals system mappings, e.g. vdso, vvar,
+ uprobes, sigpage, vectors, etc.
+
When not to use mseal
=====================
Applications can apply sealing to any virtual memory region from userspace,
--
2.48.1.502.g6dc24dfdaf-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 3:21 [RFC PATCH v5 0/7] mseal system mappings jeffxu
` (6 preceding siblings ...)
2025-02-12 3:21 ` [RFC PATCH v5 7/7] mseal, system mappings: update mseal.rst jeffxu
@ 2025-02-12 11:24 ` Lorenzo Stoakes
2025-02-12 12:37 ` Pedro Falcato
` (2 more replies)
7 siblings, 3 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-02-12 11:24 UTC (permalink / raw)
To: jeffxu
Cc: akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
>
> The commit message in the first patch contains the full description of
> this series.
Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
this obviously isn't urgent, just be nice when we un-RFC.
Thanks for sending as RFC, appreciated, keen to figure out a way forward
with this series and this gives us space to discuss.
One thing that came up recently with the LWN article (...!) was that rr is
also impacted by this [0].
I think with this behind a config flag we're fine (this refers to my
'opt-in' comment in the reply on LWN) as my concerns about this being
enabled in a broken way without an explicit kernel configuration are
addressed, and actually we do expose a means by which a user can detect if
the VDSO for instance is sealed via /proc/$pid/[s]maps.
So tools like rr and such can be updated to check for this. I wonder if we
ought to try to liaise with the known problematic ones?
It'd be nice to update the documentation to have a list of 'known
problematic userland software with sealed VDSO' so we make people aware.
Hopefully we are acheiving the opt-in nature of the thing here, but it
makes me wonder whether we need a prctl() interface to optionally disable
even if the system has it enabled as a whole.
That way, rr for instance can just turn it off for debugging purposes. To
be clear I am not trying to add additional extranous tasks here - my one
and only motive is to ensure that the feature works and we address concerns
about any possible breakage.
And I _want the series to land_ :>) I suspect we're close now.
I am tied up with a number of other tasks at the moment so apologies if I
take a second to get back to this series, but just wanted to say thanks for
addressing my various points, and that I will definitely provide review
(it's on the whiteboard, the only true guarantee I will do something :P).
I will also come back to your testing series which I owe you review on,
which is equally on the same whiteboard...
Thanks, Lorenzo
[0]:https://lwn.net/Articles/1007984/
>
> ------------------
> History:
>
> V5
> - Remove kernel cmd line (Lorenzo Stoakes)
> - Add test info (Lorenzo Stoakes)
> - Add threat model info (Lorenzo Stoakes)
> - Fix x86 selftest: test_mremap_vdso
> - Restrict code change to ARM64/x86-64/UM arch only.
> - Add userprocess.h to include seal_system_mapping().
> - Remove sealing vsyscall.
> - Split the patch.
>
> V4:
> https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
>
> V3:
> https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
>
> V2:
> https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
>
> V1:
> https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
>
> Jeff Xu (7):
> mseal, system mappings: kernel config and header change
> selftests: x86: test_mremap_vdso: skip if vdso is msealed
> mseal, system mappings: enable x86-64
> mseal, system mappings: enable arm64
> mseal, system mappings: enable uml architecture
> mseal, system mappings: uprobe mapping
> mseal, system mappings: update mseal.rst
>
> Documentation/userspace-api/mseal.rst | 5 +++
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/vdso.c | 23 +++++++----
> arch/um/Kconfig | 1 +
> arch/x86/Kconfig | 1 +
> arch/x86/entry/vdso/vma.c | 17 ++++++---
> arch/x86/um/vdso/vma.c | 7 +++-
> include/linux/userprocess.h | 18 +++++++++
> init/Kconfig | 18 +++++++++
> kernel/events/uprobes.c | 6 ++-
> security/Kconfig | 18 +++++++++
> .../testing/selftests/x86/test_mremap_vdso.c | 38 +++++++++++++++++++
> 12 files changed, 137 insertions(+), 16 deletions(-)
> create mode 100644 include/linux/userprocess.h
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 11:24 ` [RFC PATCH v5 0/7] mseal system mappings Lorenzo Stoakes
@ 2025-02-12 12:37 ` Pedro Falcato
2025-02-12 14:01 ` Lorenzo Stoakes
2025-02-12 22:05 ` Kees Cook
2025-02-13 14:19 ` Jeff Xu
2 siblings, 1 reply; 43+ messages in thread
From: Pedro Falcato @ 2025-02-12 12:37 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: jeffxu, akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, hca, willy, anna-maria,
mark.rutland, linus.walleij, Jason, deller, rdunlap, davem,
peterx, f.fainelli, gerg, dave.hansen, mingo, ardb, mhocko,
42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > The commit message in the first patch contains the full description of
> > this series.
>
> Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> this obviously isn't urgent, just be nice when we un-RFC.
>
> Thanks for sending as RFC, appreciated, keen to figure out a way forward
> with this series and this gives us space to discuss.
>
> One thing that came up recently with the LWN article (...!) was that rr is
> also impacted by this [0].
>
> I think with this behind a config flag we're fine (this refers to my
> 'opt-in' comment in the reply on LWN) as my concerns about this being
> enabled in a broken way without an explicit kernel configuration are
> addressed, and actually we do expose a means by which a user can detect if
> the VDSO for instance is sealed via /proc/$pid/[s]maps.
>
> So tools like rr and such can be updated to check for this. I wonder if we
> ought to try to liaise with the known problematic ones?
>
> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.
>
> Hopefully we are acheiving the opt-in nature of the thing here, but it
> makes me wonder whether we need a prctl() interface to optionally disable
> even if the system has it enabled as a whole.
Just noting that (as we discussed off-list) doing prctl() would not
work, because that would effectively be an munseal for those vdso
regions.
Possibly something like a personality() flag (that's *not* inherited
when AT_SECURE/secureexec). But personalities have other issues...
FWIW, although it would (at the moment) be hard to pull off in the
libc, I still much prefer it to playing these weird games with CONFIG
options and kernel command line options and prctl and personality and
whatnot. It seems to me like we're trying to stick policy where it
doesn't belong.
--
Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 12:37 ` Pedro Falcato
@ 2025-02-12 14:01 ` Lorenzo Stoakes
2025-02-12 14:08 ` Johannes Berg
2025-02-13 19:59 ` Pedro Falcato
0 siblings, 2 replies; 43+ messages in thread
From: Lorenzo Stoakes @ 2025-02-12 14:01 UTC (permalink / raw)
To: Pedro Falcato
Cc: jeffxu, akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, hca, willy, anna-maria,
mark.rutland, linus.walleij, Jason, deller, rdunlap, davem,
peterx, f.fainelli, gerg, dave.hansen, mingo, ardb, mhocko,
42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
(sorry I really am struggling to reply to mail as lore still seems to be
broken).
On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > The commit message in the first patch contains the full description of
> > > this series.
> >
> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > this obviously isn't urgent, just be nice when we un-RFC.
> >
> > Thanks for sending as RFC, appreciated, keen to figure out a way forward
> > with this series and this gives us space to discuss.
> >
> > One thing that came up recently with the LWN article (...!) was that rr is
> > also impacted by this [0].
> >
> > I think with this behind a config flag we're fine (this refers to my
> > 'opt-in' comment in the reply on LWN) as my concerns about this being
> > enabled in a broken way without an explicit kernel configuration are
> > addressed, and actually we do expose a means by which a user can detect if
> > the VDSO for instance is sealed via /proc/$pid/[s]maps.
> >
> > So tools like rr and such can be updated to check for this. I wonder if we
> > ought to try to liaise with the known problematic ones?
> >
> > It'd be nice to update the documentation to have a list of 'known
> > problematic userland software with sealed VDSO' so we make people aware.
> >
> > Hopefully we are acheiving the opt-in nature of the thing here, but it
> > makes me wonder whether we need a prctl() interface to optionally disable
> > even if the system has it enabled as a whole.
>
> Just noting that (as we discussed off-list) doing prctl() would not
> work, because that would effectively be an munseal for those vdso
> regions.
> Possibly something like a personality() flag (that's *not* inherited
> when AT_SECURE/secureexec). But personalities have other issues...
Thanks, yeah that's a good point, it would have to be implemented as a
personality or something similar otherwise you're essentially relying on
'unsealing' which can't be permitted.
I'm not sure how useful that'd be for the likes of rr though. But I suppose
if it makes everything exec'd by a child inherit it then maybe that works
for a debugging session etc.?
>
> FWIW, although it would (at the moment) be hard to pull off in the
> libc, I still much prefer it to playing these weird games with CONFIG
> options and kernel command line options and prctl and personality and
> whatnot. It seems to me like we're trying to stick policy where it
> doesn't belong.
The problem is, as a security feature, you don't want to make it trivially
easy to disable.
I mean we _need_ a config option to be able to strictly enforce only making
the feature enable-able on architectures and configuration option
combinations that work.
But if there is userspace that will be broken, we really have to have some
way of avoiding the disconnect between somebody making policy decision at
the kernel level and somebody trying to run something.
Because I can easily envision somebody enabling this as a 'good security
feature' for a distro release or such, only for somebody else to later try
rr, CRIU, or whatever else and for it to just not work or fail subtly and
to have no idea why.
I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
it via a tunable, so then it can become sysctl.d policy for instance.
The CONFIG_ flag dependency is critical because we don't want to enable
this on arches that have not been tested against it.
It's vital at any rate that we document everywhere we can that _this might
break some userland that depends on remapping the VDSO_.
>
> --
> Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 14:01 ` Lorenzo Stoakes
@ 2025-02-12 14:08 ` Johannes Berg
2025-02-13 19:59 ` Pedro Falcato
1 sibling, 0 replies; 43+ messages in thread
From: Johannes Berg @ 2025-02-12 14:08 UTC (permalink / raw)
To: Lorenzo Stoakes, Pedro Falcato
Cc: jeffxu, akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, hca, willy, anna-maria,
mark.rutland, linus.walleij, Jason, deller, rdunlap, davem,
peterx, f.fainelli, gerg, dave.hansen, mingo, ardb, mhocko,
42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Wed, 2025-02-12 at 14:01 +0000, Lorenzo Stoakes wrote:
> Thanks, yeah that's a good point, it would have to be implemented as a
> personality or something similar otherwise you're essentially relying on
> 'unsealing' which can't be permitted.
>
> I'm not sure how useful that'd be for the likes of rr though. But I suppose
> if it makes everything exec'd by a child inherit it then maybe that works
> for a debugging session etc.?
For whatever that's worth, ARCH=um should not need 'unsealing' or 'not
sealing' it for *itself*, but rather only for the *children* it starts,
which are for the userspace processes inside of it. Which I suppose
could actually start without a VDSO in the first place, but I don't
think that's possible now?
Which I'll note should not have access to the host, so in a way this
outer security feature (sealing) breaks the inner ARCH=um security, I
think.
johannes
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 14:01 ` Lorenzo Stoakes
2025-02-12 14:08 ` Johannes Berg
@ 2025-02-13 19:59 ` Pedro Falcato
2025-02-13 20:47 ` Kees Cook
1 sibling, 1 reply; 43+ messages in thread
From: Pedro Falcato @ 2025-02-13 19:59 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: jeffxu, akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, hca, willy, anna-maria,
mark.rutland, linus.walleij, Jason, deller, rdunlap, davem,
peterx, f.fainelli, gerg, dave.hansen, mingo, ardb, mhocko,
42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> (sorry I really am struggling to reply to mail as lore still seems to be
> broken).
>
> On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > From: Jeff Xu <jeffxu@chromium.org>
> > > >
> > > > The commit message in the first patch contains the full description of
> > > > this series.
> > >
> > > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > > this obviously isn't urgent, just be nice when we un-RFC.
> > >
> > > Thanks for sending as RFC, appreciated, keen to figure out a way forward
> > > with this series and this gives us space to discuss.
> > >
> > > One thing that came up recently with the LWN article (...!) was that rr is
> > > also impacted by this [0].
> > >
> > > I think with this behind a config flag we're fine (this refers to my
> > > 'opt-in' comment in the reply on LWN) as my concerns about this being
> > > enabled in a broken way without an explicit kernel configuration are
> > > addressed, and actually we do expose a means by which a user can detect if
> > > the VDSO for instance is sealed via /proc/$pid/[s]maps.
> > >
> > > So tools like rr and such can be updated to check for this. I wonder if we
> > > ought to try to liaise with the known problematic ones?
> > >
> > > It'd be nice to update the documentation to have a list of 'known
> > > problematic userland software with sealed VDSO' so we make people aware.
> > >
> > > Hopefully we are acheiving the opt-in nature of the thing here, but it
> > > makes me wonder whether we need a prctl() interface to optionally disable
> > > even if the system has it enabled as a whole.
> >
> > Just noting that (as we discussed off-list) doing prctl() would not
> > work, because that would effectively be an munseal for those vdso
> > regions.
> > Possibly something like a personality() flag (that's *not* inherited
> > when AT_SECURE/secureexec). But personalities have other issues...
>
> Thanks, yeah that's a good point, it would have to be implemented as a
> personality or something similar otherwise you're essentially relying on
> 'unsealing' which can't be permitted.
>
> I'm not sure how useful that'd be for the likes of rr though. But I suppose
> if it makes everything exec'd by a child inherit it then maybe that works
> for a debugging session etc.?
>
> >
> > FWIW, although it would (at the moment) be hard to pull off in the
> > libc, I still much prefer it to playing these weird games with CONFIG
> > options and kernel command line options and prctl and personality and
> > whatnot. It seems to me like we're trying to stick policy where it
> > doesn't belong.
>
> The problem is, as a security feature, you don't want to make it trivially
> easy to disable.
>
> I mean we _need_ a config option to be able to strictly enforce only making
> the feature enable-able on architectures and configuration option
> combinations that work.
>
> But if there is userspace that will be broken, we really have to have some
> way of avoiding the disconnect between somebody making policy decision at
> the kernel level and somebody trying to run something.
>
> Because I can easily envision somebody enabling this as a 'good security
> feature' for a distro release or such, only for somebody else to later try
> rr, CRIU, or whatever else and for it to just not work or fail subtly and
> to have no idea why.
Ok so I went looking around for the glibc patchset. It seems they're
moving away from tunables and there was a nice
GNU_PROPERTY_MEMORY_SEAL added to binutils.
So my proposal is to parse this property on the binfmt_elf.c side, and
mm would use this to know if we should seal these mappings. This seems
to tackle compatibility problems,
and glibc isn't sealing programs without this program header anyway. Thoughts?
>
> I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
> it via a tunable, so then it can become sysctl.d policy for instance.
sysctl is also an option but the idea of dropping a random feature
behind a CONFIG_ that's unusable by lots of people (including the
general GNU/Linux ecosystem) is really really unappealing to me.
--
Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-13 19:59 ` Pedro Falcato
@ 2025-02-13 20:47 ` Kees Cook
2025-02-18 23:18 ` Pedro Falcato
0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2025-02-13 20:47 UTC (permalink / raw)
To: Pedro Falcato
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > (sorry I really am struggling to reply to mail as lore still seems to be
> > broken).
> >
> > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > >
> > > > > The commit message in the first patch contains the full description of
> > > > > this series.
> > > >
> > [...]
> > >
> > > FWIW, although it would (at the moment) be hard to pull off in the
> > > libc, I still much prefer it to playing these weird games with CONFIG
> > > options and kernel command line options and prctl and personality and
> > > whatnot. It seems to me like we're trying to stick policy where it
> > > doesn't belong.
> >
> > The problem is, as a security feature, you don't want to make it trivially
> > easy to disable.
> >
> > I mean we _need_ a config option to be able to strictly enforce only making
> > the feature enable-able on architectures and configuration option
> > combinations that work.
> >
> > But if there is userspace that will be broken, we really have to have some
> > way of avoiding the disconnect between somebody making policy decision at
> > the kernel level and somebody trying to run something.
> >
> > Because I can easily envision somebody enabling this as a 'good security
> > feature' for a distro release or such, only for somebody else to later try
> > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > to have no idea why.
>
> Ok so I went looking around for the glibc patchset. It seems they're
> moving away from tunables and there was a nice
> GNU_PROPERTY_MEMORY_SEAL added to binutils.
> So my proposal is to parse this property on the binfmt_elf.c side, and
> mm would use this to know if we should seal these mappings. This seems
> to tackle compatibility problems,
> and glibc isn't sealing programs without this program header anyway. Thoughts?
It seems to me that doing this ties it to the binary, rather than
execution context, which may want to seal/not-seal, etc. I have a sense
that it's be better as a secure bit, or prctl, or something like that. The
properties seem to be better suited for "this binary _can_ do a thing"
or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
maybe there's more to this I'm not considering?
> > I mean one option is to have it as a CONFIG_ flag _and_ you have to enable
> > it via a tunable, so then it can become sysctl.d policy for instance.
>
> sysctl is also an option but the idea of dropping a random feature
> behind a CONFIG_ that's unusable by lots of people (including the
> general GNU/Linux ecosystem) is really really unappealing to me.
I agree 100%, but I think we need to make small steps. Behind a CONFIG
means we get it implemented, and then we can look at how to make it more
flexible. I'm motivated to figure this out because I've long wanted to
have a boot param to disable CRIU since I have distro systems that I
don't use CRIU on, and I don't want the (very small) interface changes
it makes available into seccomp filter visibility. And if CRIU could be
run-time based, so could system mapping sealing. :)
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-13 20:47 ` Kees Cook
@ 2025-02-18 23:18 ` Pedro Falcato
2025-02-19 13:46 ` Adhemerval Zanella Netto
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Pedro Falcato @ 2025-02-18 23:18 UTC (permalink / raw)
To: Kees Cook
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
>
> On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > (sorry I really am struggling to reply to mail as lore still seems to be
> > > broken).
> > >
> > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > >
> > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > > >
> > > > > > The commit message in the first patch contains the full description of
> > > > > > this series.
> > > > >
> > > [...]
> > > >
> > > > FWIW, although it would (at the moment) be hard to pull off in the
> > > > libc, I still much prefer it to playing these weird games with CONFIG
> > > > options and kernel command line options and prctl and personality and
> > > > whatnot. It seems to me like we're trying to stick policy where it
> > > > doesn't belong.
> > >
> > > The problem is, as a security feature, you don't want to make it trivially
> > > easy to disable.
> > >
> > > I mean we _need_ a config option to be able to strictly enforce only making
> > > the feature enable-able on architectures and configuration option
> > > combinations that work.
> > >
> > > But if there is userspace that will be broken, we really have to have some
> > > way of avoiding the disconnect between somebody making policy decision at
> > > the kernel level and somebody trying to run something.
> > >
> > > Because I can easily envision somebody enabling this as a 'good security
> > > feature' for a distro release or such, only for somebody else to later try
> > > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > > to have no idea why.
> >
> > Ok so I went looking around for the glibc patchset. It seems they're
> > moving away from tunables and there was a nice
> > GNU_PROPERTY_MEMORY_SEAL added to binutils.
> > So my proposal is to parse this property on the binfmt_elf.c side, and
> > mm would use this to know if we should seal these mappings. This seems
> > to tackle compatibility problems,
> > and glibc isn't sealing programs without this program header anyway. Thoughts?
>
> It seems to me that doing this ties it to the binary, rather than
> execution context, which may want to seal/not-seal, etc. I have a sense
> that it's be better as a secure bit, or prctl, or something like that. The
> properties seem to be better suited for "this binary _can_ do a thing"
> or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
> maybe there's more to this I'm not considering?
Doesn't this exactly kind of Just Work though? "This binary can
do/tolerate sealing". I would blindly guess that we don't have very
opinionated shared libraries that do this kind of shenanigans
unilaterally, so that's probably not something we really need to worry
about (though I admittedly need to read through the glibc patchset,
and nail down what they're thinking about doing with linking
mseal-ready and mseal-non-ready ELF execs/shared objects together).
The problem with something like prctl is that we either indirectly
provide some kind of limited form of munseal, or we require some sort
of handover (like personality(2) + execve(2)), which both sound like a
huge PITA and still don't solve any of our backwards compat issues...
all binaries would need to be patched with this
prctl/personality()/whatever call, and old ones wouldn't work.
The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
the toolchain folks could shed some light?), but it sounds like it'd
fit perfectly.
I suspect we probably want to parse this on the kernel's side anyway
(to seal the main program/interp's segments)[1], then extending them
to the kernel system mappings should be somewhat trivial...
I don't think we'll ever get a program that can't cope with sealing
the system mappings but can cope with sealing itself (and if we do, we
just won't seal the entire thing and that's _okay_).
Deploying mseal-ready programs could then be done in a phased way by
distros. e.g chromeOS and android could simply enable the
corresponding linker option in LDFLAGS and let it rip. Other more
mainstream distros could obviously take a little longer or test/deploy
this on all programs not named gVisor and/or after CRIU is okay with
all of this. We then might not need a user-configurable CONFIG_ (only
an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
everyone is happy.
I glanced through libc-alpha again and it seems like the glibc folks
also seem to have reached the same idea, but I'd love to hear from
Adhemerval.
Am I missing anything?
[1] we should probably nail this responsibility handover down before
glibc msealing (or bionic) makes it to a release. It'd probably be a
little nicer if we could mseal these segments from the kernel instead
of forcing the libc to take care of this, now that we have this
property.
--
Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-18 23:18 ` Pedro Falcato
@ 2025-02-19 13:46 ` Adhemerval Zanella Netto
2025-02-19 17:17 ` enh
2025-02-23 2:05 ` Jeff Xu
2 siblings, 0 replies; 43+ messages in thread
From: Adhemerval Zanella Netto @ 2025-02-19 13:46 UTC (permalink / raw)
To: Pedro Falcato, Kees Cook
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
Liam.Howlett, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, hca, willy, anna-maria,
mark.rutland, linus.walleij, Jason, deller, rdunlap, davem,
peterx, f.fainelli, gerg, dave.hansen, mingo, ardb, mhocko,
42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On 18/02/25 20:18, Pedro Falcato wrote:
> On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
>>
>> On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
>>> On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
>>> <lorenzo.stoakes@oracle.com> wrote:
>>>>
>>>> (sorry I really am struggling to reply to mail as lore still seems to be
>>>> broken).
>>>>
>>>> On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
>>>>> On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
>>>>> <lorenzo.stoakes@oracle.com> wrote:
>>>>>>
>>>>>> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
>>>>>>> From: Jeff Xu <jeffxu@chromium.org>
>>>>>>>
>>>>>>> The commit message in the first patch contains the full description of
>>>>>>> this series.
>>>>>>
>>>> [...]
>>>>>
>>>>> FWIW, although it would (at the moment) be hard to pull off in the
>>>>> libc, I still much prefer it to playing these weird games with CONFIG
>>>>> options and kernel command line options and prctl and personality and
>>>>> whatnot. It seems to me like we're trying to stick policy where it
>>>>> doesn't belong.
>>>>
>>>> The problem is, as a security feature, you don't want to make it trivially
>>>> easy to disable.
>>>>
>>>> I mean we _need_ a config option to be able to strictly enforce only making
>>>> the feature enable-able on architectures and configuration option
>>>> combinations that work.
>>>>
>>>> But if there is userspace that will be broken, we really have to have some
>>>> way of avoiding the disconnect between somebody making policy decision at
>>>> the kernel level and somebody trying to run something.
>>>>
>>>> Because I can easily envision somebody enabling this as a 'good security
>>>> feature' for a distro release or such, only for somebody else to later try
>>>> rr, CRIU, or whatever else and for it to just not work or fail subtly and
>>>> to have no idea why.
>>>
>>> Ok so I went looking around for the glibc patchset. It seems they're
>>> moving away from tunables and there was a nice
>>> GNU_PROPERTY_MEMORY_SEAL added to binutils.
>>> So my proposal is to parse this property on the binfmt_elf.c side, and
>>> mm would use this to know if we should seal these mappings. This seems
>>> to tackle compatibility problems,
>>> and glibc isn't sealing programs without this program header anyway. Thoughts?
>>
>> It seems to me that doing this ties it to the binary, rather than
>> execution context, which may want to seal/not-seal, etc. I have a sense
>> that it's be better as a secure bit, or prctl, or something like that. The
>> properties seem to be better suited for "this binary _can_ do a thing"
>> or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
>> maybe there's more to this I'm not considering?
>
> Doesn't this exactly kind of Just Work though? "This binary can
> do/tolerate sealing". I would blindly guess that we don't have very
> opinionated shared libraries that do this kind of shenanigans
> unilaterally, so that's probably not something we really need to worry
> about (though I admittedly need to read through the glibc patchset,
> and nail down what they're thinking about doing with linking
> mseal-ready and mseal-non-ready ELF execs/shared objects together).
> The problem with something like prctl is that we either indirectly
> provide some kind of limited form of munseal, or we require some sort
> of handover (like personality(2) + execve(2)), which both sound like a
> huge PITA and still don't solve any of our backwards compat issues...
> all binaries would need to be patched with this
> prctl/personality()/whatever call, and old ones wouldn't work.
>
> The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
> the toolchain folks could shed some light?), but it sounds like it'd
> fit perfectly.
> I suspect we probably want to parse this on the kernel's side anyway
> (to seal the main program/interp's segments)[1], then extending them
> to the kernel system mappings should be somewhat trivial...
> I don't think we'll ever get a program that can't cope with sealing
> the system mappings but can cope with sealing itself (and if we do, we
> just won't seal the entire thing and that's _okay_).
>
> Deploying mseal-ready programs could then be done in a phased way by
> distros. e.g chromeOS and android could simply enable the
> corresponding linker option in LDFLAGS and let it rip. Other more
> mainstream distros could obviously take a little longer or test/deploy
> this on all programs not named gVisor and/or after CRIU is okay with
> all of this. We then might not need a user-configurable CONFIG_ (only
> an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
> everyone is happy.
>
> I glanced through libc-alpha again and it seems like the glibc folks
> also seem to have reached the same idea, but I'd love to hear from
> Adhemerval.
>
> Am I missing anything?
Hi Pedro,
After discussing with CRIU developers, I plan to change how glibc will
handle GNU_PROPERTY_MEMORY_SEAL to allow a more smooth enablement in
distros (the latest version is at [1]).
The idea is only enable memory sealing iff the main binary (either
ET_EXEC or PIE) has the new sealing attribute. If it were the case,
the loader will still check if the dependency has the attribute and
seal accordingly.
So if for any reason the binary does not support memory sealing at all,
like CRIU for the snapshot restore phase, it just need to be built
with -Wl,-z,nomemory-seal. It is also for the case for libraries,
although I think this will be rare.
I will also work on adding a way to enable partially non-sealing
sections, since Florian has hinted that he wants to use on some libgcc
metadata (similar to how RELRO '.data.rel.ro' works). My initial plan
is just mimic what OpenBSD does with PT_OPENBSD_MUTABLE, which only
works for ET_DYN (and I think it should not be a problem). But it is
a userland problem, so no kernel support would be required.
>
>
> [1] we should probably nail this responsibility handover down before
> glibc msealing (or bionic) makes it to a release. It'd probably be a
> little nicer if we could mseal these segments from the kernel instead
> of forcing the libc to take care of this, now that we have this
> property.
>
Keep in mind that GNU_PROPERTY_MEMORY_SEAL is currently a glibc extension
and I am not sure if other runtime would be willing to adopt as well.
So its presence can not be implied that sealing will be applied by runtime.
[1] https://patchwork.sourceware.org/project/glibc/list/?series=43524
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-18 23:18 ` Pedro Falcato
2025-02-19 13:46 ` Adhemerval Zanella Netto
@ 2025-02-19 17:17 ` enh
2025-02-23 2:05 ` Jeff Xu
2 siblings, 0 replies; 43+ messages in thread
From: enh @ 2025-02-19 17:17 UTC (permalink / raw)
To: Pedro Falcato
Cc: Kees Cook, Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds,
vbabka, Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Tue, Feb 18, 2025 at 6:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 8:47 PM Kees Cook <kees@kernel.org> wrote:
> >
> > On Thu, Feb 13, 2025 at 07:59:48PM +0000, Pedro Falcato wrote:
> > > On Wed, Feb 12, 2025 at 2:02 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > (sorry I really am struggling to reply to mail as lore still seems to be
> > > > broken).
> > > >
> > > > On Wed, Feb 12, 2025 at 12:37:50PM +0000, Pedro Falcato wrote:
> > > > > On Wed, Feb 12, 2025 at 11:25 AM Lorenzo Stoakes
> > > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > > > > > From: Jeff Xu <jeffxu@chromium.org>
> > > > > > >
> > > > > > > The commit message in the first patch contains the full description of
> > > > > > > this series.
> > > > > >
> > > > [...]
> > > > >
> > > > > FWIW, although it would (at the moment) be hard to pull off in the
> > > > > libc, I still much prefer it to playing these weird games with CONFIG
> > > > > options and kernel command line options and prctl and personality and
> > > > > whatnot. It seems to me like we're trying to stick policy where it
> > > > > doesn't belong.
> > > >
> > > > The problem is, as a security feature, you don't want to make it trivially
> > > > easy to disable.
> > > >
> > > > I mean we _need_ a config option to be able to strictly enforce only making
> > > > the feature enable-able on architectures and configuration option
> > > > combinations that work.
> > > >
> > > > But if there is userspace that will be broken, we really have to have some
> > > > way of avoiding the disconnect between somebody making policy decision at
> > > > the kernel level and somebody trying to run something.
> > > >
> > > > Because I can easily envision somebody enabling this as a 'good security
> > > > feature' for a distro release or such, only for somebody else to later try
> > > > rr, CRIU, or whatever else and for it to just not work or fail subtly and
> > > > to have no idea why.
> > >
> > > Ok so I went looking around for the glibc patchset. It seems they're
> > > moving away from tunables and there was a nice
> > > GNU_PROPERTY_MEMORY_SEAL added to binutils.
> > > So my proposal is to parse this property on the binfmt_elf.c side, and
> > > mm would use this to know if we should seal these mappings. This seems
> > > to tackle compatibility problems,
> > > and glibc isn't sealing programs without this program header anyway. Thoughts?
> >
> > It seems to me that doing this ties it to the binary, rather than
> > execution context, which may want to seal/not-seal, etc. I have a sense
> > that it's be better as a secure bit, or prctl, or something like that. The
> > properties seem to be better suited for "this binary _can_ do a thing"
> > or "this binary _requires_ a thing", like the GNU_STACK bits, etc. But
> > maybe there's more to this I'm not considering?
>
> Doesn't this exactly kind of Just Work though? "This binary can
> do/tolerate sealing". I would blindly guess that we don't have very
> opinionated shared libraries that do this kind of shenanigans
> unilaterally, so that's probably not something we really need to worry
> about (though I admittedly need to read through the glibc patchset,
> and nail down what they're thinking about doing with linking
> mseal-ready and mseal-non-ready ELF execs/shared objects together).
> The problem with something like prctl is that we either indirectly
> provide some kind of limited form of munseal, or we require some sort
> of handover (like personality(2) + execve(2)), which both sound like a
> huge PITA and still don't solve any of our backwards compat issues...
> all binaries would need to be patched with this
> prctl/personality()/whatever call, and old ones wouldn't work.
>
> The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
> the toolchain folks could shed some light?), but it sounds like it'd
> fit perfectly.
> I suspect we probably want to parse this on the kernel's side anyway
> (to seal the main program/interp's segments)[1], then extending them
> to the kernel system mappings should be somewhat trivial...
> I don't think we'll ever get a program that can't cope with sealing
> the system mappings but can cope with sealing itself (and if we do, we
> just won't seal the entire thing and that's _okay_).
>
> Deploying mseal-ready programs could then be done in a phased way by
> distros. e.g chromeOS and android could simply enable the
> corresponding linker option in LDFLAGS and let it rip. Other more
> mainstream distros could obviously take a little longer or test/deploy
> this on all programs not named gVisor and/or after CRIU is okay with
> all of this. We then might not need a user-configurable CONFIG_ (only
> an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
> everyone is happy.
>
> I glanced through libc-alpha again and it seems like the glibc folks
> also seem to have reached the same idea, but I'd love to hear from
> Adhemerval.
>
> Am I missing anything?
Android's a bit interesting because there isn't really a "binary" in
the usual sense. an Android app is basically a shared library of JNI
code dlopen()ed into a clone() of an already-initialized Java runtime
(the "zygote").
that said, i'm not expecting sealing the vdso to be problematic (even
if i'm not sure how useful it is to do so, being unaware of any
exploit that's ever used this?).
for me the tricky part is when it's used for regular user shared
libraries which seems the most convincing use case to me, albeit
mainly for app compat reasons ["stopping apps from poking at
implementation details they shouldn't, which later causes breakage if
the implementation detail changes"]. the irony being that it'll
_cause_ app compat problems by breaking all such things all at once.
so that'll be "fun" for someone to try to roll out!
it also breaks dlclose() (unless your dlopen() was with RTLD_NODELETE,
which is not the default on Android either). that's fine for OS
libraries that have already been loaded by the zygote, but hard to
make a default. that said, i do expect games and banking apps to be
interested to try this when they hear about it.
anyway, in general i think an ELF property per-library sounds
reasonable, matching what we have for arm64 BTI. i have no strong
opinion on the system mappings and what if anything executables should
do, for the reasons given above, but the elf property sounds
reasonable.
> [1] we should probably nail this responsibility handover down before
> glibc msealing (or bionic) makes it to a release. It'd probably be a
> little nicer if we could mseal these segments from the kernel instead
> of forcing the libc to take care of this, now that we have this
> property.
>
> --
> Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-18 23:18 ` Pedro Falcato
2025-02-19 13:46 ` Adhemerval Zanella Netto
2025-02-19 17:17 ` enh
@ 2025-02-23 2:05 ` Jeff Xu
2 siblings, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-23 2:05 UTC (permalink / raw)
To: Pedro Falcato
Cc: Kees Cook, Lorenzo Stoakes, akpm, jannh, torvalds, vbabka,
Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin,
linux-kernel, linux-hardening, linux-mm, jorgelo, sroettger, hch,
ojeda, thomas.weissschuh, adobriyan, johannes, hca, willy,
anna-maria, mark.rutland, linus.walleij, Jason, deller, rdunlap,
davem, peterx, f.fainelli, gerg, dave.hansen, mingo, ardb,
mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck, mpe,
aleksandr.mikhalitsyn, mike.rapoport
On Tue, Feb 18, 2025 at 3:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> The problem with something like prctl is that we either indirectly
> provide some kind of limited form of munseal, or we require some sort
> of handover (like personality(2) + execve(2)), which both sound like a
> huge PITA and still don't solve any of our backwards compat issues...
> all binaries would need to be patched with this
> prctl/personality()/whatever call, and old ones wouldn't work.
>
SECBIT (such as AT_EXECVE_CHECK) can be locked, which provides a
strong security guarantee.
> The semantics behind GNU_PROPERTY_MEMORY_SEAL are unclear to me (maybe
> the toolchain folks could shed some light?), but it sounds like it'd
> fit perfectly.
> I suspect we probably want to parse this on the kernel's side anyway
> (to seal the main program/interp's segments)[1], then extending them
> to the kernel system mappings should be somewhat trivial...
> I don't think we'll ever get a program that can't cope with sealing
> the system mappings but can cope with sealing itself (and if we do, we
> just won't seal the entire thing and that's _okay_).
>
> Deploying mseal-ready programs could then be done in a phased way by
> distros. e.g chromeOS and android could simply enable the
> corresponding linker option in LDFLAGS and let it rip. Other more
> mainstream distros could obviously take a little longer or test/deploy
> this on all programs not named gVisor and/or after CRIU is okay with
> all of this. We then might not need a user-configurable CONFIG_ (only
> an arch HAS_SYSTEM_MAPPINGS_MSEAL or whatever), nor a sysctl, and
> everyone is happy.
>
> I glanced through libc-alpha again and it seems like the glibc folks
> also seem to have reached the same idea, but I'd love to hear from
> Adhemerval.
>
> Am I missing anything?
>
I'm hesitant to rely on HAS_SYSTEM_MAPPINGS_MSEAL in the kernel for
special mappings.
Think this way: some apps don't want vdso, but the kernel creates it
anyway, those apps were forced to unmap or remap them.
If the kernel should take a new dependency on the elf header, a better
approach is adding a bit in the elf header to indicate "not-creating
vdso". This would solve problems for those apps that want to unmap
vdso.
CRIU would need more than this new bit, i.e. an ability to create
vdso on demand. Currently, during restore workflow, CRIU remaps the
vdso created by the current kernel (can't use vdso from the back-up,
because vdso is tied to the kernel version), and the remapping address
must be the same address as the backed-up process, to avoid
symbol-relocation. Kernel can provide a new functionality to create
this vdso mapping as desired by CRIU.
Then the vdso can be sealed from creation in all cases, this is the
most secure approach. And the kernel also doesn't need that code to
remap/unmap vdso - that is also cleaner IMO.
Thanks.
-Jeff
>
> [1] we should probably nail this responsibility handover down before
> glibc msealing (or bionic) makes it to a release. It'd probably be a
> little nicer if we could mseal these segments from the kernel instead
> of forcing the libc to take care of this, now that we have this
> property.
>
> --
> Pedro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 11:24 ` [RFC PATCH v5 0/7] mseal system mappings Lorenzo Stoakes
2025-02-12 12:37 ` Pedro Falcato
@ 2025-02-12 22:05 ` Kees Cook
2025-02-13 14:20 ` Jeff Xu
2025-02-13 18:35 ` Liam R. Howlett
2025-02-13 14:19 ` Jeff Xu
2 siblings, 2 replies; 43+ messages in thread
From: Kees Cook @ 2025-02-12 22:05 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: jeffxu, akpm, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > The commit message in the first patch contains the full description of
> > this series.
>
> Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> this obviously isn't urgent, just be nice when we un-RFC.
I advised Jeff against this because I've found it can sometimes cause
"thread splitting" in that some people reply to the cover letter, and
some people reply to the first patch, etc. I've tended to try to keep
cover letters very general, with the bulk of the prose in the first
patch.
> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.
I like this idea! Probably in mseal.rst, as the Kconfig help already
points there.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 22:05 ` Kees Cook
@ 2025-02-13 14:20 ` Jeff Xu
2025-02-13 18:35 ` Liam R. Howlett
1 sibling, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 14:20 UTC (permalink / raw)
To: Kees Cook
Cc: Lorenzo Stoakes, akpm, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 2:05 PM Kees Cook <kees@kernel.org> wrote:
>
> > It'd be nice to update the documentation to have a list of 'known
> > problematic userland software with sealed VDSO' so we make people aware.
>
> I like this idea! Probably in mseal.rst, as the Kconfig help already
> points there.
>
Will update mseal.rst to include the above suggestion in the next version.
Thanks
-Jeff
> -Kees
>
> --
> Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 22:05 ` Kees Cook
2025-02-13 14:20 ` Jeff Xu
@ 2025-02-13 18:35 ` Liam R. Howlett
2025-02-13 19:34 ` Kees Cook
1 sibling, 1 reply; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-13 18:35 UTC (permalink / raw)
To: Kees Cook
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* Kees Cook <kees@kernel.org> [250212 17:05]:
> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > The commit message in the first patch contains the full description of
> > > this series.
> >
> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> > this obviously isn't urgent, just be nice when we un-RFC.
>
> I advised Jeff against this because I've found it can sometimes cause
> "thread splitting" in that some people reply to the cover letter, and
> some people reply to the first patch, etc. I've tended to try to keep
> cover letters very general, with the bulk of the prose in the first
> patch.
Interesting idea, but I think thread splitting is less of a concern than
diluting the meaning of a patch by including a lengthy change log with a
fraction of the text being about the code that follows.
I think this is the reason for a cover letter in the first place; not
just version control. After all, we could tack the version information
into the first patch too and avoid it being in the final commit message.
Thanks,
Liam
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-13 18:35 ` Liam R. Howlett
@ 2025-02-13 19:34 ` Kees Cook
2025-02-13 20:10 ` Liam R. Howlett
0 siblings, 1 reply; 43+ messages in thread
From: Kees Cook @ 2025-02-13 19:34 UTC (permalink / raw)
To: Liam R. Howlett
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
>* Kees Cook <kees@kernel.org> [250212 17:05]:
>> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
>> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
>> > > From: Jeff Xu <jeffxu@chromium.org>
>> > >
>> > > The commit message in the first patch contains the full description of
>> > > this series.
>> >
>> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
>> > this obviously isn't urgent, just be nice when we un-RFC.
>>
>> I advised Jeff against this because I've found it can sometimes cause
>> "thread splitting" in that some people reply to the cover letter, and
>> some people reply to the first patch, etc. I've tended to try to keep
>> cover letters very general, with the bulk of the prose in the first
>> patch.
>
>Interesting idea, but I think thread splitting is less of a concern than
>diluting the meaning of a patch by including a lengthy change log with a
>fraction of the text being about the code that follows.
>
>I think this is the reason for a cover letter in the first place; not
>just version control. After all, we could tack the version information
>into the first patch too and avoid it being in the final commit message.
Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-13 19:34 ` Kees Cook
@ 2025-02-13 20:10 ` Liam R. Howlett
0 siblings, 0 replies; 43+ messages in thread
From: Liam R. Howlett @ 2025-02-13 20:10 UTC (permalink / raw)
To: Kees Cook
Cc: Lorenzo Stoakes, jeffxu, akpm, jannh, torvalds, vbabka,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
* Kees Cook <kees@kernel.org> [250213 14:34]:
>
>
> On February 13, 2025 10:35:21 AM PST, "Liam R. Howlett" <Liam.Howlett@oracle.com> wrote:
> >* Kees Cook <kees@kernel.org> [250212 17:05]:
> >> On Wed, Feb 12, 2025 at 11:24:35AM +0000, Lorenzo Stoakes wrote:
> >> > On Wed, Feb 12, 2025 at 03:21:48AM +0000, jeffxu@chromium.org wrote:
> >> > > From: Jeff Xu <jeffxu@chromium.org>
> >> > >
> >> > > The commit message in the first patch contains the full description of
> >> > > this series.
> >> >
> >> > Sorry to nit, but it'd be useful to reproduce in the cover letter too! But
> >> > this obviously isn't urgent, just be nice when we un-RFC.
> >>
> >> I advised Jeff against this because I've found it can sometimes cause
> >> "thread splitting" in that some people reply to the cover letter, and
> >> some people reply to the first patch, etc. I've tended to try to keep
> >> cover letters very general, with the bulk of the prose in the first
> >> patch.
> >
> >Interesting idea, but I think thread splitting is less of a concern than
> >diluting the meaning of a patch by including a lengthy change log with a
> >fraction of the text being about the code that follows.
> >
> >I think this is the reason for a cover letter in the first place; not
> >just version control. After all, we could tack the version information
> >into the first patch too and avoid it being in the final commit message.
>
> Okay, so to be clear: you'd prefer to put the rationales and other stuff in the cover, and put more specific details in the first patch? I've not liked this because cover letters aren't (except for akpm's trees) included anywhere in git, which makes archeology much harder.
Yes, rationales in the cover letter. I like the way the akpm's tree
does things because it's the best of both worlds. There is also a
separation of the cover letter with the actual commit message on the
first patch.
Having the full cover letter on patch 1 makes it difficult to understand
*that* patch on its own during review.
I've also gotten emails from Linus asking why in the ____ing ____ I did
it this way when I said why in the cover letter.. to that note I like
the patches to have _all_ the necessary details for that one patch,
including the sometimes "this is changed in the very next patch" lines
to spell out in-transit patches, or what ever else is needed from the
cover letter/context.
Taking this example, we have a 111 line cover letter and a patch that
adds a new file with a single function, and two kconfig options. The
justification and reason for the patch is in the middle of that huge
block of text. That seems silly.
That is to say:
Cover letters have the rationale and over-arching reason.
Patches have more than enough details to code inspect and know why this
patch is necessary.
Thanks,
Liam
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC PATCH v5 0/7] mseal system mappings
2025-02-12 11:24 ` [RFC PATCH v5 0/7] mseal system mappings Lorenzo Stoakes
2025-02-12 12:37 ` Pedro Falcato
2025-02-12 22:05 ` Kees Cook
@ 2025-02-13 14:19 ` Jeff Xu
2 siblings, 0 replies; 43+ messages in thread
From: Jeff Xu @ 2025-02-13 14:19 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
linux-hardening, linux-mm, jorgelo, sroettger, hch, ojeda,
thomas.weissschuh, adobriyan, johannes, pedro.falcato, hca,
willy, anna-maria, mark.rutland, linus.walleij, Jason, deller,
rdunlap, davem, peterx, f.fainelli, gerg, dave.hansen, mingo,
ardb, mhocko, 42.hyeyoo, peterz, ardb, enh, rientjes, groeck,
mpe, aleksandr.mikhalitsyn, mike.rapoport
On Wed, Feb 12, 2025 at 3:24 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> It'd be nice to update the documentation to have a list of 'known
> problematic userland software with sealed VDSO' so we make people aware.
>
Sure. It will be added in the next version.
>
> And I _want the series to land_ :>) I suspect we're close now.
>
Thank you for reviewing the patch and giving support.
> I am tied up with a number of other tasks at the moment so apologies if I
> take a second to get back to this series, but just wanted to say thanks for
> addressing my various points, and that I will definitely provide review
> (it's on the whiteboard, the only true guarantee I will do something :P).
>
> I will also come back to your testing series which I owe you review on,
> which is equally on the same whiteboard...
>
Thank you for the heads up.
-Jeff
^ permalink raw reply [flat|nested] 43+ messages in thread