linux-mm.kvack.org archive mirror
* [PATCH v9 0/7] mseal system mappings
@ 2025-03-05  2:17 jeffxu
  2025-03-05  2:17 ` [PATCH v9 1/7] mseal sysmap: kernel config and header change jeffxu
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu

From: Jeff Xu <jeffxu@chromium.org>

This is the V9 version, addressing comments on V8, with no code logic
changes.

-------------------------------------------------------------------
As discussed during the mseal() upstream process [1], mseal() protects
the VMAs of a given virtual memory range against modifications, such
as changes to the read/write (RW) and no-execute (NX) bits. For a complete
description of memory sealing, please see mseal.rst [2].

mseal() is useful for mitigating memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees, since read-only memory that is supposed to be trusted can
become writable, or .text pages can get remapped.
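As a userspace illustration of the protection described above, the following sketch maps one anonymous page, seals it with mseal(), and checks that a later mprotect() to make it writable fails with EPERM. It is a minimal sketch, assuming a kernel that implements mseal() (Linux 6.10 or later). It invokes the raw syscall because a libc wrapper may be unavailable; the fallback syscall number 462 and the helper name seal_blocks_mprotect() are assumptions of this example, not part of this series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462 /* assumption: syscall number used by x86-64 and arm64 */
#endif

/*
 * Hypothetical helper: map one page, seal it, then try to make it writable.
 * Returns 1 if the sealed page rejected mprotect() with EPERM,
 * 0 if mseal() itself failed (e.g. ENOSYS on an older kernel),
 * -1 on any other outcome.
 */
static int seal_blocks_mprotect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	if (syscall(__NR_mseal, p, (size_t)page, 0UL) != 0) {
		/* Sealing failed, so the page is still unsealed and can be unmapped. */
		munmap(p, page);
		return 0;
	}

	/* A sealed VMA must refuse permission changes. */
	if (mprotect(p, page, PROT_READ | PROT_WRITE) == -1 && errno == EPERM)
		return 1;

	/* Note: munmap() of a sealed VMA is also refused, so the page stays mapped. */
	return -1;
}
```

A caller can treat a return value of 0 as "mseal() is unavailable here" (for example an older kernel, or a sandbox that filters the syscall) rather than as a failure.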

The system mappings are read-only or execute-only; memory sealing can
prevent them from ever becoming writable, or from being unmapped or
remapped with different attributes.

System mappings such as vdso, vvar, vvar_vclock,
vectors (arm compat-mode), and sigpage (arm compat-mode)
are created by the kernel during program initialization, and can
be sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [3], so it can be sealed at creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000),
which is outside the range managed by mm. This means mprotect, munmap,
and mremap won't work on the vsyscall. Since sealing doesn't enhance
the vsyscall's security, it is skipped in this patch. If we ever seal
the vsyscall, it would probably be for decorative purposes only, i.e.
showing the 'sl' flag in /proc/pid/smaps.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML (User Mode Linux),
gVisor, and rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mapping calls to pass the mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that
it does not require the remapping of these system mappings during
process lifetime, so sealing these mappings is safe from a kernel
perspective.
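To make the architecture contract above concrete, the sketch below mirrors the VM_SEALED_SYSMAP selection from patch 1 in plain userspace C. The flag values and the helpers vdso_text_flags() and vdso_would_be_sealed() are hypothetical stand-ins chosen for illustration; the real definitions live in include/linux/mm.h and the per-architecture vdso code. What it shows is that an architecture can OR VM_SEALED_SYSMAP into its special mapping flags unconditionally, because the macro collapses to VM_NONE (zero) when CONFIG_MSEAL_SYSTEM_MAPPINGS is not set.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins for kernel vm_flags bits; the values are
 * assumptions for this sketch, not copied from include/linux/mm.h.
 */
#define VM_NONE		0x0UL
#define VM_READ		0x1UL
#define VM_EXEC		0x4UL
#define VM_SEALED	(1UL << 63)	/* assumes a 64-bit unsigned long */

/* Same selection logic as the VM_SEALED_SYSMAP macro in this series. */
#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SEALED_SYSMAP	VM_SEALED
#else
#define VM_SEALED_SYSMAP	VM_NONE
#endif

/* Hypothetical arch helper: flags an architecture might pass for vdso text. */
static unsigned long vdso_text_flags(void)
{
	return VM_READ | VM_EXEC | VM_SEALED_SYSMAP;
}

/* True only when the config option turned VM_SEALED_SYSMAP into VM_SEALED. */
static bool vdso_would_be_sealed(void)
{
	return (vdso_text_flags() & VM_SEALED) != 0;
}
```

Compiling the same file with -DCONFIG_MSEAL_SYSTEM_MAPPINGS makes vdso_would_be_sealed() return true without any change at the call site, which is why the per-architecture patches can add the flag without #ifdefs.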

This version covers the x86-64 and arm64 architectures as a minimum viable feature.

While no specific CPU hardware features are required to enable this
feature on an architecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain whether this will ever fail on PPC, and this needs to be
investigated by PPC in the future [4]. The UML kernel can add this support
when KUnit tests require it [5].

In this version, we've improved the handling of system mapping sealing
compared to previous versions. Instead of modifying the
_install_special_mapping function itself, which would affect all
architectures, we now call _install_special_mapping with a sealing flag
only within the specific architectures that require it. This targeted
approach offers two key advantages: 1) it limits the code change's impact
to the necessary architectures, and 2) it aligns with the software
architecture by keeping the core memory management within the mm layer,
while delegating the decision of sealing system mappings to the individual
architecture, which is particularly relevant since 32-bit architectures
never require sealing.

Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for the vdso (smaller
  than its actual size). The dynamic linker, which relies on PT_LOAD
  information to determine mapping size, would then split and partially
  seal the vdso mapping. Since each architecture has its own vdso/vvar
  code, fixing this in the kernel would require going through each
  architecture. Because our initial goal was to enable sealing of read-only
  mappings, e.g. .text, across all architectures, sealing the vdso from the
  kernel at creation appears to be simpler than sealing it in glibc.
- The [vvar] mapping header only contains address information, not length
  information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker,
  and there is no effective solution for them.
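To illustrate the first issue above, the sketch below computes the address span a loader would derive from the PT_LOAD program headers of an in-memory 64-bit ELF image such as the vdso. The helper name pt_load_span() is hypothetical and this is not glibc's actual code; it only shows why a loader that trusts PT_LOAD would seal a shorter range than the kernel mapped whenever the headers under-report the size.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the address span covered by the PT_LOAD
 * segments of an in-memory 64-bit ELF image, as a loader trusting the
 * program headers would compute it. Returns 0 if there is no PT_LOAD.
 */
static size_t pt_load_span(const Elf64_Ehdr *eh)
{
	const char *base = (const char *)eh;
	uint64_t lo = UINT64_MAX, hi = 0;

	for (unsigned int i = 0; i < eh->e_phnum; i++) {
		/* Step by e_phentsize, the entry size recorded in the ELF header. */
		const Elf64_Phdr *ph = (const Elf64_Phdr *)
			(base + eh->e_phoff + (size_t)i * eh->e_phentsize);

		if (ph->p_type != PT_LOAD)
			continue;
		if (ph->p_vaddr < lo)
			lo = ph->p_vaddr;
		if (ph->p_vaddr + ph->p_memsz > hi)
			hi = ph->p_vaddr + ph->p_memsz;
	}
	/* With no PT_LOAD segment, lo stays above hi and the span is 0. */
	return hi > lo ? (size_t)(hi - lo) : 0;
}
```

On a running system the vdso's ELF header can be located with getauxval(AT_SYSINFO_EHDR) from <sys/auxv.h>; comparing pt_load_span() with the [vdso] range listed in /proc/self/maps indicates whether the headers under-report the mapping on a given architecture.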

This feature's security enhancements will benefit ChromeOS, Android,
and other high-security systems.

Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify that vdso/vvar, sigpage, and vectors are sealed
  properly, i.e. "sl" is shown in smaps for those mappings, and mremap is
  blocked.
- Pass various automation tests (e.g. pre-checkin) on ChromeOS and
  Android to ensure sealing doesn't affect the functionality of
  Chromebooks and Android phones.
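The "sl" check used during testing can be sketched as a small parser over smaps-formatted text. This is a minimal sketch, not the selftest from this series: the helper names mapping_is_sealed() and has_token() and the MAX_LINE_LEN limit are hypothetical, and it compares whole VmFlags tokens rather than doing a substring search, since VmFlags is a space-separated list of two-letter flags.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LINE_LEN 512

/* True if the space-separated list `flags` contains exactly the token `tok`. */
static bool has_token(const char *flags, const char *tok)
{
	size_t n = strlen(tok);

	while (*flags) {
		size_t len;

		flags += strspn(flags, " ");
		len = strcspn(flags, " ");
		if (len == n && strncmp(flags, tok, n) == 0)
			return true;
		flags += len;
	}
	return false;
}

/*
 * Hypothetical helper: walk smaps-formatted text and report whether the
 * first mapping whose header line mentions `name` carries the "sl"
 * (sealed) token in its VmFlags line.
 */
static bool mapping_is_sealed(const char *smaps, const char *name)
{
	char line[MAX_LINE_LEN];
	bool in_mapping = false;

	while (*smaps) {
		size_t len = strcspn(smaps, "\n");
		size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		/* Copy one line without its newline, truncating very long lines. */
		memcpy(line, smaps, copy);
		line[copy] = '\0';
		smaps += len;
		if (*smaps == '\n')
			smaps++;

		if (!in_mapping) {
			if (strstr(line, name))
				in_mapping = true;
		} else if (strncmp(line, "VmFlags:", 8) == 0) {
			/* Each smaps entry ends with VmFlags, so this line is ours. */
			return has_token(line + 8, "sl");
		}
	}
	return false;
}
```

For a live check, read /proc/self/smaps into a buffer and call mapping_is_sealed(buf, "[vdso]"). Truncation at MAX_LINE_LEN does not affect the short [vdso] header and VmFlags lines this sketch inspects.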

I also tested the feature on Ubuntu on x86-64:
- With the config disabled, vdso/vvar is not sealed.
- With the config enabled, vdso/vvar is sealed, Ubuntu boots normally, and
  normal operations such as browsing the web and opening/editing documents
  work.

Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]

-------------------------------------------
History:

V9:
 - Add negative test in selftest (Kees Cook)
 - Fix typos in text (Kees Cook)

V8:
  - Change ARCH_HAS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
  - Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
  - Change patch header prefix to "mseal sysmap" (Lorenzo Stoakes)
  - Remove "vm_flags =" (Kees Cook, Liam R. Howlett,  Oleg Nesterov)
  - Drop uml architecture (Lorenzo Stoakes, Kees Cook)
  - Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)

V7:
  https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
  - Remove cover letter from the first patch (Liam R. Howlett)
  - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
  - logging and fclose() in selftest (Liam R. Howlett)

V6:
  https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
  - mseal.rst: fix a typo (Randy Dunlap)
  - security/Kconfig: add rr into note (Liam R. Howlett)
  - remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
  - mseal.rst: add incompatible userland software (Lorenzo Stoakes)
  - remove RFC from title (Kees Cook)

V5:
  https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
  - Remove kernel cmd line (Lorenzo Stoakes)
  - Add test info (Lorenzo Stoakes)
  - Add threat model info (Lorenzo Stoakes)
  - Fix x86 selftest: test_mremap_vdso
  - Restrict code change to ARM64/x86-64/UM arch only.
  - Add userprocess.h to include seal_system_mapping().
  - Remove sealing vsyscall.
  - Split the patch.

V4:
  https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
  - ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
  - test info (Lorenzo Stoakes)
  - Update mseal.rst (Liam R. Howlett)
  - Update test_mremap_vdso.c (Liam R. Howlett)
  - Misc. style, comments, doc update (Liam R. Howlett)

V3:
  https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
  - Revert uprobe to v1 logic (Oleg Nesterov)
  - use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
  - Move kernel cmd line from fs/exec.c to mm/mseal.c and
    misc. (Liam R. Howlett)

V2:
  https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
  - Seal uprobe always (Oleg Nesterov)
  - Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
  - Rebase to linux_main

V1:
 - https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/

--------------------------------------------------



Jeff Xu (7):
  mseal sysmap: kernel config and header change
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: enable x86-64
  mseal sysmap: enable arm64
  mseal sysmap: uprobe mapping
  mseal sysmap: update mseal.rst
  selftest: test system mappings are sealed.

 Documentation/userspace-api/mseal.rst         |  20 +++
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/kernel/vdso.c                      |  12 +-
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/vma.c                     |   7 +-
 include/linux/mm.h                            |  10 ++
 init/Kconfig                                  |  22 ++++
 kernel/events/uprobes.c                       |   3 +-
 security/Kconfig                              |  21 ++++
 tools/testing/selftests/Makefile              |   1 +
 .../mseal_system_mappings/.gitignore          |   2 +
 .../selftests/mseal_system_mappings/Makefile  |   6 +
 .../selftests/mseal_system_mappings/config    |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c  | 119 ++++++++++++++++++
 .../testing/selftests/x86/test_mremap_vdso.c  |  43 +++++++
 15 files changed, 261 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c

-- 
2.48.1.711.g2feabab25a-goog



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v9 1/7] mseal sysmap: kernel config and header change
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
@ 2025-03-05  2:17 ` jeffxu
       [not found]   ` <2a42ac63-d7a2-48ae-ae86-568d0fc59d51@lucifer.local>
  2025-03-05  2:17 ` [PATCH v9 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Kees Cook

From: Jeff Xu <jeffxu@chromium.org>

Provide infrastructure to mseal system mappings. Establish
two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP
macro for future patches.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
---
 include/linux/mm.h | 10 ++++++++++
 init/Kconfig       | 22 ++++++++++++++++++++++
 security/Kconfig   | 21 +++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7b1068ddcbb7..8b800941678d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4155,4 +4155,14 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 
+
+/*
+ * mseal of userspace process's system mappings.
+ */
+#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
+#define VM_SEALED_SYSMAP	VM_SEALED
+#else
+#define VM_SEALED_SYSMAP	VM_NONE
+#endif
+
 #endif /* _LINUX_MM_H */
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..7f67d8942a09 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1882,6 +1882,28 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
 	bool
 
+config ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	bool
+	help
+	  Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  To enable this feature, the architecture needs to update its
+	  special mapping calls to include the sealing flag and confirm
+	  that it doesn't unmap/remap system mappings during the
+	  lifetime of the process. The existence of this flag for an architecture
+	  implies that it does not require the remapping of the system
+	  mappings during process lifetime, so sealing these mappings is safe
+	  from a kernel perspective.
+
+	  After the architecture enables this, a distribution can set
+	  CONFIG_MSEAL_SYSTEM_MAPPINGS to manage access to the feature.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config HAVE_PERF_EVENTS
 	bool
 	help
diff --git a/security/Kconfig b/security/Kconfig
index f10dbf15c294..a914a02df27e 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -51,6 +51,27 @@ config PROC_MEM_NO_FORCE
 
 endchoice
 
+config MSEAL_SYSTEM_MAPPINGS
+	bool "mseal system mappings"
+	depends on 64BIT
+	depends on ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	depends on !CHECKPOINT_RESTORE
+	help
+	  Apply mseal on system mappings.
+	  The system mappings include vdso, vvar, vvar_vclock,
+	  vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  WARNING: This feature breaks programs which rely on relocating
+	  or unmapping system mappings. Known broken software at the time
+	  of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+	  this config can't be enabled universally.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config SECURITY
 	bool "Enable different security models"
 	depends on SYSFS
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
  2025-03-05  2:17 ` [PATCH v9 1/7] mseal sysmap: kernel config and header change jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  2:17 ` [PATCH v9 3/7] mseal sysmap: enable x86-64 jeffxu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Kees Cook, Liam R. Howlett

From: Jeff Xu <jeffxu@chromium.org>

Add code to detect whether the vdso is memory sealed, and skip the
test if it is.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 .../testing/selftests/x86/test_mremap_vdso.c  | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tools/testing/selftests/x86/test_mremap_vdso.c b/tools/testing/selftests/x86/test_mremap_vdso.c
index d53959e03593..94bee6e0c813 100644
--- a/tools/testing/selftests/x86/test_mremap_vdso.c
+++ b/tools/testing/selftests/x86/test_mremap_vdso.c
@@ -14,6 +14,7 @@
 #include <errno.h>
 #include <unistd.h>
 #include <string.h>
+#include <stdbool.h>
 
 #include <sys/mman.h>
 #include <sys/auxv.h>
@@ -55,13 +56,55 @@ static int try_to_remap(void *vdso_addr, unsigned long size)
 
 }
 
+#define VDSO_NAME "[vdso]"
+#define VMFLAGS "VmFlags:"
+#define MSEAL_FLAGS "sl"
+#define MAX_LINE_LEN 512
+
+bool vdso_sealed(FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+	bool has_vdso = false;
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (strstr(line, VDSO_NAME))
+			has_vdso = true;
+
+		if (has_vdso && !strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
+			if (strstr(line, MSEAL_FLAGS))
+				return true;
+
+			return false;
+		}
+	}
+
+	return false;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 	pid_t child;
+	FILE *maps;
 
 	ksft_print_header();
 	ksft_set_plan(1);
 
+	maps = fopen("/proc/self/smaps", "r");
+	if (!maps) {
+		ksft_test_result_skip(
+			"Could not open /proc/self/smaps, errno=%d\n",
+			 errno);
+
+		return 0;
+	}
+
+	if (vdso_sealed(maps)) {
+		ksft_test_result_skip("vdso is sealed\n");
+		return 0;
+	}
+
+	fclose(maps);
+
 	child = fork();
 	if (child == -1)
 		ksft_exit_fail_msg("failed to fork (%d): %m\n", errno);
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 3/7] mseal sysmap: enable x86-64
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
  2025-03-05  2:17 ` [PATCH v9 1/7] mseal sysmap: kernel config and header change jeffxu
  2025-03-05  2:17 ` [PATCH v9 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  2:17 ` [PATCH v9 4/7] mseal sysmap: enable arm64 jeffxu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Liam R. Howlett, Kees Cook

From: Jeff Xu <jeffxu@chromium.org>

Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64,
covering the vdso, vvar, vvar_vclock.

Production release testing passes on Android and Chrome OS.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
---
 arch/x86/Kconfig          | 1 +
 arch/x86/entry/vdso/vma.c | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..c6f9ebcbe009 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
 	depends on 64BIT
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 39e6efc1a9ca..a4f312495de1 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -268,7 +268,8 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       text_start,
 				       image->size,
 				       VM_READ|VM_EXEC|
-				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				       VM_SEALED_SYSMAP,
 				       &vdso_mapping);
 
 	if (IS_ERR(vma)) {
@@ -280,7 +281,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       addr,
 				       (__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
-				       VM_PFNMAP,
+				       VM_PFNMAP|VM_SEALED_SYSMAP,
 				       &vvar_mapping);
 
 	if (IS_ERR(vma)) {
@@ -293,7 +294,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 				       addr + (__VVAR_PAGES - VDSO_NR_VCLOCK_PAGES) * PAGE_SIZE,
 				       VDSO_NR_VCLOCK_PAGES * PAGE_SIZE,
 				       VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
-				       VM_PFNMAP,
+				       VM_PFNMAP|VM_SEALED_SYSMAP,
 				       &vvar_vclock_mapping);
 
 	if (IS_ERR(vma)) {
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 4/7] mseal sysmap: enable arm64
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
                   ` (2 preceding siblings ...)
  2025-03-05  2:17 ` [PATCH v9 3/7] mseal sysmap: enable x86-64 jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  2:17 ` [PATCH v9 5/7] mseal sysmap: uprobe mapping jeffxu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Liam R. Howlett, Kees Cook

From: Jeff Xu <jeffxu@chromium.org>

Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on arm64, covering
the vdso, vvar, and compat-mode vectors and sigpage mappings.

Production release testing passes on Android and Chrome OS.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
---
 arch/arm64/Kconfig       |  1 +
 arch/arm64/kernel/vdso.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..282d6cb13cfb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -38,6 +38,7 @@ config ARM64
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEM_ENCRYPT
+	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e8ed8e5b713b..69d2b5ceb092 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -198,7 +198,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	}
 
 	ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
-				       VM_READ|VM_MAYREAD|VM_PFNMAP,
+				       VM_READ|VM_MAYREAD|VM_PFNMAP|
+				       VM_SEALED_SYSMAP,
 				       &vvar_map);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -210,7 +211,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	mm->context.vdso = (void *)vdso_base;
 	ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
 				       VM_READ|VM_EXEC|gp_flags|
-				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				       VM_SEALED_SYSMAP,
 				       vdso_info[abi].cm);
 	if (IS_ERR(ret))
 		goto up_fail;
@@ -336,7 +338,8 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm)
 	 */
 	ret = _install_special_mapping(mm, AARCH32_VECTORS_BASE, PAGE_SIZE,
 				       VM_READ | VM_EXEC |
-				       VM_MAYREAD | VM_MAYEXEC,
+				       VM_MAYREAD | VM_MAYEXEC |
+				       VM_SEALED_SYSMAP,
 				       &aarch32_vdso_maps[AA32_MAP_VECTORS]);
 
 	return PTR_ERR_OR_ZERO(ret);
@@ -359,7 +362,8 @@ static int aarch32_sigreturn_setup(struct mm_struct *mm)
 	 */
 	ret = _install_special_mapping(mm, addr, PAGE_SIZE,
 				       VM_READ | VM_EXEC | VM_MAYREAD |
-				       VM_MAYWRITE | VM_MAYEXEC,
+				       VM_MAYWRITE | VM_MAYEXEC |
+				       VM_SEALED_SYSMAP,
 				       &aarch32_vdso_maps[AA32_MAP_SIGPAGE]);
 	if (IS_ERR(ret))
 		goto out;
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 5/7] mseal sysmap: uprobe mapping
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
                   ` (3 preceding siblings ...)
  2025-03-05  2:17 ` [PATCH v9 4/7] mseal sysmap: enable arm64 jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  2:17 ` [PATCH v9 6/7] mseal sysmap: update mseal.rst jeffxu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Liam R. Howlett, Kees Cook

From: Jeff Xu <jeffxu@chromium.org>

Provide support to mseal the uprobe mapping.

Unlike other system mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime, so it can be sealed at creation.

Tested with the perf tool; the uprobe mapping is observed to be sealed.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
---
 kernel/events/uprobes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bf2a87a0a378..98632bc47216 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1683,7 +1683,8 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	}
 
 	vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
-				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
+				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|
+				VM_SEALED_SYSMAP,
 				&xol_mapping);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 6/7] mseal sysmap: update mseal.rst
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
                   ` (4 preceding siblings ...)
  2025-03-05  2:17 ` [PATCH v9 5/7] mseal sysmap: uprobe mapping jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  2:17 ` [PATCH v9 7/7] selftest: test system mappings are sealed jeffxu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu, Kees Cook, Liam R. Howlett

From: Jeff Xu <jeffxu@chromium.org>

Update memory sealing documentation to include details about system
mappings.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
 Documentation/userspace-api/mseal.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
index 41102f74c5e2..56aee46a9307 100644
--- a/Documentation/userspace-api/mseal.rst
+++ b/Documentation/userspace-api/mseal.rst
@@ -130,6 +130,26 @@ Use cases
 
 - Chrome browser: protect some security sensitive data structures.
 
+- System mappings:
+  The system mappings are created by the kernel and include vdso, vvar,
+  vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+  These system mappings are read-only or execute-only. Memory sealing
+  prevents them from ever becoming writable, or from being unmapped or
+  remapped with different attributes. This helps mitigate memory corruption
+  issues where a corrupted pointer is passed to a memory management system.
+
+  If supported by an architecture (CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS),
+  CONFIG_MSEAL_SYSTEM_MAPPINGS seals all system mappings of that
+  architecture.
+
+  The following architectures currently support this feature: x86-64 and arm64.
+
+  WARNING: This feature breaks programs which rely on relocating
+  or unmapping system mappings. Known broken software at the time
+  of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+  this config can't be enabled universally.
+
 When not to use mseal
 =====================
 Applications can apply sealing to any virtual memory region from userspace,
-- 
2.48.1.711.g2feabab25a-goog




* [PATCH v9 7/7] selftest: test system mappings are sealed.
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
                   ` (5 preceding siblings ...)
  2025-03-05  2:17 ` [PATCH v9 6/7] mseal sysmap: update mseal.rst jeffxu
@ 2025-03-05  2:17 ` jeffxu
  2025-03-05  3:29   ` Kees Cook
       [not found] ` <dfc9d260-ad0d-404d-8b90-fbcc5ce212df@lucifer.local>
  2025-03-05 23:58 ` Andrew Morton
  8 siblings, 1 reply; 12+ messages in thread
From: jeffxu @ 2025-03-05  2:17 UTC (permalink / raw)
  To: akpm, keescook, jannh, torvalds, vbabka, lorenzo.stoakes,
	Liam.Howlett, adhemerval.zanella, oleg, avagin, benjamin
  Cc: linux-kernel, linux-hardening, linux-mm, linux-kselftest,
	jorgelo, sroettger, hch, ojeda, thomas.weissschuh, adobriyan,
	johannes, pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Jeff Xu

From: Jeff Xu <jeffxu@chromium.org>

Add sysmap_is_sealed.c to test system mappings are sealed.

Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
config file.

Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 tools/testing/selftests/Makefile              |   1 +
 .../mseal_system_mappings/.gitignore          |   2 +
 .../selftests/mseal_system_mappings/Makefile  |   6 +
 .../selftests/mseal_system_mappings/config    |   1 +
 .../mseal_system_mappings/sysmap_is_sealed.c  | 119 ++++++++++++++++++
 5 files changed, 129 insertions(+)
 create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
 create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
 create mode 100644 tools/testing/selftests/mseal_system_mappings/config
 create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 8daac70c2f9d..be836be8f03f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -61,6 +61,7 @@ TARGETS += mount
 TARGETS += mount_setattr
 TARGETS += move_mount_set_group
 TARGETS += mqueue
+TARGETS += mseal_system_mappings
 TARGETS += nci
 TARGETS += net
 TARGETS += net/af_unix
diff --git a/tools/testing/selftests/mseal_system_mappings/.gitignore b/tools/testing/selftests/mseal_system_mappings/.gitignore
new file mode 100644
index 000000000000..319c497a595e
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+sysmap_is_sealed
diff --git a/tools/testing/selftests/mseal_system_mappings/Makefile b/tools/testing/selftests/mseal_system_mappings/Makefile
new file mode 100644
index 000000000000..2b4504e2f52f
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)
+
+TEST_GEN_PROGS := sysmap_is_sealed
+
+include ../lib.mk
diff --git a/tools/testing/selftests/mseal_system_mappings/config b/tools/testing/selftests/mseal_system_mappings/config
new file mode 100644
index 000000000000..675cb9f37b86
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/config
@@ -0,0 +1 @@
+CONFIG_MSEAL_SYSTEM_MAPPINGS=y
diff --git a/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c b/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
new file mode 100644
index 000000000000..0d2af30c3bf5
--- /dev/null
+++ b/tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test that system mappings are sealed when
+ * CONFIG_MSEAL_SYSTEM_MAPPINGS=y
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#define VMFLAGS "VmFlags:"
+#define MSEAL_FLAGS "sl"
+#define MAX_LINE_LEN 512
+
+bool has_mapping(char *name, FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (strstr(line, name))
+			return true;
+	}
+
+	return false;
+}
+
+bool mapping_is_sealed(char *name, FILE *maps)
+{
+	char line[MAX_LINE_LEN];
+
+	while (fgets(line, sizeof(line), maps)) {
+		if (!strncmp(line, VMFLAGS, strlen(VMFLAGS))) {
+			if (strstr(line, MSEAL_FLAGS))
+				return true;
+
+			return false;
+		}
+	}
+
+	return false;
+}
+
+FIXTURE(basic) {
+	FILE *maps;
+};
+
+FIXTURE_SETUP(basic)
+{
+	self->maps = fopen("/proc/self/smaps", "r");
+	if (!self->maps)
+		SKIP(return, "Could not open /proc/self/smaps, errno=%d",
+			errno);
+};
+
+FIXTURE_TEARDOWN(basic)
+{
+	if (self->maps)
+		fclose(self->maps);
+};
+
+FIXTURE_VARIANT(basic)
+{
+	char *name;
+	bool sealed;
+};
+
+FIXTURE_VARIANT_ADD(basic, vdso) {
+	.name = "[vdso]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vvar) {
+	.name = "[vvar]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vvar_vclock) {
+	.name = "[vvar_vclock]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, sigpage) {
+	.name = "[sigpage]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, vectors) {
+	.name = "[vectors]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, uprobes) {
+	.name = "[uprobes]",
+	.sealed = true,
+};
+
+FIXTURE_VARIANT_ADD(basic, stack) {
+	.name = "[stack]",
+	.sealed = false,
+};
+
+TEST_F(basic, check_sealed)
+{
+	if (!has_mapping(variant->name, self->maps)) {
+		SKIP(return, "could not find the mapping, %s",
+			variant->name);
+	}
+
+	EXPECT_EQ(variant->sealed,
+		mapping_is_sealed(variant->name, self->maps));
+};
+
+TEST_HARNESS_MAIN
-- 
2.48.1.711.g2feabab25a-goog

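The selftest's plain strstr() match on "sl" is sufficient because every VmFlags entry is a space-separated two-letter code. As a hypothetical variant (not part of the patch), the same two-step scan can be run over an in-memory smaps string with an explicit token-boundary check: first find the mapping's header line, then read the next `VmFlags:` line. Working on a string makes the parsing logic testable without a live /proc. Like the selftest, it assumes every entry carries a VmFlags line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the space-separated flag list contains exactly the token "sl". */
static bool vmflags_has_sl(const char *flags)
{
	const char *p = flags;

	while ((p = strstr(p, "sl")) != NULL) {
		bool start_ok = (p == flags) || p[-1] == ' ';
		bool end_ok = p[2] == ' ' || p[2] == '\n' || p[2] == '\0';

		if (start_ok && end_ok)
			return true;
		p += 2;
	}
	return false;
}

/*
 * Scan smaps-formatted text for the first entry whose header line contains
 * NAME, then inspect that entry's "VmFlags:" line.
 * Returns 1 if sealed, 0 if not sealed, -1 if the mapping was not found.
 */
static int smaps_entry_sealed(const char *smaps, const char *name)
{
	const char *line = smaps;
	bool in_entry = false;

	while (line && *line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);
		char buf[512];

		/* Copy one line into a NUL-terminated buffer, truncating if needed. */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if (!in_entry) {
			if (strstr(buf, name))
				in_entry = true;
		} else if (strncmp(buf, "VmFlags:", 8) == 0) {
			return vmflags_has_sl(buf + 8) ? 1 : 0;
		}

		line = nl ? nl + 1 : NULL;
	}
	return -1;
}
```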



* Re: [PATCH v9 7/7] selftest: test system mappings are sealed.
  2025-03-05  2:17 ` [PATCH v9 7/7] selftest: test system mappings are sealed jeffxu
@ 2025-03-05  3:29   ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2025-03-05  3:29 UTC (permalink / raw)
  To: jeffxu
  Cc: akpm, jannh, torvalds, vbabka, lorenzo.stoakes, Liam.Howlett,
	adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
	linux-hardening, linux-mm, linux-kselftest, jorgelo, sroettger,
	hch, ojeda, thomas.weissschuh, adobriyan, johannes,
	pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport

On Wed, Mar 05, 2025 at 02:17:11AM +0000, jeffxu@chromium.org wrote:
> From: Jeff Xu <jeffxu@chromium.org>
> 
> Add sysmap_is_sealed.c to test system mappings are sealed.
> 
> Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
> config file.
> 
> Signed-off-by: Jeff Xu <jeffxu@chromium.org>

Great! Thanks for the negative test addition. :)

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook



* Re: [PATCH v9 0/7] mseal system mappings
       [not found] ` <dfc9d260-ad0d-404d-8b90-fbcc5ce212df@lucifer.local>
@ 2025-03-05 15:29   ` Jeff Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Xu @ 2025-03-05 15:29 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
	adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
	linux-hardening, linux-mm, linux-kselftest, jorgelo, sroettger,
	hch, ojeda, thomas.weissschuh, adobriyan, johannes,
	pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport

On Tue, Mar 4, 2025 at 9:51 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Mar 05, 2025 at 02:17:04AM +0000, jeffxu@chromium.org wrote:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > This is V9 version, addressing comments from V8, without code logic
> > change.
> >
> > -------------------------------------------------------------------
> > As discussed during mseal() upstream process [1], mseal() protects
> > the VMAs of a given virtual memory range against modifications, such
> > as the read/write (RW) and no-execute (NX) bits. For complete
> > descriptions of memory sealing, please see mseal.rst [2].
> >
> > The mseal() is useful to mitigate memory corruption issues where a
> > corrupted pointer is passed to a memory management system. For
> > example, such an attacker primitive can break control-flow integrity
> > guarantees since read-only memory that is supposed to be trusted can
> > become writable or .text pages can get remapped.
> >
> > The system mappings are read-only; memory sealing can protect
> > them from ever being made writable, unmapped, or remapped with
> > different attributes.
> >
> > System mappings such as vdso, vvar, vvar_vclock,
> > vectors (arm compat-mode), sigpage (arm compat-mode),
> > are created by the kernel during program initialization, and could
> > be sealed after creation.
> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [3]. It could be sealed from creation.
> >
> > The vsyscall on x86-64 uses a special address (0xffffffffff600000),
> > which is outside the mm managed range. This means mprotect, munmap, and
> > mremap won't work on the vsyscall. Since sealing doesn't enhance
> > the vsyscall's security, it is skipped in this patch. If we ever seal
> > the vsyscall, it is probably only for decorative purpose, i.e. showing
> > the 'sl' flag in the /proc/pid/smaps. For this patch, it is ignored.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the system mappings during restore operations. UML(User Mode Linux)
> > and gVisor, rr are also known to change the vdso/vvar mappings.
> > Consequently, this feature cannot be universally enabled across all
> > systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
> >
> > To support mseal of system mappings, architectures must define
> > CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
> > mappings calls to pass mseal flag. Additionally, architectures must
> > confirm they do not unmap/remap system mappings during the process
> > lifetime. The existence of this flag for an architecture implies that
> > it does not require the remapping of these system mappings during
> > process lifetime, so sealing these mappings is safe from a kernel
> > perspective.
> >
> > This version covers the x86-64 and arm64 architectures as a minimum viable feature.
> >
> > While no specific CPU hardware features are required to enable this
> > feature on an architecture, memory sealing requires a 64-bit kernel. Other
> > architectures can choose whether or not to adopt this feature. Currently,
> > I'm not aware of any instances in the kernel code that actively
> > munmap/mremap a system mapping without a request from userspace. The PPC
> > does call munmap when _install_special_mapping fails for vdso; however,
> > it's uncertain if this will ever fail for PPC - this needs to be
> > investigated by PPC in the future [4]. The UML kernel can add this support
> > when KUnit tests require it [5].
> >
> > In this version, we've improved the handling of system mapping sealing from
> > previous versions. Instead of modifying the _install_special_mapping
> > function itself, which would affect all architectures, we now call
> > _install_special_mapping with a sealing flag only within the specific
> > architecture that requires it. This targeted approach offers two key
> > advantages: 1) It limits the code change's impact to the necessary
> > architectures, and 2) It aligns with the software architecture by keeping
> > the core memory management within the mm layer, while delegating the
> > decision of sealing system mappings to the individual architecture, which
> > is particularly relevant since 32-bit architectures never require sealing.
> >
> > Prior to this patch series, we explored sealing special mappings from
> > userspace using glibc's dynamic linker. This approach revealed several
> > issues:
> > - The PT_LOAD header may report an incorrect length for vdso, (smaller
> >   than its actual size). The dynamic linker, which relies on PT_LOAD
> >   information to determine mapping size, would then split and partially
> >   seal the vdso mapping. Since each architecture has its own vdso/vvar
> >   code, fixing this in the kernel would require going through each
> >   architecture. Our initial goal was to enable sealing read-only mappings,
> >   e.g. .text, across all architectures, sealing vdso from kernel since
> >   creation appears to be simpler than sealing vdso at glibc.
> > - The [vvar] mapping header only contains address information, not length
> >   information. Similar issues might exist for other special mappings.
> > - Mappings like uprobe are not covered by the dynamic linker,
> >   and there is no effective solution for them.
> >
> > This feature's security enhancements will benefit ChromeOS, Android,
> > and other high security systems.
> >
> > Testing:
> > This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
> > - Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
> >   i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
> > - Passing various automation tests (e.g. pre-checkin) on ChromeOS and
> >   Android to ensure the sealing doesn't affect the functionality of
> >   Chromebook and Android phone.
> >
> > I also tested the feature on Ubuntu on x86-64:
> > - With config disabled, vdso/vvar is not sealed,
> > - with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
> >   normal operations such as browsing the web, open/edit doc are OK.
> >
> > Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
> > Link: Documentation/userspace-api/mseal.rst [2]
> > Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
> > Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
> > Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
> >
> > -------------------------------------------
> > History:
> >
> > V9:
> >  - Add negative test in selftest (Kees Cook)
> >  - fix typos in text (Kees Cook)
>
> You have a bad habit of missing stuff off these logs. Usually I don't
> comment, as it's trivial, but while we're here :)
>
> Please try to keep an accurate log of changes requested so you can populate
> these properly.
>
> Obviously this is not going to block anything. But for future reference...
>
>   - Add selftest to main selftest Makefile (Lorenzo Stoakes)
>
> >
> > V8:
>
> Nit, but no lore link?
https://lore.kernel.org/all/20250303050921.3033083-1-jeffxu@google.com/

Thanks for noticing this.

>
> >   - Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
> >   - Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
> >   - Change patch header prefix to "mseal sysmap" (Lorenzo Stoakes)
> >   - Remove "vm_flags =" (Kees Cook, Liam R. Howlett,  Oleg Nesterov)
> >   - Drop uml architecture (Lorenzo Stoakes, Kees Cook)
> >   - Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
> >
> > V7:
> >   https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@google.com/
> >   - Remove cover letter from the first patch (Liam R. Howlett)
> >   - Change macro name to VM_SEALED_SYSMAP (Liam R. Howlett)
> >   - logging and fclose() in selftest (Liam R. Howlett)
> >
> > V6:
> >   https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@google.com/
> >   - mseal.rst: fix a typo (Randy Dunlap)
> >   - security/Kconfig: add rr into note (Liam R. Howlett)
> >   - remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
> >   - mseal.rst: add incompatible userland software (Lorenzo Stoakes)
> >   - remove RFC from title (Kees Cook)
> >
> > V5
> >   https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@google.com/
> >   - Remove kernel cmd line (Lorenzo Stoakes)
> >   - Add test info (Lorenzo Stoakes)
> >   - Add threat model info (Lorenzo Stoakes)
> >   - Fix x86 selftest: test_mremap_vdso
> >   - Restrict code change to ARM64/x86-64/UM arch only.
> >   - Add userprocess.h to include seal_system_mapping().
> >   - Remove sealing vsyscall.
> >   - Split the patch.
> >
> > V4:
> >   https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@google.com/
> >   - ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
> >   - test info (Lorenzo Stoakes)
> >   - Update  mseal.rst (Liam R. Howlett)
> >   - Update test_mremap_vdso.c (Liam R. Howlett)
> >   - Misc. style, comments, doc update (Liam R. Howlett)
> >
> > V3:
> >   https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@google.com/
> >   - Revert uprobe to v1 logic (Oleg Nesterov)
> >   - use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
> >   - Move kernel cmd line from fs/exec.c to mm/mseal.c and
> >     misc. (Liam R. Howlett)
> >
> > V2:
> >   https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@google.com/
> >   - Seal uprobe always (Oleg Nesterov)
> >   - Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
> >   - Rebase to linux_main
> >
> > V1:
> >  - https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
> >
> > --------------------------------------------------
> >
> >
> >
> > Jeff Xu (7):
> >   mseal sysmap: kernel config and header change
> >   selftests: x86: test_mremap_vdso: skip if vdso is msealed
> >   mseal sysmap: enable x86-64
> >   mseal sysmap: enable arm64
> >   mseal sysmap: uprobe mapping
> >   mseal sysmap: update mseal.rst
> >   selftest: test system mappings are sealed.
> >
> >  Documentation/userspace-api/mseal.rst         |  20 +++
> >  arch/arm64/Kconfig                            |   1 +
> >  arch/arm64/kernel/vdso.c                      |  12 +-
> >  arch/x86/Kconfig                              |   1 +
> >  arch/x86/entry/vdso/vma.c                     |   7 +-
> >  include/linux/mm.h                            |  10 ++
> >  init/Kconfig                                  |  22 ++++
> >  kernel/events/uprobes.c                       |   3 +-
> >  security/Kconfig                              |  21 ++++
> >  tools/testing/selftests/Makefile              |   1 +
> >  .../mseal_system_mappings/.gitignore          |   2 +
> >  .../selftests/mseal_system_mappings/Makefile  |   6 +
> >  .../selftests/mseal_system_mappings/config    |   1 +
> >  .../mseal_system_mappings/sysmap_is_sealed.c  | 119 ++++++++++++++++++
> >  .../testing/selftests/x86/test_mremap_vdso.c  |  43 +++++++
> >  15 files changed, 261 insertions(+), 8 deletions(-)
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/config
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
> >
> > --
> > 2.48.1.711.g2feabab25a-goog
> >
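The cover letter describes mseal() as protecting the VMAs of a range against later modification. A minimal sketch of that behaviour on an ordinary anonymous page, assuming a kernel with mseal() (merged in Linux 6.10) and calling it through syscall(2) because a libc wrapper may be missing. The fallback syscall number 462 is an assumption that holds for x86-64 and arm64; `demo_mseal()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: x86-64 / arm64 syscall number */
#endif

/*
 * Map one read-only anonymous page, seal it with mseal(), then try to
 * change its protection and unmap it.
 *
 * Returns 1 if both operations were rejected with EPERM (sealing enforced),
 * 0 if mseal() itself failed (e.g. ENOSYS on older kernels, or a seccomp
 * filter denying it), and -1 on any other outcome.
 */
static int demo_mseal(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	void *p;
	int ret;

	if (page <= 0)
		return -1;
	len = (size_t)page;

	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (syscall(__NR_mseal, p, len, 0UL) != 0) {
		munmap(p, len);
		return 0;
	}

	/* Sealed VMAs reject protection changes... */
	if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0) {
		munmap(p, len);
		return -1;
	}
	ret = (errno == EPERM) ? 1 : -1;

	/* ...and unmapping, so the page stays mapped until the process exits. */
	if (munmap(p, len) == 0 || errno != EPERM)
		ret = -1;

	return ret;
}
```

The series applies the same VM_SEALED flag to system mappings from the kernel side, at creation time, instead of relying on userspace to call mseal() on them afterwards.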

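The cover letter and the MSEAL_SYSTEM_MAPPINGS help text list which named mappings get sealed, and note that `[vsyscall]` is deliberately skipped because it lies outside the mm-managed range. A hypothetical helper that classifies a /proc/<pid>/maps line against that list could look like this. The name list is copied from the help text; the return codes are an arbitrary choice for illustration, and 1 only means "sealed when the config is enabled on a supported architecture".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: classify one /proc/<pid>/maps line.
 * Returns 1 for the mappings listed in the MSEAL_SYSTEM_MAPPINGS help
 * text, 0 for [vsyscall] (intentionally left unsealed by this series),
 * and -1 for anything else. Bracketed names include the closing ']',
 * so "[vvar]" does not match "[vvar_vclock]".
 */
static int sysmap_class(const char *line)
{
	static const char *const sealed[] = {
		"[vdso]", "[vvar]", "[vvar_vclock]",
		"[vectors]", "[sigpage]", "[uprobes]",
	};

	for (size_t i = 0; i < sizeof(sealed) / sizeof(sealed[0]); i++) {
		if (strstr(line, sealed[i]))
			return 1;
	}

	return strstr(line, "[vsyscall]") ? 0 : -1;
}
```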

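One reason the cover letter gives for not sealing from glibc is that the vDSO's PT_LOAD header can describe less memory than the kernel actually maps. A rough way to compare the two on a running system, assuming `<link.h>` provides the ElfW() macro (true for glibc) and that the vDSO appears as a single `[vdso]` line in /proc/self/maps. Both helper names are hypothetical, and the comparison is only an illustration, not a diagnostic used by the series.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Page-rounded span covered by the vDSO's PT_LOAD segments; 0 if unknown. */
static size_t vdso_pt_load_span(void)
{
	const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)getauxval(AT_SYSINFO_EHDR);
	long page = sysconf(_SC_PAGESIZE);
	ElfW(Addr) lo = (ElfW(Addr))-1, hi = 0;
	const ElfW(Phdr) *ph;

	if (!eh || page <= 0)
		return 0;

	/* Assumes the image is mapped from file offset 0, so e_phoff is a memory offset. */
	ph = (const ElfW(Phdr) *)((const char *)eh + eh->e_phoff);
	for (int i = 0; i < eh->e_phnum; i++) {
		if (ph[i].p_type != PT_LOAD)
			continue;
		if (ph[i].p_vaddr < lo)
			lo = ph[i].p_vaddr;
		if (ph[i].p_vaddr + ph[i].p_memsz > hi)
			hi = ph[i].p_vaddr + ph[i].p_memsz;
	}
	if (hi <= lo)	/* no PT_LOAD segment found */
		return 0;

	return (size_t)((hi - lo + (size_t)page - 1) / (size_t)page * (size_t)page);
}

/* Size of the [vdso] line in /proc/self/maps; 0 if not found. */
static size_t vdso_maps_size(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	unsigned long start, end;
	size_t size = 0;

	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		if (strstr(line, "[vdso]") &&
		    sscanf(line, "%lx-%lx", &start, &end) == 2) {
			size = end - start;
			break;
		}
	}
	fclose(f);
	return size;
}
```

If the page-rounded PT_LOAD span comes out smaller than the `[vdso]` size, a loader that seals only the PT_LOAD range would split the mapping and leave its tail unsealed, which matches the failure mode described in the cover letter. `[vvar]` has no ELF header at all, so this approach cannot size it.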

* Re: [PATCH v9 1/7] mseal sysmap: kernel config and header change
       [not found]     ` <544138c0-5668-4a6b-9160-59da95b990f6@lucifer.local>
@ 2025-03-05 15:32       ` Jeff Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Xu @ 2025-03-05 15:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, keescook, jannh, torvalds, vbabka, Liam.Howlett,
	adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
	linux-hardening, linux-mm, linux-kselftest, jorgelo, sroettger,
	hch, ojeda, thomas.weissschuh, adobriyan, johannes,
	pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport,
	Kees Cook

On Tue, Mar 4, 2025 at 9:57 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Mar 05, 2025 at 05:54:24AM +0000, Lorenzo Stoakes wrote:
> > On Wed, Mar 05, 2025 at 02:17:05AM +0000, jeffxu@chromium.org wrote:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > Provide infrastructure to mseal system mappings. Establish
> > > two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
> > > ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP
> > > macro for future patches.
> > >
> > > Signed-off-by: Jeff Xu <jeffxu@chromium.org>
> > > Reviewed-by: Kees Cook <kees@kernel.org>
> >
> > Umm... I reviewed this too? :) unless you made substantial changes here
> > (doesn't appear so), please do propagate tags for each revision :>)
> >
> > Anyway, FWIW:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
>
> (you also forgot to propagate Liam's tag here)
>
Sorry about that, I missed your and Liam's "Reviewed-by" tags from V8 [1] [2]
[1] https://lore.kernel.org/all/maamck3gjqjikefwlubtzg4ymaa6vh47hlxqqn4v23gqwl2tli@t372meccgycq/
[2] https://lore.kernel.org/all/0ea20f84-bd66-4180-aa04-0f66ce91bdf6@lucifer.local/

Thanks


> > > ---
> > >  include/linux/mm.h | 10 ++++++++++
> > >  init/Kconfig       | 22 ++++++++++++++++++++++
> > >  security/Kconfig   | 21 +++++++++++++++++++++
> > >  3 files changed, 53 insertions(+)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 7b1068ddcbb7..8b800941678d 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -4155,4 +4155,14 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
> > >  int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
> > >  int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
> > >
> > > +
> > > +/*
> > > + * mseal of userspace process's system mappings.
> > > + */
> > > +#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
> > > +#define VM_SEALED_SYSMAP   VM_SEALED
> > > +#else
> > > +#define VM_SEALED_SYSMAP   VM_NONE
> > > +#endif
> > > +
> > >  #endif /* _LINUX_MM_H */
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d0d021b3fa3b..7f67d8942a09 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1882,6 +1882,28 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
> > >  config ARCH_HAS_MEMBARRIER_SYNC_CORE
> > >     bool
> > >
> > > +config ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
> > > +   bool
> > > +   help
> > > +     Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
> > > +
> > > +     A 64-bit kernel is required for the memory sealing feature.
> > > +     No specific hardware features from the CPU are needed.
> > > +
> > > +     To enable this feature, the architecture needs to update its
> > > +     special mapping calls to include the sealing flag and confirm
> > > +     that it doesn't unmap/remap system mappings during the
> > > +     lifetime of the process. The existence of this flag for an architecture
> > > +     implies that it does not require the remapping of the system
> > > +     mappings during process lifetime, so sealing these mappings is safe
> > > +     from a kernel perspective.
> > > +
> > > +     After the architecture enables this, a distribution can set
> > > +     CONFIG_MSEAL_SYSTEM_MAPPINGS to manage access to the feature.
> > > +
> > > +     For complete descriptions of memory sealing, please see
> > > +     Documentation/userspace-api/mseal.rst
> > > +
> > >  config HAVE_PERF_EVENTS
> > >     bool
> > >     help
> > > diff --git a/security/Kconfig b/security/Kconfig
> > > index f10dbf15c294..a914a02df27e 100644
> > > --- a/security/Kconfig
> > > +++ b/security/Kconfig
> > > @@ -51,6 +51,27 @@ config PROC_MEM_NO_FORCE
> > >
> > >  endchoice
> > >
> > > +config MSEAL_SYSTEM_MAPPINGS
> > > +   bool "mseal system mappings"
> > > +   depends on 64BIT
> > > +   depends on ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
> > > +   depends on !CHECKPOINT_RESTORE
> > > +   help
> > > +     Apply mseal on system mappings.
> > > +     The system mappings include vdso, vvar, vvar_vclock,
> > > +     vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
> > > +
> > > +     A 64-bit kernel is required for the memory sealing feature.
> > > +     No specific hardware features from the CPU are needed.
> > > +
> > > +     WARNING: This feature breaks programs which rely on relocating
> > > +     or unmapping system mappings. Known broken software at the time
> > > +     of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
> > > +     this config can't be enabled universally.
> > > +
> > > +     For complete descriptions of memory sealing, please see
> > > +     Documentation/userspace-api/mseal.rst
> > > +
> > >  config SECURITY
> > >     bool "Enable different security models"
> > >     depends on SYSFS
> > > --
> > > 2.48.1.711.g2feabab25a-goog
> > >

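The VM_SEALED_SYSMAP macro lets architecture code OR the sealing flag into a special mapping's vm_flags unconditionally, with the config option deciding whether that adds VM_SEALED or nothing. A userspace model of that pattern, with made-up flag values (the real definitions live in include/linux/mm.h) and a compile-time check mirroring the `depends on !CHECKPOINT_RESTORE` line from security/Kconfig. `vdso_vm_flags()` is a hypothetical stand-in for an architecture's vdso setup, not the actual x86 or arm64 code.

```c
#include <stdint.h>

/* Model only: the kernel uses unsigned long and different bit positions. */
typedef uint64_t vm_flags_t;

#define VM_NONE   ((vm_flags_t)0)
#define VM_READ   ((vm_flags_t)1 << 0)
#define VM_EXEC   ((vm_flags_t)1 << 2)
#define VM_SEALED ((vm_flags_t)1 << 63)

/* Mirrors "depends on !CHECKPOINT_RESTORE" in security/Kconfig. */
#if defined(CONFIG_MSEAL_SYSTEM_MAPPINGS) && defined(CONFIG_CHECKPOINT_RESTORE)
#error "MSEAL_SYSTEM_MAPPINGS cannot be combined with CHECKPOINT_RESTORE"
#endif

#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SEALED_SYSMAP VM_SEALED
#else
#define VM_SEALED_SYSMAP VM_NONE
#endif

/* Flags a hypothetical arch would pass when installing its vdso mapping. */
static vm_flags_t vdso_vm_flags(void)
{
	return VM_READ | VM_EXEC | VM_SEALED_SYSMAP;
}
```

Building with -DCONFIG_MSEAL_SYSTEM_MAPPINGS sets the seal bit; without it the call site compiles unchanged and the flag is a no-op, so architectures need no #ifdef around their _install_special_mapping() calls.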


* Re: [PATCH v9 0/7] mseal system mappings
  2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
                   ` (7 preceding siblings ...)
       [not found] ` <dfc9d260-ad0d-404d-8b90-fbcc5ce212df@lucifer.local>
@ 2025-03-05 23:58 ` Andrew Morton
  8 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2025-03-05 23:58 UTC (permalink / raw)
  To: jeffxu
  Cc: keescook, jannh, torvalds, vbabka, lorenzo.stoakes, Liam.Howlett,
	adhemerval.zanella, oleg, avagin, benjamin, linux-kernel,
	linux-hardening, linux-mm, linux-kselftest, jorgelo, sroettger,
	hch, ojeda, thomas.weissschuh, adobriyan, johannes,
	pedro.falcato, hca, willy, anna-maria, mark.rutland,
	linus.walleij, Jason, deller, rdunlap, davem, peterx, f.fainelli,
	gerg, dave.hansen, mingo, ardb, mhocko, 42.hyeyoo, peterz, ardb,
	enh, rientjes, groeck, mpe, aleksandr.mikhalitsyn, mike.rapoport

On Wed,  5 Mar 2025 02:17:04 +0000 jeffxu@chromium.org wrote:

> This is V9 version, addressing comments from V8, without code logic
> change.

I added this series to mm.git, thanks all.



Thread overview: 12+ messages
2025-03-05  2:17 [PATCH v9 0/7] mseal system mappings jeffxu
2025-03-05  2:17 ` [PATCH v9 1/7] mseal sysmap: kernel config and header change jeffxu
     [not found]   ` <2a42ac63-d7a2-48ae-ae86-568d0fc59d51@lucifer.local>
     [not found]     ` <544138c0-5668-4a6b-9160-59da95b990f6@lucifer.local>
2025-03-05 15:32       ` Jeff Xu
2025-03-05  2:17 ` [PATCH v9 2/7] selftests: x86: test_mremap_vdso: skip if vdso is msealed jeffxu
2025-03-05  2:17 ` [PATCH v9 3/7] mseal sysmap: enable x86-64 jeffxu
2025-03-05  2:17 ` [PATCH v9 4/7] mseal sysmap: enable arm64 jeffxu
2025-03-05  2:17 ` [PATCH v9 5/7] mseal sysmap: uprobe mapping jeffxu
2025-03-05  2:17 ` [PATCH v9 6/7] mseal sysmap: update mseal.rst jeffxu
2025-03-05  2:17 ` [PATCH v9 7/7] selftest: test system mappings are sealed jeffxu
2025-03-05  3:29   ` Kees Cook
     [not found] ` <dfc9d260-ad0d-404d-8b90-fbcc5ce212df@lucifer.local>
2025-03-05 15:29   ` [PATCH v9 0/7] mseal system mappings Jeff Xu
2025-03-05 23:58 ` Andrew Morton
