linux-mm.kvack.org archive mirror
* [PATCH RFC 0/3] add support for mm-local memory allocations
@ 2024-06-21 20:14 Roman Kagan
  2024-06-21 20:14 ` [PATCH RFC 1/3] mseal: expose interface to seal / unseal user memory ranges Roman Kagan
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Roman Kagan @ 2024-06-21 20:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, David Woodhouse, Andrew Morton,
	Arnd Bergmann

In a series posted a few years ago [1], a proposal was put forward to allow the
kernel to allocate memory local to a mm and thus push it out of reach for
current and future speculation-based cross-process attacks.  We still believe
this is a nice thing to have.

However, in the time passed since that post Linux mm has grown quite a few new
goodies, so we'd like to explore possibilities to implement this functionality
with less effort and churn leveraging the now available facilities.

Specifically, this is a proof-of-concept attempt to implement mm-local
allocations piggy-backing on memfd_secret(), using regular user addresses but
pinning the pages and flipping the user/supervisor flag on the respective PTEs
to make them directly accessible from the kernel, and sealing the VMA to
prevent userland from taking over the address range.  This approach allowed us
to delegate all the heavy lifting -- address management, interactions with the
direct map, cleanup on mm teardown -- to the existing infrastructure, and
required zero architecture-specific code.
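
To give an idea of what this looks like from a consumer's point of view, here
is a minimal sketch against the interface added in patch 2 (the my_secret_*
wrappers are made up for illustration and are assumed to run in the context of
the process the secret belongs to):

#include <linux/errno.h>
#include <linux/minmax.h>
#include <linux/mm.h>
#include <linux/secretmem.h>
#include <linux/string.h>

struct my_secret {
	struct secretmem_area *area;	/* handle returned by the allocator */
};

static int my_secret_store(struct my_secret *s, const void *key, size_t len)
{
	/* order 0: a single mm-local page, removed from the kernel direct map */
	s->area = secretmem_allocate_pages(0);
	if (!s->area)
		return -ENOMEM;

	/* the kernel can dereference ->ptr; no other mm has it mapped */
	memcpy(s->area->ptr, key, min_t(size_t, len, PAGE_SIZE));
	return 0;
}

static void my_secret_destroy(struct my_secret *s)
{
	/* unseals and unmaps the area; secretmem cleanup returns the pages */
	secretmem_release_pages(s->area);
	s->area = NULL;
}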

Compared to the approach used in the original series, where a dedicated kernel
address range and thus a dedicated PGD was used for mm-local allocations, the
one proposed here may have certain drawbacks, in particular

- using user addresses for kernel memory may violate assumptions in various
  parts of kernel code which we may not have identified with the smoke tests
  we did

- the allocated addresses are guessable by userland (at the moment they are
  even visible in /proc/PID/maps, but that's fixable), which may weaken the
  security posture

Also included is a simple test driver and selftest to smoke test and showcase
the feature.

The code is a proof-of-concept RFC and lacks a lot of checks and special-case
handling, but it demonstrates the idea.  We'd appreciate any feedback on
whether this is a viable approach, or whether it should be abandoned in favor
of the one with a dedicated PGD / kernel address range, or something else
entirely.

[1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/

Fares Mehanna (2):
  mseal: expose interface to seal / unseal user memory ranges
  mm/secretmem: implement mm-local kernel allocations

Roman Kagan (1):
  drivers/misc: add test driver and selftest for proclocal allocator

 drivers/misc/Makefile                         |   1 +
 tools/testing/selftests/proclocal/Makefile    |   6 +
 include/linux/secretmem.h                     |   8 +
 mm/internal.h                                 |   7 +
 drivers/misc/proclocal-test.c                 | 200 +++++++++++++++++
 mm/gup.c                                      |   4 +-
 mm/mseal.c                                    |  81 ++++---
 mm/secretmem.c                                | 208 ++++++++++++++++++
 .../selftests/proclocal/proclocal-test.c      | 150 +++++++++++++
 drivers/misc/Kconfig                          |  15 ++
 tools/testing/selftests/proclocal/.gitignore  |   1 +
 11 files changed, 649 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/proclocal/Makefile
 create mode 100644 drivers/misc/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/.gitignore

-- 
2.34.1





* [PATCH RFC 1/3] mseal: expose interface to seal / unseal user memory ranges
  2024-06-21 20:14 [PATCH RFC 0/3] add support for mm-local memory allocations Roman Kagan
@ 2024-06-21 20:14 ` Roman Kagan
  2024-06-21 20:15 ` [PATCH RFC 2/3] mm/secretmem: implement mm-local kernel allocations Roman Kagan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Roman Kagan @ 2024-06-21 20:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, David Woodhouse, Andrew Morton,
	Arnd Bergmann

From: Fares Mehanna <faresx@amazon.de>

To make sure the kernel mm-local mapping is untouched by the user, we will seal
the VMA before changing the protection to be used by the kernel.

This will guarantee that userspace can't unmap or alter this VMA while it is
being used by the kernel.

After the kernel is done with the secret memory, it will unseal the VMA to be
able to unmap and free it.

The unseal operation is not exposed to userspace.
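
For illustration, a kernel-side user of the new helper (which has to live under
mm/, since do_mseal() is declared in mm/internal.h) is expected to call it
roughly as below; the wrapper is just an example and not part of this patch:

#include <linux/mm.h>
#include "internal.h"		/* do_mseal() */

/* seal == true protects the range, seal == false makes it modifiable again */
static int example_set_sealed(struct mm_struct *mm, unsigned long start,
			      unsigned long end, bool seal)
{
	int ret;

	/* do_mseal() requires the mmap write lock to be held */
	if (mmap_write_lock_killable(mm))
		return -EINTR;
	ret = do_mseal(start, end, seal);
	mmap_write_unlock(mm);
	return ret;
}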

Signed-off-by: Fares Mehanna <faresx@amazon.de>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
 mm/internal.h |  7 +++++
 mm/mseal.c    | 81 ++++++++++++++++++++++++++++++++-------------------
 2 files changed, 58 insertions(+), 30 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index b2c75b12014e..5278989610f5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1453,6 +1453,8 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start,
 		unsigned long end);
 bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 		unsigned long end, int behavior);
+/* mm's mmap write lock must be taken before seal/unseal operation */
+int do_mseal(unsigned long start, unsigned long end, bool seal);
 #else
 static inline int can_do_mseal(unsigned long flags)
 {
@@ -1470,6 +1472,11 @@ static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 {
 	return true;
 }
+
+static inline int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	return -EINVAL;
+}
 #endif
 
 #ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/mseal.c b/mm/mseal.c
index bf783bba8ed0..331745ac7064 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -26,6 +26,11 @@ static inline void set_vma_sealed(struct vm_area_struct *vma)
 	vm_flags_set(vma, VM_SEALED);
 }
 
+static inline void clear_vma_sealed(struct vm_area_struct *vma)
+{
+	vm_flags_clear(vma, VM_SEALED);
+}
+
 /*
  * check if a vma is sealed for modification.
  * return true, if modification is allowed.
@@ -109,7 +114,7 @@ bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long
 
 static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		struct vm_area_struct **prev, unsigned long start,
-		unsigned long end, vm_flags_t newflags)
+		unsigned long end, vm_flags_t newflags, bool seal)
 {
 	int ret = 0;
 	vm_flags_t oldflags = vma->vm_flags;
@@ -123,7 +128,10 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto out;
 	}
 
-	set_vma_sealed(vma);
+	if (seal)
+		set_vma_sealed(vma);
+	else
+		clear_vma_sealed(vma);
 out:
 	*prev = vma;
 	return ret;
@@ -159,9 +167,9 @@ static int check_mm_seal(unsigned long start, unsigned long end)
 }
 
 /*
- * Apply sealing.
+ * Apply sealing / unsealing.
  */
-static int apply_mm_seal(unsigned long start, unsigned long end)
+static int apply_mm_seal(unsigned long start, unsigned long end, bool seal)
 {
 	unsigned long nstart;
 	struct vm_area_struct *vma, *prev;
@@ -183,11 +191,14 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 		unsigned long tmp;
 		vm_flags_t newflags;
 
-		newflags = vma->vm_flags | VM_SEALED;
+		if (seal)
+			newflags = vma->vm_flags | VM_SEALED;
+		else
+			newflags = vma->vm_flags & ~(VM_SEALED);
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
+		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags, seal);
 		if (error)
 			return error;
 		nstart = vma_iter_end(&vmi);
@@ -196,6 +207,37 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 	return 0;
 }
 
+int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	int ret;
+
+	if (end < start)
+		return -EINVAL;
+
+	if (end == start)
+		return 0;
+
+	/*
+	 * First pass, this helps to avoid
+	 * partial sealing in case of error in input address range,
+	 * e.g. ENOMEM error.
+	 */
+	ret = check_mm_seal(start, end);
+	if (ret)
+		goto out;
+
+	/*
+	 * Second pass, this should success, unless there are errors
+	 * from vma_modify_flags, e.g. merge/split error, or process
+	 * reaching the max supported VMAs, however, those cases shall
+	 * be rare.
+	 */
+	ret = apply_mm_seal(start, end, seal);
+
+out:
+	return ret;
+}
+
 /*
  * mseal(2) seals the VM's meta data from
  * selected syscalls.
@@ -248,7 +290,7 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
  *
  *  unseal() is not supported.
  */
-static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
+static int __do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 {
 	size_t len;
 	int ret = 0;
@@ -269,33 +311,12 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 		return -EINVAL;
 
 	end = start + len;
-	if (end < start)
-		return -EINVAL;
-
-	if (end == start)
-		return 0;
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
-	/*
-	 * First pass, this helps to avoid
-	 * partial sealing in case of error in input address range,
-	 * e.g. ENOMEM error.
-	 */
-	ret = check_mm_seal(start, end);
-	if (ret)
-		goto out;
+	ret = do_mseal(start, end, true);
 
-	/*
-	 * Second pass, this should success, unless there are errors
-	 * from vma_modify_flags, e.g. merge/split error, or process
-	 * reaching the max supported VMAs, however, those cases shall
-	 * be rare.
-	 */
-	ret = apply_mm_seal(start, end);
-
-out:
 	mmap_write_unlock(current->mm);
 	return ret;
 }
@@ -303,5 +324,5 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 SYSCALL_DEFINE3(mseal, unsigned long, start, size_t, len, unsigned long,
 		flags)
 {
-	return do_mseal(start, len, flags);
+	return __do_mseal(start, len, flags);
 }
-- 
2.34.1





* [PATCH RFC 2/3] mm/secretmem: implement mm-local kernel allocations
  2024-06-21 20:14 [PATCH RFC 0/3] add support for mm-local memory allocations Roman Kagan
  2024-06-21 20:14 ` [PATCH RFC 1/3] mseal: expose interface to seal / unseal user memory ranges Roman Kagan
@ 2024-06-21 20:15 ` Roman Kagan
  2024-06-21 20:15 ` [PATCH RFC 3/3] drivers/misc: add test driver and selftest for proclocal allocator Roman Kagan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Roman Kagan @ 2024-06-21 20:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, David Woodhouse, Andrew Morton,
	Arnd Bergmann

From: Fares Mehanna <faresx@amazon.de>

In order to be resilient against cross-process speculation-based
attacks, it makes sense to store certain (secret) items in kernel memory
that is local to an mm.

Implement such allocations on top of secretmem infrastructure.

Specifically, on allocation:

1. Create a secretmem file.
2. To distinguish it from a conventional memfd_secret()-created file
   and to maintain the associated mm-local allocation context, store
   that context in the file's ->private_data.
3. Create a virtual mapping in the user virtual address space using
   mmap().
4. Seal the virtual mapping to disallow the user from affecting it in
   any way.
5. Fault the pages in, which invokes the secretmem fault handler to
   remove the pages from the kernel linear mapping and make them local
   to the process mm.
6. Change the PTEs from user mode to kernel mode, so that any access
   from userspace results in a segmentation fault while the kernel can
   now access the virtual address directly.
7. Return the secure area as a struct containing a pointer to the
   actual memory, which also provides the context for the release
   function later.

On release:

- if called while the mm is still in use, remove the mapping
- otherwise, when performed at mm teardown, no unmapping is necessary

The rest is taken care of by secretmem file cleanup, including returning
the pages to the kernel direct map.

Signed-off-by: Fares Mehanna <faresx@amazon.de>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
 include/linux/secretmem.h |   8 ++
 mm/gup.c                  |   4 +-
 mm/secretmem.c            | 208 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 218 insertions(+), 2 deletions(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e918f96881f5..c6d65b5a89bd 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -4,6 +4,10 @@
 
 #ifdef CONFIG_SECRETMEM
 
+struct secretmem_area {
+	void *ptr;
+};
+
 extern const struct address_space_operations secretmem_aops;
 
 static inline bool secretmem_mapping(struct address_space *mapping)
@@ -12,8 +16,12 @@ static inline bool secretmem_mapping(struct address_space *mapping)
 }
 
 bool vma_is_secretmem(struct vm_area_struct *vma);
+bool can_access_secretmem_vma(struct vm_area_struct *vma);
 bool secretmem_active(void);
 
+struct secretmem_area *secretmem_allocate_pages(unsigned int order);
+void secretmem_release_pages(struct secretmem_area *data);
+
 #else
 
 static inline bool vma_is_secretmem(struct vm_area_struct *vma)
diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b..ccf28b7baab9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1172,7 +1172,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	struct follow_page_context ctx = { NULL };
 	struct page *page;
 
-	if (vma_is_secretmem(vma))
+	if (!can_access_secretmem_vma(vma))
 		return NULL;
 
 	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
@@ -1377,7 +1377,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma))
+	if (!can_access_secretmem_vma(vma))
 		return -EFAULT;
 
 	if (write) {
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 3afb5ad701e1..de6860a0cb1b 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -13,13 +13,17 @@
 #include <linux/bitops.h>
 #include <linux/printk.h>
 #include <linux/pagemap.h>
+#include <linux/hugetlb.h>
 #include <linux/syscalls.h>
 #include <linux/pseudo_fs.h>
 #include <linux/secretmem.h>
 #include <linux/set_memory.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/mm.h>
 
+#include <uapi/asm-generic/mman-common.h>
 #include <uapi/linux/magic.h>
+#include <uapi/linux/mman.h>
 
 #include <asm/tlbflush.h>
 
@@ -42,6 +46,15 @@ MODULE_PARM_DESC(secretmem_enable,
 
 static atomic_t secretmem_users;
 
+/* secretmem file private context */
+struct secretmem_ctx {
+	struct secretmem_area _area;
+	struct page **_pages;
+	unsigned long _nr_pages;
+	struct file *_file;
+	struct mm_struct *_mm;
+};
+
 bool secretmem_active(void)
 {
 	return !!atomic_read(&secretmem_users);
@@ -116,6 +129,7 @@ static const struct vm_operations_struct secretmem_vm_ops = {
 
 static int secretmem_release(struct inode *inode, struct file *file)
 {
+	kfree(file->private_data);
 	atomic_dec(&secretmem_users);
 	return 0;
 }
@@ -123,13 +137,23 @@ static int secretmem_release(struct inode *inode, struct file *file)
 static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	unsigned long len = vma->vm_end - vma->vm_start;
+	struct secretmem_ctx *ctx = file->private_data;
+	unsigned long kernel_no_permissions;
+
+	kernel_no_permissions = (VM_READ | VM_WRITE | VM_EXEC | VM_MAYEXEC);
 
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
+	if (ctx && (vma->vm_flags & kernel_no_permissions))
+		return -EINVAL;
+
 	if (!mlock_future_ok(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
 		return -EAGAIN;
 
+	if (ctx)
+		vm_flags_set(vma, VM_MIXEDMAP);
+
 	vm_flags_set(vma, VM_LOCKED | VM_DONTDUMP);
 	vma->vm_ops = &secretmem_vm_ops;
 
@@ -141,6 +165,24 @@ bool vma_is_secretmem(struct vm_area_struct *vma)
 	return vma->vm_ops == &secretmem_vm_ops;
 }
 
+bool can_access_secretmem_vma(struct vm_area_struct *vma)
+{
+	struct secretmem_ctx *ctx;
+
+	if (!vma_is_secretmem(vma))
+		return true;
+
+	/*
+	 * If VMA is owned by running process, and marked for kernel
+	 * usage, then allow access.
+	 */
+	ctx = vma->vm_file->private_data;
+	if (ctx && current->mm == vma->vm_mm)
+		return true;
+
+	return false;
+}
+
 static const struct file_operations secretmem_fops = {
 	.release	= secretmem_release,
 	.mmap		= secretmem_mmap,
@@ -230,6 +272,172 @@ static struct file *secretmem_file_create(unsigned long flags)
 	return file;
 }
 
+struct secretmem_area *secretmem_allocate_pages(unsigned int order)
+{
+	unsigned long uvaddr, uvaddr_inc, unused, nr_pages, bytes_length;
+	struct file *kernel_secfile;
+	struct vm_area_struct *vma;
+	struct secretmem_ctx *ctx;
+	struct page **sec_pages;
+	struct mm_struct *mm;
+	long nr_pinned_pages;
+	pte_t pte, old_pte;
+	spinlock_t *ptl;
+	pte_t *upte;
+	int rc;
+
+	nr_pages = (1 << order);
+	bytes_length = nr_pages * PAGE_SIZE;
+	mm = current->mm;
+
+	if (!mm || !mmget_not_zero(mm))
+		return NULL;
+
+	/* Create secret memory file / truncate it */
+	kernel_secfile = secretmem_file_create(0);
+	if (IS_ERR(kernel_secfile))
+		goto put_mm;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (IS_ERR(ctx))
+		goto close_secfile;
+	kernel_secfile->private_data = ctx;
+
+	rc = do_truncate(file_mnt_idmap(kernel_secfile),
+			 file_dentry(kernel_secfile), bytes_length, 0, NULL);
+	if (rc)
+		goto close_secfile;
+
+	if (mmap_write_lock_killable(mm))
+		goto close_secfile;
+
+	/* Map pages to the secretmem file */
+	uvaddr = do_mmap(kernel_secfile, 0, bytes_length, PROT_NONE,
+			 MAP_SHARED, 0, 0, &unused, NULL);
+	if (IS_ERR_VALUE(uvaddr))
+		goto unlock_mmap;
+
+	/* mseal() the VMA to make sure it won't change */
+	rc = do_mseal(uvaddr, uvaddr + bytes_length, true);
+	if (rc)
+		goto unmap_pages;
+
+	/* Make sure VMA is there, and is kernel-secure */
+	vma = find_vma(current->mm, uvaddr);
+	if (!vma)
+		goto unseal_vma;
+
+	if (!vma_is_secretmem(vma) ||
+	    !can_access_secretmem_vma(vma))
+		goto unseal_vma;
+
+	/* Pin user pages; fault them in */
+	sec_pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
+	if (!sec_pages)
+		goto unseal_vma;
+
+	nr_pinned_pages = pin_user_pages(uvaddr, nr_pages, FOLL_FORCE | FOLL_LONGTERM, sec_pages);
+	if (nr_pinned_pages < 0)
+		goto free_sec_pages;
+	if (nr_pinned_pages != nr_pages)
+		goto unpin_pages;
+
+	/* Modify the existing mapping to be kernel accessible, local to this process mm */
+	uvaddr_inc = uvaddr;
+	while (uvaddr_inc < uvaddr + bytes_length) {
+		upte = get_locked_pte(mm, uvaddr_inc, &ptl);
+		if (!upte)
+			goto unpin_pages;
+		old_pte = ptep_modify_prot_start(vma, uvaddr_inc, upte);
+		pte = pte_modify(old_pte, PAGE_KERNEL);
+		ptep_modify_prot_commit(vma, uvaddr_inc, upte, old_pte, pte);
+		pte_unmap_unlock(upte, ptl);
+		uvaddr_inc += PAGE_SIZE;
+	}
+	flush_tlb_range(vma, uvaddr, uvaddr + bytes_length);
+
+	/* Return data */
+	mmgrab(mm);
+	ctx->_area.ptr = (void *) uvaddr;
+	ctx->_pages = sec_pages;
+	ctx->_nr_pages = nr_pages;
+	ctx->_mm = mm;
+	ctx->_file = kernel_secfile;
+
+	mmap_write_unlock(mm);
+	mmput(mm);
+
+	return &(ctx->_area);
+
+unpin_pages:
+	unpin_user_pages(sec_pages, nr_pinned_pages);
+free_sec_pages:
+	kfree(sec_pages);
+unseal_vma:
+	rc = do_mseal(uvaddr, uvaddr + bytes_length, false);
+	if (rc)
+		BUG();
+unmap_pages:
+	rc = do_munmap(mm, uvaddr, bytes_length, NULL);
+	if (rc)
+		BUG();
+unlock_mmap:
+	mmap_write_unlock(mm);
+close_secfile:
+	fput(kernel_secfile);
+put_mm:
+	mmput(mm);
+	return NULL;
+}
+
+void secretmem_release_pages(struct secretmem_area *data)
+{
+	unsigned long uvaddr, bytes_length;
+	struct secretmem_ctx *ctx;
+	int rc;
+
+	if (!data || !data->ptr)
+		BUG();
+
+	ctx = container_of(data, struct secretmem_ctx, _area);
+	if (!ctx || !ctx->_file || !ctx->_pages || !ctx->_mm)
+		BUG();
+
+	bytes_length = ctx->_nr_pages * PAGE_SIZE;
+	uvaddr = (unsigned long) data->ptr;
+
+	/*
+	 * Remove the mapping if mm is still in use.
+	 * Not secure to continue if unmapping failed.
+	 */
+	if (mmget_not_zero(ctx->_mm)) {
+		mmap_write_lock(ctx->_mm);
+		rc = do_mseal(uvaddr, uvaddr + bytes_length, false);
+		if (rc) {
+			mmap_write_unlock(ctx->_mm);
+			BUG();
+		}
+		rc = do_munmap(ctx->_mm, uvaddr, bytes_length, NULL);
+		if (rc) {
+			mmap_write_unlock(ctx->_mm);
+			BUG();
+		}
+		mmap_write_unlock(ctx->_mm);
+		mmput(ctx->_mm);
+	}
+
+	mmdrop(ctx->_mm);
+	unpin_user_pages(ctx->_pages, ctx->_nr_pages);
+	fput(ctx->_file);
+	kfree(ctx->_pages);
+
+	ctx->_nr_pages = 0;
+	ctx->_pages = NULL;
+	ctx->_file = NULL;
+	ctx->_mm = NULL;
+	ctx->_area.ptr = NULL;
+}
+
 SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
 {
 	struct file *file;
-- 
2.34.1





* [PATCH RFC 3/3] drivers/misc: add test driver and selftest for proclocal allocator
  2024-06-21 20:14 [PATCH RFC 0/3] add support for mm-local memory allocations Roman Kagan
  2024-06-21 20:14 ` [PATCH RFC 1/3] mseal: expose interface to seal / unseal user memory ranges Roman Kagan
  2024-06-21 20:15 ` [PATCH RFC 2/3] mm/secretmem: implement mm-local kernel allocations Roman Kagan
@ 2024-06-21 20:15 ` Roman Kagan
  2024-07-03 14:36   ` Greg Kroah-Hartman
  2024-07-04 11:11 ` [PATCH RFC 0/3] add support for mm-local memory allocations David Woodhouse
  2024-08-28  9:50 ` Alexander Graf
  4 siblings, 1 reply; 7+ messages in thread
From: Roman Kagan @ 2024-06-21 20:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, David Woodhouse, Andrew Morton,
	Arnd Bergmann

Introduce a simple driver for functional and stress testing of the
proclocal kernel allocator.  The driver exposes a device node
/dev/proclocal-test, which allows userland programs to request the
creation of proclocal areas and to obtain their addresses as seen by
the kernel, and in addition to read and write kernel memory at
arbitrary addresses (a simplified /dev/kmem, good enough for the
selftest to access proclocal allocations on its own responsibility).

The driver is not meant for use with production kernels, as it exposes
internal kernel pointers and data.

Also add a basic selftest that uses this driver.
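
The intended usage from userspace is essentially what the selftest does; a
condensed sketch (error handling mostly omitted):

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct proclocal_test_alloc {
	uint64_t size;		/* in: requested size in bytes */
	uint64_t ptr;		/* out: kernel-visible address of the area */
};
#define PROCLOCAL_TEST_ALLOC _IOWR('A', 0x10, struct proclocal_test_alloc)

int demo(void)
{
	struct proclocal_test_alloc pta = { .size = 256 };
	char msg[] = "this is test";
	int fd = open("/dev/proclocal-test", O_RDWR);

	if (fd < 0 || ioctl(fd, PROCLOCAL_TEST_ALLOC, &pta) < 0)
		return -1;

	/* read()/write() on the fd access kernel memory at the file offset */
	lseek(fd, (off_t)pta.ptr, SEEK_SET);
	write(fd, msg, sizeof(msg));

	/* dereferencing (void *)pta.ptr directly from userspace would fault */
	close(fd);		/* releases the proclocal area */
	return 0;
}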

Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
 drivers/misc/Makefile                         |   1 +
 tools/testing/selftests/proclocal/Makefile    |   6 +
 drivers/misc/proclocal-test.c                 | 200 ++++++++++++++++++
 .../selftests/proclocal/proclocal-test.c      | 150 +++++++++++++
 drivers/misc/Kconfig                          |  15 ++
 tools/testing/selftests/proclocal/.gitignore  |   1 +
 6 files changed, 373 insertions(+)
 create mode 100644 tools/testing/selftests/proclocal/Makefile
 create mode 100644 drivers/misc/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/proclocal-test.c
 create mode 100644 tools/testing/selftests/proclocal/.gitignore

diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 153a3f4837e8..33c244cee92d 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -69,3 +69,4 @@ obj-$(CONFIG_TMR_INJECT)	+= xilinx_tmr_inject.o
 obj-$(CONFIG_TPS6594_ESM)	+= tps6594-esm.o
 obj-$(CONFIG_TPS6594_PFSM)	+= tps6594-pfsm.o
 obj-$(CONFIG_NSM)		+= nsm.o
+obj-$(CONFIG_PROCLOCAL_TEST)	+= proclocal-test.o
diff --git a/tools/testing/selftests/proclocal/Makefile b/tools/testing/selftests/proclocal/Makefile
new file mode 100644
index 000000000000..b93baecee762
--- /dev/null
+++ b/tools/testing/selftests/proclocal/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_GEN_PROGS := proclocal-test
+CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES)
+
+include ../lib.mk
diff --git a/drivers/misc/proclocal-test.c b/drivers/misc/proclocal-test.c
new file mode 100644
index 000000000000..9b3d0ed9b2f9
--- /dev/null
+++ b/drivers/misc/proclocal-test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Author: Roman Kagan <rkagan@amazon.de>
+ *
+ * test driver for proclocal memory allocator
+ */
+
+#include <linux/compat.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/workqueue.h>
+#include <linux/file.h>
+#include <linux/secretmem.h>
+
+struct proclocal_test_alloc {
+	u64 size;
+	u64 ptr;
+};
+
+#define PROCLOCAL_TEST_ALLOC _IOWR('A', 0x10, struct proclocal_test_alloc)
+
+#define BOUNCE_BUF_SIZE PAGE_SIZE
+
+struct proclocal_test {
+	struct secretmem_area *area;
+	size_t size;
+	void *bounce;
+};
+
+static int proclocal_test_open(struct inode *inode, struct file *f)
+{
+	struct proclocal_test *plt;
+
+	plt = kzalloc(sizeof(*plt), GFP_KERNEL);
+	if (!plt)
+		return -ENOMEM;
+
+	plt->bounce = kmalloc(BOUNCE_BUF_SIZE, GFP_KERNEL);
+	if (!plt->bounce) {
+		kfree(plt);
+		return -ENOMEM;
+	}
+
+	f->f_mode |= FMODE_UNSIGNED_OFFSET;
+	f->private_data = plt;
+	return 0;
+}
+
+static int proclocal_test_release(struct inode *inode, struct file *f)
+{
+	struct proclocal_test *plt = f->private_data;
+	if (plt->area)
+		secretmem_release_pages(plt->area);
+	kfree(plt->bounce);
+	kfree(plt);
+	return 0;
+}
+
+static ssize_t proclocal_test_read(struct file *f, char __user *buf,
+				   size_t count, loff_t *ppos)
+{
+	struct proclocal_test *plt = f->private_data;
+	const void *p = (const void *)*ppos;
+	ssize_t ret = -EFAULT;
+
+	if (p + count < p)
+		return -EINVAL;
+
+	while (count) {
+		size_t chunk = min_t(size_t, count, BOUNCE_BUF_SIZE);
+		size_t left;
+
+		/*
+		 * copy_to_user() disables superuser checks, so need to copy to
+		 * bounce buffer first to test the access
+		 */
+		memcpy(plt->bounce, p, chunk);
+
+		left = copy_to_user(buf, plt->bounce, chunk);
+		if (left == chunk)
+			goto out;
+		chunk -= left;
+
+		buf += chunk;
+		p += chunk;
+		count -= chunk;
+	}
+
+	ret = p - (const void *)*ppos;
+	*ppos = (loff_t)p;
+out:
+	return ret;
+}
+
+static ssize_t proclocal_test_write(struct file *f, const char __user *buf,
+				    size_t count, loff_t *ppos)
+{
+	struct proclocal_test *plt = f->private_data;
+	void *p = (void *)*ppos;
+	ssize_t ret = -EFAULT;
+
+	if (p + count < p)
+		return -EINVAL;
+
+	while (count) {
+		size_t chunk = min_t(size_t, count, BOUNCE_BUF_SIZE);
+		size_t left;
+
+		/*
+		 * copy_from_user() disables superuser checks, so need to copy
+		 * to bounce buffer first to test the access
+		 */
+		left = copy_from_user(plt->bounce, buf, chunk);
+		if (left == chunk)
+			goto out;
+		chunk -= left;
+
+		memcpy(p, plt->bounce, chunk);
+
+		buf += chunk;
+		p += chunk;
+		count -= chunk;
+	}
+
+	ret = p - (void *)*ppos;
+	*ppos = (loff_t)p;
+out:
+	return ret;
+}
+
+static long proclocal_test_alloc(struct proclocal_test *plt,
+				 void __user *argp)
+{
+	struct proclocal_test_alloc pta;
+	unsigned long pages_needed;
+
+	if (plt->size)
+		return -EEXIST;
+
+	if (copy_from_user(&pta, argp, sizeof(pta)))
+		return -EFAULT;
+
+	if (!pta.size)
+		return -EINVAL;
+
+	pages_needed = (pta.size + PAGE_SIZE - 1) / PAGE_SIZE;
+	plt->area = secretmem_allocate_pages(fls(pages_needed - 1));
+	if (!plt->area)
+		return -ENOMEM;
+
+	plt->size = pta.size;
+
+	pta.ptr = (u64)plt->area->ptr;
+	if (copy_to_user(argp, &pta, sizeof(pta)))
+		goto err;
+
+	return 0;
+err:
+	secretmem_release_pages(plt->area);
+	plt->area = NULL;
+	plt->size = 0;
+	return -EFAULT;
+}
+
+static long proclocal_test_ioctl(struct file *f, unsigned int ioctl,
+				 unsigned long arg)
+{
+	struct proclocal_test *plt = f->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case PROCLOCAL_TEST_ALLOC:
+		return proclocal_test_alloc(plt, argp);
+	default:
+		return -EINVAL;
+	}
+}
+
+static const struct file_operations proclocal_test_fops = {
+	.owner          = THIS_MODULE,
+	.release        = proclocal_test_release,
+	.unlocked_ioctl = proclocal_test_ioctl,
+	.compat_ioctl   = compat_ptr_ioctl,
+	.open           = proclocal_test_open,
+	.read           = proclocal_test_read,
+	.write          = proclocal_test_write,
+	.llseek		= no_seek_end_llseek,
+};
+
+static struct miscdevice proclocal_test_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name  = "proclocal-test",
+	.fops  = &proclocal_test_fops,
+};
+module_misc_device(proclocal_test_misc);
+
+MODULE_VERSION("0.0.1");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Roman Kagan");
+MODULE_DESCRIPTION("Test driver for proclocal allocator");
diff --git a/tools/testing/selftests/proclocal/proclocal-test.c b/tools/testing/selftests/proclocal/proclocal-test.c
new file mode 100644
index 000000000000..386cc5d9e51a
--- /dev/null
+++ b/tools/testing/selftests/proclocal/proclocal-test.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Author: Roman Kagan <rkagan@amazon.de>
+ *
+ * test for proclocal memory allocator using the corresponding test device
+ */
+
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include "../kselftest_harness.h"
+
+struct proclocal_test_alloc {
+	uint64_t size;
+	uint64_t ptr;
+};
+
+#define PROCLOCAL_TEST_ALLOC _IOWR('A', 0x10, struct proclocal_test_alloc)
+
+const char proclocal_content[] = "this is test";
+char buf[256];
+
+FIXTURE(proclocal) {
+	int fd;
+	void *ptr;
+};
+
+FIXTURE_SETUP(proclocal)
+{
+	struct proclocal_test_alloc pta = {
+		.size = sizeof(buf),
+	};
+
+	self->fd = open("/dev/proclocal-test", O_RDWR);
+	ASSERT_LE(0, self->fd);
+
+	ASSERT_LE(0, ioctl(self->fd, PROCLOCAL_TEST_ALLOC, &pta));
+
+	self->ptr = (void *) pta.ptr;
+	TH_LOG("self->ptr = %p\n", self->ptr);
+}
+
+FIXTURE_TEARDOWN(proclocal)
+{
+}
+
+TEST_F(proclocal, kernel_access)
+{
+	ASSERT_EQ((off_t)self->ptr,
+		  lseek(self->fd, (off_t)self->ptr, SEEK_SET));
+	EXPECT_EQ(sizeof(proclocal_content),
+		  write(self->fd,
+			proclocal_content, sizeof(proclocal_content)));
+	ASSERT_EQ((off_t)self->ptr,
+		  lseek(self->fd, (off_t)self->ptr, SEEK_SET));
+	EXPECT_EQ(sizeof(proclocal_content),
+		  read(self->fd, buf, sizeof(proclocal_content)));
+	EXPECT_STREQ(proclocal_content, buf);
+}
+
+sigjmp_buf jmpbuf;
+void segv_handler(int signum, siginfo_t *si, void *uc)
+{
+	if (signum == SIGSEGV)
+		siglongjmp(jmpbuf, 1);
+}
+
+TEST_F(proclocal, direct_access)
+{
+	bool access_succeeded;
+	struct sigaction sa;
+
+	if (sigsetjmp(jmpbuf, 1) == 0) {
+		sa.sa_sigaction = segv_handler;
+		sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
+		sigemptyset(&sa.sa_mask);
+
+		sigaction(SIGSEGV, &sa, NULL);
+
+		(void)((volatile char *)self->ptr)[0];
+
+		access_succeeded = true;
+	} else
+		access_succeeded = false;
+
+	EXPECT_FALSE(access_succeeded);
+}
+
+#define PAGE_SIZE 0x1000
+
+TEST_F(proclocal, map_over)
+{
+	void *ptr_page = (void *)((uintptr_t)self->ptr & ~(PAGE_SIZE - 1));
+	void *map;
+	int errno_save;
+
+	errno = 0;
+	map = mmap(ptr_page, PAGE_SIZE, PROT_NONE,
+		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	errno_save = errno;
+
+	EXPECT_EQ(MAP_FAILED, map);
+	TH_LOG("errno = %d", errno_save);
+
+	if (map != MAP_FAILED)
+		munmap(map, PAGE_SIZE);
+}
+
+TEST_F(proclocal, release)
+{
+	EXPECT_EQ(0, close(self->fd));
+}
+
+TEST_F(proclocal, map_over_closed)
+{
+	void *ptr_page = (void *)((uintptr_t)self->ptr & ~(PAGE_SIZE - 1));
+	void *map;
+	int errno_save;
+
+	ASSERT_EQ(0, close(self->fd));
+
+	errno = 0;
+	map = mmap(ptr_page, PAGE_SIZE, PROT_NONE,
+		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	errno_save = errno;
+
+	EXPECT_EQ(ptr_page, map);
+	TH_LOG("errno = %d", errno_save);
+
+	if (map != MAP_FAILED)
+		munmap(map, PAGE_SIZE);
+}
+
+TEST_F(proclocal, kernel_access_closed)
+{
+	ASSERT_EQ(0, close(self->fd));
+	self->fd = open("/dev/proclocal-test", O_RDWR);
+	ASSERT_LE(0, self->fd);
+
+	ASSERT_EQ((off_t)self->ptr,
+		  lseek(self->fd, (off_t)self->ptr, SEEK_SET));
+	EXPECT_EQ(-1, read(self->fd, buf, sizeof(proclocal_content)));
+}
+
+TEST_HARNESS_MAIN
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index faf983680040..29a334de0ca8 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -585,6 +585,21 @@ config NSM
 	  To compile this driver as a module, choose M here.
 	  The module will be called nsm.
 
+config PROCLOCAL_TEST
+	tristate "Proclocal allocator test driver"
+	depends on SECRETMEM
+	help
+	  This driver allows to perform functional and stress tests for
+	  proclocal memory allocator.  It exposes /dev/proclocal-test that
+	  userland test programs can use to create and manipulate proclocal
+	  kernel allocations.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called proclocal-test.
+
+	  If unsure, say N.
+	  This driver is not meant to be used on production systems.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/tools/testing/selftests/proclocal/.gitignore b/tools/testing/selftests/proclocal/.gitignore
new file mode 100644
index 000000000000..47e0fdcd6e3a
--- /dev/null
+++ b/tools/testing/selftests/proclocal/.gitignore
@@ -0,0 +1 @@
+/proclocal-test
-- 
2.34.1





* Re: [PATCH RFC 3/3] drivers/misc: add test driver and selftest for proclocal allocator
  2024-06-21 20:15 ` [PATCH RFC 3/3] drivers/misc: add test driver and selftest for proclocal allocator Roman Kagan
@ 2024-07-03 14:36   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 7+ messages in thread
From: Greg Kroah-Hartman @ 2024-07-03 14:36 UTC (permalink / raw)
  To: Roman Kagan
  Cc: linux-kernel, Shuah Khan, Dragan Cvetic, Fares Mehanna,
	Alexander Graf, Derek Kiernan, linux-kselftest, nh-open-source,
	linux-mm, David Woodhouse, Andrew Morton, Arnd Bergmann

On Fri, Jun 21, 2024 at 10:15:01PM +0200, Roman Kagan wrote:
> Introduce a simple driver for functional and stress testing of the
> proclocal kernel allocator.  The driver exposes a device node
> /dev/proclocal-test, which allows userland programs to request the
> creation of proclocal areas and to obtain their addresses as seen by
> the kernel, and in addition to read and write kernel memory at
> arbitrary addresses (a simplified /dev/kmem, good enough for the
> selftest to access proclocal allocations on its own responsibility).
> 
> The driver is not meant for use with production kernels, as it exposes
> internal kernel pointers and data.

Then you MUST taint the kernel and throw up huge warnings whenever it is
loaded, otherwise distros will build this in and end up running it.
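
Something along these lines in the module init path would do (just a sketch,
you would need an explicit init/exit pair instead of module_misc_device()):

static int __init proclocal_test_init(void)
{
	pr_warn("proclocal-test: test driver loaded, exposes kernel memory to userspace, never use on production systems\n");
	add_taint(TAINT_TEST, LOCKDEP_STILL_OK);
	return misc_register(&proclocal_test_misc);
}
module_init(proclocal_test_init);

static void __exit proclocal_test_exit(void)
{
	misc_deregister(&proclocal_test_misc);
}
module_exit(proclocal_test_exit);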


> 
> Also add a basic selftest that uses this driver.
> 
> Signed-off-by: Roman Kagan <rkagan@amazon.de>
> ---
>  drivers/misc/Makefile                         |   1 +
>  tools/testing/selftests/proclocal/Makefile    |   6 +
>  drivers/misc/proclocal-test.c                 | 200 ++++++++++++++++++
>  .../selftests/proclocal/proclocal-test.c      | 150 +++++++++++++
>  drivers/misc/Kconfig                          |  15 ++
>  tools/testing/selftests/proclocal/.gitignore  |   1 +
>  6 files changed, 373 insertions(+)
>  create mode 100644 tools/testing/selftests/proclocal/Makefile
>  create mode 100644 drivers/misc/proclocal-test.c
>  create mode 100644 tools/testing/selftests/proclocal/proclocal-test.c
>  create mode 100644 tools/testing/selftests/proclocal/.gitignore
> 
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 153a3f4837e8..33c244cee92d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -69,3 +69,4 @@ obj-$(CONFIG_TMR_INJECT)	+= xilinx_tmr_inject.o
>  obj-$(CONFIG_TPS6594_ESM)	+= tps6594-esm.o
>  obj-$(CONFIG_TPS6594_PFSM)	+= tps6594-pfsm.o
>  obj-$(CONFIG_NSM)		+= nsm.o
> +obj-$(CONFIG_PROCLOCAL_TEST)	+= proclocal-test.o
> diff --git a/tools/testing/selftests/proclocal/Makefile b/tools/testing/selftests/proclocal/Makefile
> new file mode 100644
> index 000000000000..b93baecee762
> --- /dev/null
> +++ b/tools/testing/selftests/proclocal/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_GEN_PROGS := proclocal-test
> +CFLAGS += -O2 -g -Wall $(KHDR_INCLUDES)
> +
> +include ../lib.mk
> diff --git a/drivers/misc/proclocal-test.c b/drivers/misc/proclocal-test.c
> new file mode 100644
> index 000000000000..9b3d0ed9b2f9
> --- /dev/null
> +++ b/drivers/misc/proclocal-test.c
> @@ -0,0 +1,200 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
> + * Author: Roman Kagan <rkagan@amazon.de>
> + *
> + * test driver for proclocal memory allocator
> + */
> +
> +#include <linux/compat.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +#include <linux/workqueue.h>
> +#include <linux/file.h>
> +#include <linux/secretmem.h>
> +
> +struct proclocal_test_alloc {
> +	u64 size;
> +	u64 ptr;

This structure is not defined properly to cross the user/kernel boundary :(

> +};
> +
> +#define PROCLOCAL_TEST_ALLOC _IOWR('A', 0x10, struct proclocal_test_alloc)

ioctl definitions belong in a .h file for userspace to be able to see
them.
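
I.e. something like this in a shared uapi header (name and location are just an
example), using the fixed-width __u64 types:

/* include/uapi/linux/proclocal-test.h */
#ifndef _UAPI_LINUX_PROCLOCAL_TEST_H
#define _UAPI_LINUX_PROCLOCAL_TEST_H

#include <linux/types.h>
#include <linux/ioctl.h>

struct proclocal_test_alloc {
	__u64 size;
	__u64 ptr;
};

#define PROCLOCAL_TEST_ALLOC _IOWR('A', 0x10, struct proclocal_test_alloc)

#endif /* _UAPI_LINUX_PROCLOCAL_TEST_H */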

> +
> +#define BOUNCE_BUF_SIZE PAGE_SIZE

Then why not just use PAGE_SIZE everywhere?  Fewer characters and we all
know what that is.

Stopped reviewing here, sorry.

greg k-h



* Re: [PATCH RFC 0/3] add support for mm-local memory allocations
  2024-06-21 20:14 [PATCH RFC 0/3] add support for mm-local memory allocations Roman Kagan
                   ` (2 preceding siblings ...)
  2024-06-21 20:15 ` [PATCH RFC 3/3] drivers/misc: add test driver and selftest for proclocal allocator Roman Kagan
@ 2024-07-04 11:11 ` David Woodhouse
  2024-08-28  9:50 ` Alexander Graf
  4 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2024-07-04 11:11 UTC (permalink / raw)
  To: Roman Kagan, linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, Andrew Morton, Arnd Bergmann


On Fri, 2024-06-21 at 22:14 +0200, Roman Kagan wrote:
> 
> Compared to the approach used in the orignal series, where a dedicated kernel
> address range and thus a dedicated PGD was used for mm-local allocations, the
> one proposed here may have certain drawbacks, in particular
> 
> - using user addresses for kernel memory may violate assumptions in various
>   parts of kernel code which we may not have identified with smoke tests we did
> 
> - the allocated addresses are guessable by the userland (ATM they are even
>   visible in /proc/PID/maps but that's fixable) which may weaken the security
>   posture

I think this approach makes sense as it's generic and applies
immediately to all architectures. I'm slightly uncomfortable about
using userspace addresses though, and the special cases that it
introduces.

I'd like to see a per-arch ARCH_HAS_PROCLOCAL_PGD so that it *can* be
put back into a dedicated address range where possible.
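
Roughly (the name is only illustrative):

config ARCH_HAS_PROCLOCAL_PGD
	bool
	help
	  Selected by architectures that can dedicate a kernel address
	  range (and a PGD entry) to mm-local allocations, so that the
	  generic user-address based fallback is not needed there.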

Looking forward to the x86 KVM code from before being dusted off and
put on top of this, and also the Arm version of same. A test driver and
test case is all very well, but it's less exciting than the real use
case :)



* Re: [PATCH RFC 0/3] add support for mm-local memory allocations
  2024-06-21 20:14 [PATCH RFC 0/3] add support for mm-local memory allocations Roman Kagan
                   ` (3 preceding siblings ...)
  2024-07-04 11:11 ` [PATCH RFC 0/3] add support for mm-local memory allocations David Woodhouse
@ 2024-08-28  9:50 ` Alexander Graf
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Graf @ 2024-08-28  9:50 UTC (permalink / raw)
  To: Roman Kagan, linux-kernel
  Cc: Shuah Khan, Dragan Cvetic, Fares Mehanna, Alexander Graf,
	Derek Kiernan, linux-kselftest, nh-open-source,
	Greg Kroah-Hartman, linux-mm, David Woodhouse, Andrew Morton,
	Arnd Bergmann

Hey Roman,

On 21.06.24 22:14, Roman Kagan wrote:
> In a series posted a few years ago [1], a proposal was put forward to allow the
> kernel to allocate memory local to a mm and thus push it out of reach for
> current and future speculation-based cross-process attacks.  We still believe
> this is a nice thing to have.
>
> However, in the time passed since that post Linux mm has grown quite a few new
> goodies, so we'd like to explore possibilities to implement this functionality
> with less effort and churn leveraging the now available facilities.
>
> Specifically, this is a proof-of-concept attempt to implement mm-local
> allocations piggy-backing on memfd_secret(), using regular user addresses but
> pinning the pages and flipping the user/supervisor flag on the respective PTEs
> to make them directly accessible from the kernel, and sealing the VMA to
> prevent userland from taking over the address range.  This approach allowed us
> to delegate all the heavy lifting -- address management, interactions with the
> direct map, cleanup on mm teardown -- to the existing infrastructure, and
> required zero architecture-specific code.
>
> Compared to the approach used in the original series, where a dedicated kernel
> address range and thus a dedicated PGD was used for mm-local allocations, the
> one proposed here may have certain drawbacks, in particular
>
> - using user addresses for kernel memory may violate assumptions in various
>    parts of kernel code which we may not have identified with the smoke tests
>    we did
>
> - the allocated addresses are guessable by userland (at the moment they are
>    even visible in /proc/PID/maps, but that's fixable), which may weaken the
>    security posture
>
> Also included is a simple test driver and selftest to smoke test and showcase
> the feature.
>
> The code is a proof-of-concept RFC and lacks a lot of checks and special-case
> handling, but it demonstrates the idea.  We'd appreciate any feedback on
> whether this is a viable approach, or whether it should be abandoned in favor
> of the one with a dedicated PGD / kernel address range, or something else
> entirely.
>
> [1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@amazon.de/


I haven't seen any negative feedback on the RFC, so when can I expect a 
v1 of this patch set that addresses the non-production-readiness of it 
that you call out above? :)


Alex




