linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave@sr71.net>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, Dave Hansen <dave@sr71.net>,
	dave.hansen@linux.intel.com, linux-arch@vger.kernel.org
Subject: [PATCH 18/34] mm: add gup flag to indicate "foreign" mm access
Date: Thu, 03 Dec 2015 17:14:49 -0800	[thread overview]
Message-ID: <20151204011449.768C774C@viggo.jf.intel.com> (raw)
In-Reply-To: <20151204011424.8A36E365@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

We try to enforce protection keys in software the same way that we
do in hardware.  (See long example below).

But, we only want to do this when accessing our *own* process's
memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory.  PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.

This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process.  We want to avoid that.

To avoid that, we add a GUP flag: FOLL_FOREIGN and a fault flag:
FAULT_FLAG_FOREIGN.  They indicate that we are walking an mm
which is not guranteed to be the same as current->mm and should
not be subject to protection key enforcement.

Thanks to Jerome Glisse for pointing out this scenario.

*** Why do we enforce protection keys in software?? ***

Imagine that we disabled access to the memory pointer to by 'buf'.
The, we implemented sys_write() like this:

	sys_read(fd, buf, len...)
	{
		struct page *page = follow_page(buf);
		void *buf_mapped = kmap(page);
		memcpy(buf_mapped, fd_data, len);
		...
	}

This writes to 'buf' via a *kernel* mapping, without a protection
key.  While this implementation does the same thing:

	sys_read(fd, buf, len...)
	{
		copy_to_user(buf, fd_data, len);
		...
	}

but would hit a protection key fault because the userspace 'buf'
mapping has a protection key set.

To provide consistency, and to make key-protected memory work
as much like mprotect()ed memory as possible, we try to enforce
the same protections as the hardware would when the *kernel* walks
the page tables (and other mm structures).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
---

 b/arch/powerpc/include/asm/mmu_context.h   |    3 ++-
 b/arch/s390/include/asm/mmu_context.h      |    3 ++-
 b/arch/unicore32/include/asm/mmu_context.h |    3 ++-
 b/arch/x86/include/asm/mmu_context.h       |    5 +++--
 b/drivers/iommu/amd_iommu_v2.c             |    8 +++++---
 b/include/asm-generic/mm_hooks.h           |    3 ++-
 b/include/linux/mm.h                       |    2 ++
 b/mm/gup.c                                 |   15 ++++++++++-----
 b/mm/ksm.c                                 |   10 ++++++++--
 b/mm/memory.c                              |    3 ++-
 10 files changed, 38 insertions(+), 17 deletions(-)

diff -puN arch/powerpc/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag arch/powerpc/include/asm/mmu_context.h
--- a/arch/powerpc/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.223698386 -0800
+++ b/arch/powerpc/include/asm/mmu_context.h	2015-12-03 16:21:26.241699202 -0800
@@ -148,7 +148,8 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
-static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
 {
 	/* by default, allow everything */
 	return true;
diff -puN arch/s390/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag arch/s390/include/asm/mmu_context.h
--- a/arch/s390/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.224698431 -0800
+++ b/arch/s390/include/asm/mmu_context.h	2015-12-03 16:21:26.242699248 -0800
@@ -130,7 +130,8 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
-static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
 {
 	/* by default, allow everything */
 	return true;
diff -puN arch/unicore32/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag arch/unicore32/include/asm/mmu_context.h
--- a/arch/unicore32/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.226698522 -0800
+++ b/arch/unicore32/include/asm/mmu_context.h	2015-12-03 16:21:26.242699248 -0800
@@ -97,7 +97,8 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
-static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
 {
 	/* by default, allow everything */
 	return true;
diff -puN arch/x86/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.228698613 -0800
+++ b/arch/x86/include/asm/mmu_context.h	2015-12-03 16:21:26.242699248 -0800
@@ -299,10 +299,11 @@ static inline bool vma_is_foreign(struct
 	return false;
 }
 
-static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
 {
 	/* allow access if the VMA is not one from this process */
-	if (vma_is_foreign(vma))
+	if (foreign || vma_is_foreign(vma))
 		return true;
 	return __pkru_allows_pkey(vma_pkey(vma), write);
 }
diff -puN drivers/iommu/amd_iommu_v2.c~pkeys-12-gup-fault-foreign-flag drivers/iommu/amd_iommu_v2.c
--- a/drivers/iommu/amd_iommu_v2.c~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.229698658 -0800
+++ b/drivers/iommu/amd_iommu_v2.c	2015-12-03 16:21:26.243699293 -0800
@@ -500,9 +500,11 @@ static void do_fault(struct work_struct
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	u64 address;
-	int ret, write;
+	int ret, flags;
 
-	write = !!(fault->flags & PPR_FAULT_WRITE);
+	if (fault->flags & PPR_FAULT_WRITE)
+		flags = FAULT_FLAG_WRITE;
+	flags |= FAULT_FLAG_FOREIGN;
 
 	mm = fault->state->mm;
 	address = fault->address;
@@ -523,7 +525,7 @@ static void do_fault(struct work_struct
 		goto out;
 	}
 
-	ret = handle_mm_fault(mm, vma, address, write);
+	ret = handle_mm_fault(mm, vma, address, flags);
 	if (ret & VM_FAULT_ERROR) {
 		/* failed to service fault */
 		up_read(&mm->mmap_sem);
diff -puN include/asm-generic/mm_hooks.h~pkeys-12-gup-fault-foreign-flag include/asm-generic/mm_hooks.h
--- a/include/asm-generic/mm_hooks.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.231698749 -0800
+++ b/include/asm-generic/mm_hooks.h	2015-12-03 16:21:26.243699293 -0800
@@ -26,7 +26,8 @@ static inline void arch_bprm_mm_init(str
 {
 }
 
-static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool foreign)
 {
 	/* by default, allow everything */
 	return true;
diff -puN include/linux/mm.h~pkeys-12-gup-fault-foreign-flag include/linux/mm.h
--- a/include/linux/mm.h~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.233698839 -0800
+++ b/include/linux/mm.h	2015-12-03 16:21:26.244699338 -0800
@@ -232,6 +232,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_KILLABLE	0x10	/* The fault task is in SIGKILL killable region */
 #define FAULT_FLAG_TRIED	0x20	/* Second try */
 #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
+#define FAULT_FLAG_FOREIGN	0x80	/* faulting for non current tsk/mm */
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
@@ -2138,6 +2139,7 @@ static inline struct page *follow_page(s
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
 #define FOLL_TRIED	0x800	/* a retry, previous pass started an IO */
 #define FOLL_MLOCK	0x1000	/* lock present pages */
+#define FOLL_FOREIGN	0x2000	/* we are working on non-current tsk/mm */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff -puN mm/gup.c~pkeys-12-gup-fault-foreign-flag mm/gup.c
--- a/mm/gup.c~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.234698885 -0800
+++ b/mm/gup.c	2015-12-03 16:21:26.245699384 -0800
@@ -310,6 +310,8 @@ static int faultin_page(struct task_stru
 		return -ENOENT;
 	if (*flags & FOLL_WRITE)
 		fault_flags |= FAULT_FLAG_WRITE;
+	if (*flags & FOLL_FOREIGN)
+		fault_flags |= FAULT_FLAG_FOREIGN;
 	if (nonblocking)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 	if (*flags & FOLL_NOWAIT)
@@ -360,11 +362,13 @@ static int faultin_page(struct task_stru
 static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 {
 	vm_flags_t vm_flags = vma->vm_flags;
+	int write = (gup_flags & FOLL_WRITE);
+	int foreign = (gup_flags & FOLL_FOREIGN);
 
 	if (vm_flags & (VM_IO | VM_PFNMAP))
 		return -EFAULT;
 
-	if (gup_flags & FOLL_WRITE) {
+	if (write) {
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
@@ -392,7 +396,7 @@ static int check_vma_flags(struct vm_are
 		if (!(vm_flags & VM_MAYREAD))
 			return -EFAULT;
 	}
-	if (!arch_vma_access_permitted(vma, (gup_flags & FOLL_WRITE)))
+	if (!arch_vma_access_permitted(vma, write, foreign))
 		return -EFAULT;
 	return 0;
 }
@@ -562,7 +566,8 @@ EXPORT_SYMBOL(__get_user_pages);
 
 bool vma_permits_fault(struct vm_area_struct *vma, unsigned int fault_flags)
 {
-	bool write = !!(fault_flags & FAULT_FLAG_WRITE);
+	bool write   = !!(fault_flags & FAULT_FLAG_WRITE);
+	bool foreign = !!(fault_flags & FAULT_FLAG_FOREIGN);
 	vm_flags_t vm_flags = write ? VM_WRITE : VM_READ;
 
 	if (!(vm_flags & vma->vm_flags))
@@ -570,9 +575,9 @@ bool vma_permits_fault(struct vm_area_st
 
 	/*
 	 * The architecture might have a hardware protection
-	 * mechanism other than read/write that can deny access
+	 * mechanism other than read/write that can deny access.
 	 */
-	if (!arch_vma_access_permitted(vma, write))
+	if (!arch_vma_access_permitted(vma, write, foreign))
 		return false;
 
 	return true;
diff -puN mm/ksm.c~pkeys-12-gup-fault-foreign-flag mm/ksm.c
--- a/mm/ksm.c~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.236698975 -0800
+++ b/mm/ksm.c	2015-12-03 16:21:26.246699429 -0800
@@ -359,6 +359,10 @@ static inline bool ksm_test_exit(struct
  * in case the application has unmapped and remapped mm,addr meanwhile.
  * Could a ksm page appear anywhere else?  Actually yes, in a VM_PFNMAP
  * mmap of /dev/mem or /dev/kmem, where we would not want to touch it.
+ *
+ * FAULT_FLAG/FOLL_FOREIGN are because we do this outside the context
+ * of the process that owns 'vma'.  We also do not want to enforce
+ * protection keys here anyway.
  */
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
@@ -367,12 +371,14 @@ static int break_ksm(struct vm_area_stru
 
 	do {
 		cond_resched();
-		page = follow_page(vma, addr, FOLL_GET | FOLL_MIGRATION);
+		page = follow_page(vma, addr,
+				FOLL_GET | FOLL_MIGRATION | FOLL_FOREIGN);
 		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
 			ret = handle_mm_fault(vma->vm_mm, vma, addr,
-							FAULT_FLAG_WRITE);
+							FAULT_FLAG_WRITE |
+							FAULT_FLAG_FOREIGN);
 		else
 			ret = VM_FAULT_WRITE;
 		put_page(page);
diff -puN mm/memory.c~pkeys-12-gup-fault-foreign-flag mm/memory.c
--- a/mm/memory.c~pkeys-12-gup-fault-foreign-flag	2015-12-03 16:21:26.238699066 -0800
+++ b/mm/memory.c	2015-12-03 16:21:26.247699474 -0800
@@ -3345,7 +3345,8 @@ static int __handle_mm_fault(struct mm_s
 	pmd_t *pmd;
 	pte_t *pte;
 
-	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE))
+	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
+					    flags & FAULT_FLAG_FOREIGN))
 		return VM_FAULT_SIGSEGV;
 
 	if (unlikely(is_vm_hugetlb_page(vma)))
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2015-12-04  1:15 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-04  1:14 [PATCH 00/34] x86: Memory Protection Keys (v5) Dave Hansen
2015-12-04  1:14 ` [PATCH 01/34] mm, gup: introduce concept of "foreign" get_user_pages() Dave Hansen
2015-12-04  1:14 ` [PATCH 02/34] x86, fpu: add placeholder for Processor Trace XSAVE state Dave Hansen
2015-12-04  1:14 ` [PATCH 03/34] x86, pkeys: Add Kconfig option Dave Hansen
2015-12-04  1:14 ` [PATCH 04/34] x86, pkeys: cpuid bit definition Dave Hansen
2015-12-04  1:14 ` [PATCH 05/34] x86, pkeys: define new CR4 bit Dave Hansen
2015-12-04  1:14 ` [PATCH 06/34] x86, pkeys: add PKRU xsave fields and data structure(s) Dave Hansen
2015-12-04  1:14 ` [PATCH 07/34] x86, pkeys: PTE bits for storing protection key Dave Hansen
2015-12-04  1:14 ` [PATCH 08/34] x86, pkeys: new page fault error code bit: PF_PK Dave Hansen
2015-12-04  1:14 ` [PATCH 09/34] x86, pkeys: store protection in high VMA flags Dave Hansen
2015-12-08 14:17   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 10/34] x86, pkeys: arch-specific protection bits Dave Hansen
2015-12-08 15:15   ` [PATCH 10/34] x86, pkeys: arch-specific protection bitsy Thomas Gleixner
2015-12-08 16:34     ` Dave Hansen
2015-12-08 17:24       ` Thomas Gleixner
2015-12-08 18:06         ` Dave Hansen
2015-12-08 18:29           ` Thomas Gleixner
2015-12-08 18:35             ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 11/34] x86, pkeys: pass VMA down in to fault signal generation code Dave Hansen
2015-12-04  1:14 ` [PATCH 12/34] signals, pkeys: notify userspace about protection key faults Dave Hansen
2015-12-04  1:14 ` [PATCH 13/34] x86, pkeys: fill in pkey field in siginfo Dave Hansen
2015-12-04  1:14 ` [PATCH 14/34] x86, pkeys: add functions to fetch PKRU Dave Hansen
2015-12-08 15:18   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 15/34] mm: factor out VMA fault permission checking Dave Hansen
2015-12-08 17:26   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 16/34] x86, mm: simplify get_user_pages() PTE bit handling Dave Hansen
2015-12-08 18:01   ` Thomas Gleixner
2015-12-08 18:30     ` Dave Hansen
2015-12-04  1:14 ` [PATCH 17/34] x86, pkeys: check VMAs and PTEs for protection keys Dave Hansen
2015-12-08 18:11   ` Thomas Gleixner
2015-12-04  1:14 ` Dave Hansen [this message]
2015-12-04  1:14 ` [PATCH 19/34] x86, pkeys: optimize fault handling in access_error() Dave Hansen
2015-12-08 18:14   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 20/34] x86, pkeys: differentiate instruction fetches Dave Hansen
2015-12-08 18:17   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 21/34] x86, pkeys: dump PKRU with other kernel registers Dave Hansen
2015-12-08 18:19   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 22/34] x86, pkeys: dump PTE pkey in /proc/pid/smaps Dave Hansen
2015-12-08 18:20   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 23/34] x86, pkeys: add Kconfig prompt to existing config option Dave Hansen
2015-12-08 18:21   ` Thomas Gleixner
2015-12-04  1:14 ` [PATCH 24/34] mm, multi-arch: pass a protection key in to calc_vm_flag_bits() Dave Hansen
2015-12-04  1:14 ` [PATCH 25/34] x86, pkeys: add arch_validate_pkey() Dave Hansen
2015-12-08 18:39   ` Thomas Gleixner
2015-12-04  1:15 ` [PATCH 26/34] mm: implement new mprotect_key() system call Dave Hansen
2015-12-05  6:50   ` Michael Kerrisk (man-pages)
2015-12-07 16:44     ` Dave Hansen
2015-12-09 11:08       ` Michael Kerrisk (man-pages)
2015-12-09 15:48         ` Dave Hansen
2015-12-09 16:45           ` Michael Kerrisk (man-pages)
2015-12-09 17:05             ` Dave Hansen
2015-12-11 20:13               ` Michael Kerrisk (man-pages)
2015-12-04  1:15 ` [PATCH 27/34] x86, pkeys: make mprotect_key() mask off additional vm_flags Dave Hansen
2015-12-08 18:41   ` Thomas Gleixner
2015-12-04  1:15 ` [PATCH 28/34] x86: wire up mprotect_key() system call Dave Hansen
2015-12-08 18:44   ` Thomas Gleixner
2015-12-08 19:06     ` Dave Hansen
2015-12-08 20:38       ` Thomas Gleixner
2015-12-04  1:15 ` [PATCH 29/34] x86: separate out LDT init from context init Dave Hansen
2015-12-08 18:45   ` Thomas Gleixner
2015-12-04  1:15 ` [PATCH 30/34] x86, fpu: allow setting of XSAVE state Dave Hansen
2015-12-08 18:48   ` Thomas Gleixner
2015-12-04  1:15 ` [PATCH 31/34] x86, pkeys: allocation/free syscalls Dave Hansen
2015-12-04  1:15 ` [PATCH 32/34] x86, pkeys: add pkey set/get syscalls Dave Hansen
2015-12-04  1:15 ` [PATCH 33/34] x86, pkeys: actually enable Memory Protection Keys in CPU Dave Hansen
2015-12-04  1:15 ` [PATCH 34/34] x86, pkeys: Documentation Dave Hansen
2015-12-04 23:31 ` [PATCH 00/34] x86: Memory Protection Keys (v5) Andy Lutomirski
2015-12-04 23:38   ` Dave Hansen
2015-12-11 20:16     ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151204011449.768C774C@viggo.jf.intel.com \
    --to=dave@sr71.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox