* Re: Fixing MIPS delay slot emulation weakness? [not found] <CALCETrWaWTupSp6V=XXhvExtFdS6ewx_0A7hiGfStqpeuqZn8g@mail.gmail.com> @ 2018-12-19 4:32 ` Paul Burton 2018-12-19 21:12 ` Hugh Dickins 0 siblings, 1 reply; 3+ messages in thread From: Paul Burton @ 2018-12-19 4:32 UTC (permalink / raw) To: Andy Lutomirski, Hugh Dickins, linux-mm Cc: Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker Hello, On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > The really simple but possibly suboptimal fix is to get rid of > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. I actually wound up trying this route because it seemed like it would produce a nice small patch that would be simple to backport, and we could clean up mainline afterwards. Unfortunately though things fail because get_user_pages() returns -EFAULT for the delay slot emulation page, due to the !is_cow_mapping() check in check_vma_flags(). This was introduced by commit cda540ace6a1 ("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a little confused as to its behaviour... is_cow_mapping() returns true if the VM_MAYWRITE flag is set and VM_SHARED is not set - this suggests a private & potentially-writable area, right? That fits in nicely with an area we'd want to COW. Why then does check_vma_flags() use the inverse of this to indicate a shared area? This fails if we have a private mapping where VM_MAYWRITE is not set, but where FOLL_FORCE would otherwise provide a means of writing to the memory. If I remove this check in check_vma_flags() then I have a nice simple patch which seems to work well, leaving the user mapping of the delay slot emulation page non-writeable. I'm not sure I'm following the mm innards here though - is there something I should change about the delay slot page instead? Should I be marking it shared, even though it isn't really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should set that - would that allow a user to use mprotect() to make the region writeable..? The work-in-progress patch can be seen below if it's helpful (and yes, I realise that the modified condition in check_vma_flags() became impossible & that removing it would be equivalent). Or perhaps this is only confusing because it's 4:25am & I'm massively jetlagged... :) > A possibly nicer way to accomplish more or less the same thing would > be to allocate the area with _install_special_mapping() and arrange to > keep a reference to the struct page around. I looked at this, but it ends up being a much bigger patch. Perhaps it could be something to look into as a follow-on cleanup, though it complicates things a little because we need to actually allocate the page, preferrably only on demand, which is handled for us with the current mmap_region() code. Thanks, Paul --- diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 48a9c6b90e07..9476efb54d18 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) /* Map delay slot emulation page */ base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, - VM_READ|VM_WRITE|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, 0, NULL); if (IS_ERR_VALUE(base)) { ret = base; diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c index 5450f4d1c920..3aa8e3b90efb 100644 --- a/arch/mips/math-emu/dsemul.c +++ b/arch/mips/math-emu/dsemul.c @@ -67,11 +67,6 @@ struct emuframe { static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe); -static inline __user struct emuframe *dsemul_page(void) -{ - return (__user struct emuframe *)STACK_TOP; -} - static int alloc_emuframe(void) { mm_context_t *mm_ctx = ¤t->mm->context; @@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm) static bool within_emuframe(struct pt_regs *regs) { - unsigned long base = (unsigned long)dsemul_page(); + unsigned long base = STACK_TOP; if (regs->cp0_epc < base) return false; @@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk) bool dsemul_thread_rollback(struct pt_regs *regs) { - struct emuframe __user *fr; - int fr_idx; + struct emuframe fr; + int fr_idx, ret; /* Do nothing if we're not executing from a frame */ if (!within_emuframe(regs)) @@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs) fr_idx = atomic_read(¤t->thread.bd_emu_frame); if (fr_idx == BD_EMUFRAME_NONE) return false; - fr = &dsemul_page()[fr_idx]; + + ret = access_process_vm(current, + STACK_TOP + (fr_idx * sizeof(fr)), + &fr, sizeof(fr), FOLL_FORCE); + if (WARN_ON(ret != sizeof(fr))) + return false; /* * If the PC is at the emul instruction, roll back to the branch. If @@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs) * then something is amiss & the user has branched into some other area * of the emupage - we'll free the allocated frame anyway. */ - if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul) + if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul) regs->cp0_epc = current->thread.bd_emu_branch_pc; - else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst) + else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst) regs->cp0_epc = current->thread.bd_emu_cont_pc; atomic_set(¤t->thread.bd_emu_frame, BD_EMUFRAME_NONE); @@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, { int isa16 = get_isa16_mode(regs->cp0_epc); mips_instruction break_math; - struct emuframe __user *fr; - int err, fr_idx; + struct emuframe fr; + int fr_idx, ret; /* NOP is easy */ if (ir == 0) @@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, fr_idx = alloc_emuframe(); if (fr_idx == BD_EMUFRAME_NONE) return SIGBUS; - fr = &dsemul_page()[fr_idx]; /* Retrieve the appropriately encoded break instruction */ break_math = BREAK_MATH(isa16); /* Write the instructions to the frame */ if (isa16) { - err = __put_user(ir >> 16, - (u16 __user *)(&fr->emul)); - err |= __put_user(ir & 0xffff, - (u16 __user *)((long)(&fr->emul) + 2)); - err |= __put_user(break_math >> 16, - (u16 __user *)(&fr->badinst)); - err |= __put_user(break_math & 0xffff, - (u16 __user *)((long)(&fr->badinst) + 2)); + union mips_instruction _emul = { + .halfword = { ir >> 16, ir } + }; + union mips_instruction _badinst = { + .halfword = { break_math >> 16, break_math } + }; + + fr.emul = _emul.word; + fr.badinst = _badinst.word; } else { - err = __put_user(ir, &fr->emul); - err |= __put_user(break_math, &fr->badinst); + fr.emul = ir; + fr.badinst = break_math; } - if (unlikely(err)) { + /* Write the frame to user memory */ + ret = access_process_vm(current, + STACK_TOP + (fr_idx * sizeof(fr)), + &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE); + if (WARN_ON(ret != sizeof(fr))) { MIPS_FPU_EMU_INC_STATS(errors); free_emuframe(fr_idx, current->mm); return SIGBUS; @@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, atomic_set(¤t->thread.bd_emu_frame, fr_idx); /* Change user register context to execute the frame */ - regs->cp0_epc = (unsigned long)&fr->emul | isa16; - - /* Ensure the icache observes our newly written frame */ - flush_cache_sigtramp((unsigned long)&fr->emul); + regs->cp0_epc = (unsigned long)&fr.emul | isa16; return 0; } diff --git a/mm/gup.c b/mm/gup.c index f76e77a2d34b..9a1bc941dcb9 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * Anon pages in shared mappings are surprising: now * just reject it. */ - if (!is_cow_mapping(vm_flags)) + if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags)) return -EFAULT; } } else if (!(vm_flags & VM_READ)) { ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-19 4:32 ` Fixing MIPS delay slot emulation weakness? Paul Burton @ 2018-12-19 21:12 ` Hugh Dickins 2018-12-20 17:56 ` Paul Burton 0 siblings, 1 reply; 3+ messages in thread From: Hugh Dickins @ 2018-12-19 21:12 UTC (permalink / raw) To: Paul Burton Cc: Andy Lutomirski, Hugh Dickins, linux-mm, Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker On Wed, 19 Dec 2018, Paul Burton wrote: > On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > > The really simple but possibly suboptimal fix is to get rid of > > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. > > I actually wound up trying this route because it seemed like it would > produce a nice small patch that would be simple to backport, and we > could clean up mainline afterwards. > > Unfortunately though things fail because get_user_pages() returns > -EFAULT for the delay slot emulation page, due to the !is_cow_mapping() > check in check_vma_flags(). This was introduced by commit cda540ace6a1 > ("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a > little confused as to its behaviour... > > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and > VM_SHARED is not set - this suggests a private & potentially-writable > area, right? That fits in nicely with an area we'd want to COW. Why then > does check_vma_flags() use the inverse of this to indicate a shared > area? This fails if we have a private mapping where VM_MAYWRITE is not > set, but where FOLL_FORCE would otherwise provide a means of writing to > the memory. > > If I remove this check in check_vma_flags() then I have a nice simple > patch which seems to work well, leaving the user mapping of the delay > slot emulation page non-writeable. I'm not sure I'm following the mm > innards here though - is there something I should change about the delay > slot page instead? Should I be marking it shared, even though it isn't > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should > set that - would that allow a user to use mprotect() to make the region > writeable..? Exactly, in that last sentence above you come to the right understanding of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think your issue in setting up the mmap, is that you're (rightly) doing it with VM_flags to mmap_region(), but giving it a combination of flags that an mmap() syscall from userspace would never arrive at, so does not match expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c: you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then removing it just from the case of a MAP_SHARED without FMODE_WRITE. > > The work-in-progress patch can be seen below if it's helpful (and yes, I > realise that the modified condition in check_vma_flags() became > impossible & that removing it would be equivalent). > > Or perhaps this is only confusing because it's 4:25am & I'm massively > jetlagged... :) > > > A possibly nicer way to accomplish more or less the same thing would > > be to allocate the area with _install_special_mapping() and arrange to > > keep a reference to the struct page around. > > I looked at this, but it ends up being a much bigger patch. Perhaps it > could be something to look into as a follow-on cleanup, though it > complicates things a little because we need to actually allocate the > page, preferrably only on demand, which is handled for us with the > current mmap_region() code. > > Thanks, > Paul > > --- > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c > index 48a9c6b90e07..9476efb54d18 100644 > --- a/arch/mips/kernel/vdso.c > +++ b/arch/mips/kernel/vdso.c > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) > > /* Map delay slot emulation page */ > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > - VM_READ|VM_WRITE|VM_EXEC| > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE. > 0, NULL); > if (IS_ERR_VALUE(base)) { > ret = base; > diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c > index 5450f4d1c920..3aa8e3b90efb 100644 > --- a/arch/mips/math-emu/dsemul.c > +++ b/arch/mips/math-emu/dsemul.c > @@ -67,11 +67,6 @@ struct emuframe { > > static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe); > > -static inline __user struct emuframe *dsemul_page(void) > -{ > - return (__user struct emuframe *)STACK_TOP; > -} > - > static int alloc_emuframe(void) > { > mm_context_t *mm_ctx = ¤t->mm->context; > @@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm) > > static bool within_emuframe(struct pt_regs *regs) > { > - unsigned long base = (unsigned long)dsemul_page(); > + unsigned long base = STACK_TOP; > > if (regs->cp0_epc < base) > return false; > @@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk) > > bool dsemul_thread_rollback(struct pt_regs *regs) > { > - struct emuframe __user *fr; > - int fr_idx; > + struct emuframe fr; > + int fr_idx, ret; > > /* Do nothing if we're not executing from a frame */ > if (!within_emuframe(regs)) > @@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs) > fr_idx = atomic_read(¤t->thread.bd_emu_frame); > if (fr_idx == BD_EMUFRAME_NONE) > return false; > - fr = &dsemul_page()[fr_idx]; > + > + ret = access_process_vm(current, > + STACK_TOP + (fr_idx * sizeof(fr)), > + &fr, sizeof(fr), FOLL_FORCE); > + if (WARN_ON(ret != sizeof(fr))) > + return false; > > /* > * If the PC is at the emul instruction, roll back to the branch. If > @@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs) > * then something is amiss & the user has branched into some other area > * of the emupage - we'll free the allocated frame anyway. > */ > - if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul) > + if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul) > regs->cp0_epc = current->thread.bd_emu_branch_pc; > - else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst) > + else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst) > regs->cp0_epc = current->thread.bd_emu_cont_pc; > > atomic_set(¤t->thread.bd_emu_frame, BD_EMUFRAME_NONE); > @@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > { > int isa16 = get_isa16_mode(regs->cp0_epc); > mips_instruction break_math; > - struct emuframe __user *fr; > - int err, fr_idx; > + struct emuframe fr; > + int fr_idx, ret; > > /* NOP is easy */ > if (ir == 0) > @@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > fr_idx = alloc_emuframe(); > if (fr_idx == BD_EMUFRAME_NONE) > return SIGBUS; > - fr = &dsemul_page()[fr_idx]; > > /* Retrieve the appropriately encoded break instruction */ > break_math = BREAK_MATH(isa16); > > /* Write the instructions to the frame */ > if (isa16) { > - err = __put_user(ir >> 16, > - (u16 __user *)(&fr->emul)); > - err |= __put_user(ir & 0xffff, > - (u16 __user *)((long)(&fr->emul) + 2)); > - err |= __put_user(break_math >> 16, > - (u16 __user *)(&fr->badinst)); > - err |= __put_user(break_math & 0xffff, > - (u16 __user *)((long)(&fr->badinst) + 2)); > + union mips_instruction _emul = { > + .halfword = { ir >> 16, ir } > + }; > + union mips_instruction _badinst = { > + .halfword = { break_math >> 16, break_math } > + }; > + > + fr.emul = _emul.word; > + fr.badinst = _badinst.word; > } else { > - err = __put_user(ir, &fr->emul); > - err |= __put_user(break_math, &fr->badinst); > + fr.emul = ir; > + fr.badinst = break_math; > } > > - if (unlikely(err)) { > + /* Write the frame to user memory */ > + ret = access_process_vm(current, > + STACK_TOP + (fr_idx * sizeof(fr)), > + &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE); > + if (WARN_ON(ret != sizeof(fr))) { > MIPS_FPU_EMU_INC_STATS(errors); > free_emuframe(fr_idx, current->mm); > return SIGBUS; > @@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > atomic_set(¤t->thread.bd_emu_frame, fr_idx); > > /* Change user register context to execute the frame */ > - regs->cp0_epc = (unsigned long)&fr->emul | isa16; > - > - /* Ensure the icache observes our newly written frame */ > - flush_cache_sigtramp((unsigned long)&fr->emul); > + regs->cp0_epc = (unsigned long)&fr.emul | isa16; > > return 0; > } > diff --git a/mm/gup.c b/mm/gup.c > index f76e77a2d34b..9a1bc941dcb9 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) > * Anon pages in shared mappings are surprising: now > * just reject it. > */ > - if (!is_cow_mapping(vm_flags)) > + if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags)) Then please drop this patch to mm/gup.c: does the result then work for you? (I won't pretend to have reviewed the rest of the patch.) Hugh > return -EFAULT; > } > } else if (!(vm_flags & VM_READ)) { ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-19 21:12 ` Hugh Dickins @ 2018-12-20 17:56 ` Paul Burton 0 siblings, 0 replies; 3+ messages in thread From: Paul Burton @ 2018-12-20 17:56 UTC (permalink / raw) To: Hugh Dickins Cc: Andy Lutomirski, linux-mm, Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker Hi Hugh, On Wed, Dec 19, 2018 at 01:12:58PM -0800, Hugh Dickins wrote: > > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and > > VM_SHARED is not set - this suggests a private & potentially-writable > > area, right? That fits in nicely with an area we'd want to COW. Why then > > does check_vma_flags() use the inverse of this to indicate a shared > > area? This fails if we have a private mapping where VM_MAYWRITE is not > > set, but where FOLL_FORCE would otherwise provide a means of writing to > > the memory. > > > > If I remove this check in check_vma_flags() then I have a nice simple > > patch which seems to work well, leaving the user mapping of the delay > > slot emulation page non-writeable. I'm not sure I'm following the mm > > innards here though - is there something I should change about the delay > > slot page instead? Should I be marking it shared, even though it isn't > > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should > > set that - would that allow a user to use mprotect() to make the region > > writeable..? > > Exactly, in that last sentence above you come to the right understanding > of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think > your issue in setting up the mmap, is that you're (rightly) doing it with > VM_flags to mmap_region(), but giving it a combination of flags that an > mmap() syscall from userspace would never arrive at, so does not match > expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c: > you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then > removing it just from the case of a MAP_SHARED without FMODE_WRITE. > > > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c > > index 48a9c6b90e07..9476efb54d18 100644 > > --- a/arch/mips/kernel/vdso.c > > +++ b/arch/mips/kernel/vdso.c > > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) > > > > /* Map delay slot emulation page */ > > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > > - VM_READ|VM_WRITE|VM_EXEC| > > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, > > So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE. Thanks Hugh - it works fine when I leave in VM_MAYWRITE. My 4am self had become convinced that it would be problematic if a user program could mprotect() the memory & make it writable... But in reality if a user program wants to do that then by all means, the kernel isn't going to be able to prevent it doing silly things. For anyone looking for the outcome, the patch I wound up with is here: https://lore.kernel.org/linux-mips/20181220174514.24953-1-paul.burton@mips.com/ Thanks, Paul ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-12-20 17:56 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CALCETrWaWTupSp6V=XXhvExtFdS6ewx_0A7hiGfStqpeuqZn8g@mail.gmail.com>
2018-12-19 4:32 ` Fixing MIPS delay slot emulation weakness? Paul Burton
2018-12-19 21:12 ` Hugh Dickins
2018-12-20 17:56 ` Paul Burton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox