From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qk0-f200.google.com (mail-qk0-f200.google.com [209.85.220.200])
	by kanga.kvack.org (Postfix) with ESMTP id 9BA8C83102
	for ; Mon, 29 Aug 2016 11:25:23 -0400 (EDT)
Received: by mail-qk0-f200.google.com with SMTP id o1so322641097qkd.3
	for ; Mon, 29 Aug 2016 08:25:23 -0700 (PDT)
Received: from mail-yw0-x234.google.com (mail-yw0-x234.google.com. [2607:f8b0:4002:c05::234])
	by mx.google.com with ESMTPS id w192si10015508ywa.243.2016.08.29.08.25.22
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Mon, 29 Aug 2016 08:25:22 -0700 (PDT)
Received: by mail-yw0-x234.google.com with SMTP id u134so87944558ywg.3
	for ; Mon, 29 Aug 2016 08:25:22 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160826213227.GA11393@node.shutemov.name>
References: <1472229004-9658-1-git-send-email-robert.foss@collabora.com>
	<20160826213227.GA11393@node.shutemov.name>
From: Will Drewry
Date: Mon, 29 Aug 2016 10:25:02 -0500
Message-ID:
Subject: Re: [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint
Content-Type: multipart/alternative; boundary=001a1141c57cfbafc5053b377829
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Kirill A. Shutemov"
Cc: Robert Foss , Andrew Morton , kirill.shutemov@linux.intel.com,
	vbabka@suse.cz, mhocko@suse.com, mingo@kernel.org,
	dave.hansen@linux.intel.com, hannes@cmpxchg.org,
	dan.j.williams@intel.com, iamjoonsoo.kim@lge.com, acme@redhat.com,
	Kees Cook , mgorman@techsingularity.net, atomlin@redhat.com,
	Hugh Dickins , dyoung@redhat.com, Al Viro , Daniel Cashman , w@1wt.eu,
	idryomov@gmail.com, yang.shi@linaro.org, vkuznets@redhat.com,
	vdavydov@virtuozzo.com, vitalywool@gmail.com, oleg@redhat.com,
	gang.chen.5i5j@gmail.com, koct9i@gmail.com, aarcange@redhat.com,
	aryabinin@virtuozzo.com, kuleshovmail@gmail.com, minchan@kernel.org,
	mguzik@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Ivan Krasin , Roland McGrath , Mandeep Singh Baines , Ben Zhang ,
	Filipe Brandenburger

--001a1141c57cfbafc5053b377829
Content-Type: text/plain; charset=UTF-8

On Fri, Aug 26, 2016 at 4:32 PM, Kirill A. Shutemov wrote:

> On Fri, Aug 26, 2016 at 12:30:04PM -0400, robert.foss@collabora.com wrote:
> > From: Will Drewry
> >
> > This patch proposes a sysctl knob that allows a privileged user to
> > disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
> > mountpoint. It does not alter the normal behavior resulting from
> > attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
> > of any other subsystems checking MNT_NOEXEC.
>
> Wouldn't it be equal to remounting all filesystems without noexec from
> attacker POV? It's hardly a fence to make additional mprotect(PROT_EXEC)
> call, before starting executing code from such filesystems.
>
> If administrator of the system wants this, he can just mount filesystem
> without noexec, no new kernel code required. And it's more fine-grained
> than this.
>
> So, no, I don't think we should add knob like this. Unless I miss
> something.
>

I don't believe this patch is necessary anymore (though, thank you Robert
for testing and re-sending!).

The primary offenders wrt needing to mmap/mprotect a file in /dev/shm
were the older nvidia driver (binary only, iirc) and the Chrome Native
Client code.

The reason why half-exec is an "ok" (half) mitigation is that it blocks
simple gadgets and other paths for using loadable libraries or binaries
(via glibc): it disallows mmap(PROT_EXEC) even though it allows
mprotect(PROT_EXEC). This stops ld in its tracks since it does the
obvious thing and uses mmap(PROT_EXEC).

I think time has marched on and this patch is now something I can toss
in the dustbin of history. Both Chrome's Native Client and an older
nvidia driver relied on creating-then-unlinking a file in tmpfs, but
there is now a better facility!

> NAK.
>

Agreed - this is old, and the software it was predicated on should be
gone... I hope. :)

> > It is motivated by a common /dev/shm, /tmp usecase. There are few
> > facilities for creating a shared memory segment that can be remapped in
> > the same process address space with different permissions.
>
> What about using memfd_create(2) for such cases? You'll get a file
> descriptor from in-kernel tmpfs (shm_mnt) which is not exposed to
> userspace for remount as noexec.
>

This is a relatively old patch ( https://lwn.net/Articles/455256/ ) which
predated memfd_create(). memfd_create() is the right solution to this
problem!

Thanks again!
will

--001a1141c57cfbafc5053b377829
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a1141c57cfbafc5053b377829-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org