From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80BBEC05027 for ; Mon, 23 Jan 2023 17:48:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 16FBE6B007B; Mon, 23 Jan 2023 12:48:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 11FDB6B0080; Mon, 23 Jan 2023 12:48:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F02306B0089; Mon, 23 Jan 2023 12:48:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id E023A6B007B for ; Mon, 23 Jan 2023 12:48:47 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id B30FE430AC for ; Mon, 23 Jan 2023 17:48:47 +0000 (UTC) X-FDA: 80386799094.05.14C139C Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by imf26.hostedemail.com (Postfix) with ESMTP id 99BCC140010 for ; Mon, 23 Jan 2023 17:48:45 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Lc7Qm68g; spf=pass (imf26.hostedemail.com: domain of toiwoton@gmail.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=toiwoton@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674496125; a=rsa-sha256; cv=none; b=wU+fhWPkqAj79mbuGTNd5WmUChpZb+5e02xVUA7MzghU5O0SaN0DYXC0CQm8FG19hFCs6R qvJKV0pyHHDgcBCOp291WijOSO5+IvSHdrjQ0Fm6Gn1VxxMpkF3J1Pasx3V5NEcuYD6ErJ JiKP2IUUMqtF4GV+XwsHbGrY9756Z4I= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Lc7Qm68g; spf=pass (imf26.hostedemail.com: domain of toiwoton@gmail.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=toiwoton@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674496125; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=eDP5MhJamRnx89DUQvThFcrZxYZBmeDARbsADOsVCa0=; b=WM37HxiZVTw47pNqbqib4qiTfGC/XWMqCORvPIx+ZadtxuC+E3s+++UM1fg/Cpc9ta7UVD yqKubb65ul2mmkzCfbY6BpIBg1Jq2/8gLi/3orpUw0u1+5QQdV0GGPcOfaokJzdVBa4JjS xntkEjblVlH2oi2bWRy44GwBCbdL+Dw= Received: by mail-ej1-f44.google.com with SMTP id u19so32590310ejm.8 for ; Mon, 23 Jan 2023 09:48:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=eDP5MhJamRnx89DUQvThFcrZxYZBmeDARbsADOsVCa0=; b=Lc7Qm68gy2MwAMGSYPbIyX5UfQ//izoQPF6duUjNG2CB0n3m86dAxluJyF19rFpj8i HMI84X/LBJyKYWXDrk0sSpKDdH754PbXiLo28wGh7rTpGS41ZOOUJ3Av7fyiXTI3hrJW 3Vc89r2tsS7i5ArMSOGeFwL181VWAby1LpAV6w31Tked+FyYkSOnQpEBi5Oz/Gvnexrn 9iclSybSSbugBQl1R3lboxmm+8z3+TSkPSqjomRibvr75yHHb0rUJVy7YFz6y8bO6rIH mfe92kf3Zhw/MX3zRBjEhN6OTCl7p2VRkW+X+ubp3PsCuMg0lRviXWEi3xRQFmWph3G9 khPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eDP5MhJamRnx89DUQvThFcrZxYZBmeDARbsADOsVCa0=; b=TU2tgpoPANH4zdQwMEq1Rv6U3CFy2NJtRc5woKyx8tOVUQLTz6UQOBqXoAyMX6gzR5 4A9UKO3KviP2HYGKlidF0lEQkN9SpdT4th5FnYvuy04rwPOAm6UYYCzMVlvROJtJbXF0 1tQrZrFe/bRoOQy8gJk9DzrMpZ6vtzncv+5CZmBhAsZtQD73WSle6yKanJZu5xcksfO1 EQ5jaADACf1WYfxeZ+D2vmG6B0Eho6r752RCfXVDgTZB3b907XKSRbithuYAv9WPZK2u WHB5NONJN8ZXEhOrA9oStvCd4kRDYvTWIwjhplW8WQ47GIWSD9SuYyUklFIpbLquhznP oTIA== X-Gm-Message-State: AFqh2krPyRMfYqe7AXBNb1ft/Ime2qTx2liGUdJQdnfGW03ojCIy3QRn 0DSP2yzjTByoMtzfvhi73Ds= X-Google-Smtp-Source: AMrXdXu0a7GL0MA2yReD2CCilns47v/JXM32tQ74eEglC0fWTTL/p+Vq4xzELk371gic+jxcAX/o/w== X-Received: by 2002:a17:906:60d2:b0:872:6bd0:d2b with SMTP id f18-20020a17090660d200b008726bd00d2bmr24581957ejk.45.1674496123935; Mon, 23 Jan 2023 09:48:43 -0800 (PST) Received: from [192.168.1.17] (91-159-146-104.elisa-laajakaista.fi. [91.159.146.104]) by smtp.gmail.com with ESMTPSA id g9-20020a17090604c900b0085ca279966esm17416158eja.119.2023.01.23.09.48.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 23 Jan 2023 09:48:43 -0800 (PST) Message-ID: <0ec67737-3c09-ba5c-f840-9ed02a0ea6bf@gmail.com> Date: Mon, 23 Jan 2023 19:48:40 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Subject: Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl To: Catalin Marinas , David Hildenbrand Cc: Joey Gouly , Andrew Morton , Lennart Poettering , =?UTF-8?Q?Zbigniew_J=c4=99drzejewski-Szmek?= , Alexander Viro , Kees Cook , Szabolcs Nagy , Mark Brown , Jeremy Linton , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-abi-devel@lists.sourceforge.net, nd@arm.com, shuah@kernel.org References: <20230119160344.54358-1-joey.gouly@arm.com> <20230119160344.54358-2-joey.gouly@arm.com> <4a1faf67-178e-c9ba-0db1-cf90408b0d7d@redhat.com> <8b4e31cf-de20-703c-4b53-ad86d4282a37@redhat.com> Content-Language: en-US From: Topi Miettinen In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Queue-Id: 99BCC140010 X-Rspamd-Server: rspam01 X-Stat-Signature: 4jcbgeonkdqxn8u5uso1ewuoeyuxsry8 X-HE-Tag: 1674496125-624883 X-HE-Meta: U2FsdGVkX18wNmsdNn9bYmuxfieTe/j5/PlYvspQ4Cq9OFgKfdb8Ufvfh+Ia64YK+s4tR9AfU1ljt94qASYHDbM6LcPxw1V8fiitGB+Ocmc0aUM/DFgZuo4wikOUViyTP4U9jHIVngusvMeGXTyz7jqPGCNUHFwNmAz92MKxLmY8XWXO1S892KJOkjtyySMqW6P0WQOq9L+CUBXR6sAx7GdOcoINH/FQNhSE6fvedSVwqIUBsC06uf5gMGrnxYNgxQ7lTiZT37ugMvmT7APxMfVmni4tA5PpxfbdTrR7ROV/loFcHEK20csRXIlJOucgc8HRJQdea7uoV29eZEvilhhIVC0OkgL4oIVbWyH1Zi6sv2bzumQVRSMr3R/7NmSOhgxsEdTf9sMhQmfJ3hrroYz7fcxbjvmFrCb0C1CeVUQcq16dUl3Ua5tbOaa9XCaS+0LmuW4eQNm1o/k8S2jSh8SwGc6vZR7EUeVBrcoAjaH4j5+YL8SNrQela32zVZSOzvSf/8lkRR4tf3U+y2DgPWuQsNEvwBV2fdsZ2jlhROv/cML64bkwJhF59Dk9u37oXpLWrU75R48bkwsx3FIBm4G+qGs/AbA5yJctFxVK4JN7y66t9dYSBCgYbGVjruUlMLdidwB7KntggOjsCJ17ne4zESq9t0KISCupgTxJ/T0ETAhVRdL+FtoeAGh8UlDHO54PjiSGVoYKGPb+4mwB9qYTYWc2sJoo2sexmI1vzqE8UWOZhrqbSChgUmY4dFFLD37N8pDn3wWEBx1I7YOYhJu6ZTspUrEVO+KeKf+5S7gFyhJE5V5rJi2KbtvvELUaHGQ25lih2bDS453JojX9OooqMZb4Pi1qgdoDfgeZQ4cdHWXk2rOwnpi6V4MXBTLpQdyhZ23PApha5+5hQ10Id+xtWAelT8M2SRubNHyyykhnF8VgB+AJHTJQddJO0HF6Ptfi7bIQ7Pjlq6j5gcO oLZ1EzSP vYRme3iOjKwZL1BeL+9dl3foHA66sdPac7oXIUrRzmUTwRaUNm3n6ca0Y2b8yjMsFEwYH+IHIvBJpltzm87OIlZowwwD7IUipyAzC2fAKDpDXQ2I2qXVHzjT9LpIv0CRW/ZAfuYrYTeKIBDvKYlPOOExot7XlHy2UdimF5G/gp2bftT3EVq4mjxv7QmKRwm3R+exftlhAnmrZHhPCdi0JutYcy/SQAM1LrAIGZaBu1OoQjNT9wnPbSaseCQnx9+tbNdaBDpBPiYh0bWguUcQIw+iILHT10CGS991os1I3KHHv6X5biJWyKPR7AB7U9iBmk11lX25YOxITTyVx8YrfuhVSiJSbs3yL+Zrby2BIEoeiRsXGOUIWYvhc0YQld25yqQGXzzR35p0+rFSodRRNgq7fOqKsu73SdNpM37sDlrNKKwGj7NtESGukje/OnnJg8RhTkfToAYb6kF0ZKbalqKEXZAYDrOaKGcGu X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 23.1.2023 18.04, Catalin Marinas wrote: > On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote: >> On 23.01.23 13:19, Catalin Marinas wrote: >>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote: >>>> On 19.01.23 17:03, Joey Gouly wrote: >>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h >>>>> index 58b3abd457a3..cee1e4b566d8 100644 >>>>> --- a/include/linux/mman.h >>>>> +++ b/include/linux/mman.h >>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags) >>>>> } >>>>> unsigned long vm_commit_limit(void); >>>>> + >>>>> +/* >>>>> + * Denies creating a writable executable mapping or gaining executable permissions. >>>>> + * >>>>> + * This denies the following: >>>>> + * >>>>> + * a) mmap(PROT_WRITE | PROT_EXEC) >>>>> + * >>>>> + * b) mmap(PROT_WRITE) >>>>> + * mprotect(PROT_EXEC) >>>>> + * >>>>> + * c) mmap(PROT_WRITE) >>>>> + * mprotect(PROT_READ) >>>>> + * mprotect(PROT_EXEC) >>>>> + * >>>>> + * But allows the following: >>>>> + * >>>>> + * d) mmap(PROT_READ | PROT_EXEC) >>>>> + * mmap(PROT_READ | PROT_EXEC | PROT_BTI) >>>>> + */ >>>> >>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC >>>> anymore? In an ideal world, there would be no further mprotect changes >>>> required. >>> >>> I don't think it works for this scenario. We don't want to disable >>> PROT_EXEC entirely, only disallow it if the mapping is not already >>> executable. The below should be allowed: >>> >>> addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); >>> mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI); >>> >>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap() >>> time. >> >> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow >> VM_EXEC of course). > > This should work but it doesn't fully mimic systemd's MDWE behaviour > (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only). > Topi wanted to stay close to that at least in the first incarnation of > this control (can be extended later). > >> But I guess we'd have to go one step further: if we allow exec access >> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course). > > Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful > but as additional controls a process can enable. > >> That at least would be then similar to how we handle mmaped files: if the >> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we >> clear VM_MAYWRITE. > > We still allow VM_MAYWRITE for private mappings, though we do clear > VM_MAYEXEC if not executable. > > It would be nice to use VM_MAY* flags for this logic but we can only > emulate MDWE if we change the semantics of 'MAY': only check the 'MAY' > flags for permissions being changed (e.g. allow PROT_EXEC if the vma is > already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up > with some weird combinations like having VM_EXEC without VM_MAYEXEC > (maybe that's fine). > >> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such >> memory would be forbidden, which might also be what we are trying to >> achieve, or is that expected to still work? > > I think currently with systemd's MDWE it still works (I haven't tried > though), unless there's something else forcing that file read-only. > >> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no >> longer fire for some VMAs, and we'd have to check if that's fine in >> all cases. > > This will break __access_remote_vm() AFAICT since it can't do a CoW on > read-only private mapping. > >> Having that said, this patch handles the case when the prctl is applied to a >> process after already having created some writable or executable mappings, >> to at least forbid if afterwards on these mappings. What is expected to >> happen if the process already has writable mappings that are executable at >> the time we enable the prctl? > > They are expected to continue to work. The prctl() is meant to be > invoked by something like systemd so that any subsequent exec() will > inherit the property. > >> Clarifying what the expected semantics with /proc/self/mem are would be >> nice. > > Yeah, this series doesn't handle this. Topi, do you know if systemd does > anything about /proc/self/mem? To me this option is more about catching > inadvertent write|exec mappings rather than blocking programs that > insist on doing this (they can always map a memfd file twice with > separate write and exec attributes for example). > I don't think so. For 100% compatibility with seccomp, the same cases of mprotect() use should be blocked regardless of the file descriptor used. There could be more relaxed PR_MDWE_* controls in the future if needed. Updated systemd PR: https://github.com/systemd/systemd/pull/25276 I wish there were highly granular access controls for /proc, including /proc/self and /proc/sys/*. Now the best options are to use mount namespaces and/or SELinux, but they aren't too good for that. -Topi