From: Pedro Falcato
To: Jeff Xu
Cc: akpm@linux-foundation.org, keescook@chromium.org, corbet@lwn.net,
 jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, jannh@google.com, sroettger@google.com,
 linux-hardening@vger.kernel.org, willy@infradead.org,
 gregkh@linuxfoundation.org, torvalds@linux-foundation.org,
 deraadt@openbsd.org, usama.anjum@collabora.com, surenb@google.com,
 merimus@google.com, rdunlap@infradead.org, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, enh@google.com
Date: Sat, 28 Sep 2024 14:43:33 +0100
Subject: Re: [PATCH v1 1/1] mseal: update mseal.rst
Message-ID: <2q6hzkvep2g3z6m2jrwbw2j3sbydf6tgj2obwd6hgmm7xzgsg3@ddr5ghmsia5k>
References: <20240927185211.729207-1-jeffxu@chromium.org>
 <20240927185211.729207-2-jeffxu@chromium.org>
 <2vkppisejac42wnawjkd7qzyybuycu667yxwmsd4pfk5rwhiqc@gszyo5lu24ge>

On Fri, Sep 27, 2024 at 06:29:30PM GMT, Jeff Xu wrote:
> Hi Pedro,
>
> On Fri, Sep 27, 2024 at 3:59 PM Pedro Falcato wrote:
> > > +
> > > + Blocked mm syscall:
> > > + - munmap
> > > + - mmap
> > > + - mremap
> > > + - mprotect and pkey_mprotect
> > > + - some destructive madvise behaviors: MADV_DONTNEED, MADV_FREE,
> > > + MADV_DONTNEED_LOCKED, MADV_FREE, MADV_DONTFORK, MADV_WIPEONFORK
> > > +
> > > + The first set of syscall to block is munmap, mremap, mmap. They can
> > > + either leave an empty space in the address space, therefore allow
> > > + replacement with a new mapping with new set of attributes, or can
> > > + overwrite the existing mapping with another mapping.
> > > +
> > > + mprotect and pkey_mprotect are blocked because they changes the
> > change
> > > + protection bits (rwx) of the mapping.
> > > +
> > > + Some destructive madvice behaviors (MADV_DONTNEED, MADV_FREE,
> > > + MADV_DONTNEED_LOCKED, MADV_FREE, MADV_DONTFORK, MADV_WIPEONFORK)
> > > + for anonymous memory, when users don't have write permission to the
> > > + memory. Those behaviors can alter region contents by discarding pages,
> > > + effectively a memset(0) for anonymous memory.
> >
> > What's the difference between anonymous memory and MAP_PRIVATE | MAP_FILE?
> >
> MAP_FILE seems not used ?
> anonymous mapping is the mapping that is not backed by a file.

MAP_FILE is actually defined as 0 usually :) But I meant file-backed
private mappings.

> > The feature now, as is (as far as I understand!) will allow you to do things like MADV_DONTNEED
> > on a read-only file mapping. e.g .text. This is obviously wrong?
> >
> When a MADV_DONTNEED is called, pages will be freed, on file-backed
> mapping, if the process reads from the mapping again, the content
> will be retrieved from the file.
>

Sorry, it was late and I gave you a crap example. Consider this:
a file-backed MAP_PRIVATE vma is marked RW. I write to it, then RO-it + mseal.
The attacker later gets me to MADV_DONTNEED that VMA. You've just lost data.

The big problem here is with anon _pages_, not anon vmas.

> For anonymous mapping, since there is no file backup, if process
> reads from the mapping, 0 is filled, hence equivalent to memset(0)
>
> > > +
> > > + Kernel will return -EPERM for blocked syscalls.
> > > +
> > > + When blocked syscall return -EPERM due to sealing, the memory regions may or may not be changed, depends on the syscall being blocked:
> > > + - munmap: munmap is atomic. If one of VMAs in the given range is
> > > + sealed, none of VMAs are updated.
> > > + - mprotect, pkey_mprotect, madvise: partial update might happen, e.g.
> > > + when mprotect over multiple VMAs, mprotect might update the beginning
> > > + VMAs before reaching the sealed VMA and return -EPERM.
> > > + - mmap and mremap: undefined behavior.
> >
> > mmap and mremap are actually not undefined as they use munmap semantics for their unmapping.
> > Whether this is something we'd want to document, I don't know honestly (nor do I think is ever written down in POSIX?)
> >
> I'm not sure if I can declare mmap/mremap as atomic.
>
> Although, it might be possible to achieve this due to munmap being
> atomic. I'm not sure as I didn't test this. Would you like to find
> out ?

I just told you they use munmap under the hood. It's just that the
requirement isn't actually written down anywhere.

> > >
> > > Use cases:
> > > ==========
> > > - glibc:
> > > The dynamic linker, during loading ELF executables, can apply sealing to
> > > - non-writable memory segments.
> > > + mapping segments.
> > >
> > > - Chrome browser: protect some security sensitive data-structures.
> > >
> > > -Notes on which memory to seal:
> > > -==============================
> > > -
> > > -It might be important to note that sealing changes the lifetime of a mapping,
> > > -i.e. the sealed mapping won’t be unmapped till the process terminates or the
> > > -exec system call is invoked. Applications can apply sealing to any virtual
> > > -memory region from userspace, but it is crucial to thoroughly analyze the
> > > -mapping's lifetime prior to apply the sealing.
> > > +Don't use mseal on:
> > > +===================
> > > +Applications can apply sealing to any virtual memory region from userspace,
> > > +but it is *crucial to thoroughly analyze the mapping's lifetime* prior to
> > > +apply the sealing. This is because the sealed mapping *won’t be unmapped*
> > > +till the process terminates or the exec system call is invoked.
> >
> > There should probably be a nice disclaimer as to how most people don't need this or shouldn't use this.
> > At least in its current form.
> >
> Ya, the mseal is not for most apps. I mention the malloc example to stress that.
>
> >
> > > -
> > > -
> > > -Additional notes:
> > > -=================
> > > As Jann Horn pointed out in [3], there are still a few ways to write
> > > -to RO memory, which is, in a way, by design. Those cases are not covered
> > > -by mseal(). If applications want to block such cases, sandbox tools (such as
> > > -seccomp, LSM, etc) might be considered.
> > > +to RO memory, which is, in a way, by design. And those could be blocked
> > > +by different security measures.
> > >
> > > Those cases are:
> > > -
> > > -- Write to read-only memory through /proc/self/mem interface.
> > > -- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
> > > -- userfaultfd.
> > > + - Write to read-only memory through /proc/self/mem interface (FOLL_FORCE).
> > > + - Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
> > > + - userfaultfd.
> >
> > I don't understand how this is not a problem, but MADV_DONTNEED is.
> > To me it seems that what we have now is completely useless, because you can trivially
> > bypass it using /proc/self/mem, which is enabled on most Linux systems.
> >
> > Before you mention ChromeOS or Chrome, I don't care. Kernel features aren't designed
> > for Chrome. They need to work with every other distro and application as well.
> >
> > It seems to me that the most sensible change is blocking/somehow distinguishing between /proc/self/mem and
> > /proc/<pid>/mem (some other process) and ptrace. As in blocking /proc/self/mem but allowing the other FOLL_FORCE's
> > as the traditional UNIX permission model allows.
> >
> IMO, it is a matter of Divide and Conquer. In a nutshell, mseal only
> prevents VMA's certain attributes (such as prot bits) from changing.
> It doesn't mean to say that sealed RO memory is immutable. To achieve
> that, the system needs to apply multiple security measures.

No, it's a matter of providing a sane API without tons of edge cases.
Making a VMA immutable should make a VMA immutable, and not require you
to provide a crap ton of other mechanisms in order to truly make it
immutable. If I call mseal, I expect it to be sealed, not "sealed except
when it's not, lol".

You haven't been able to quite specify what semantics are desirable out
of this whole thing. Making prot flags "immutable" is completely
worthless if you can simply write to a random pseudofile and have it
bypass the whole thing (where a write to /proc/self/mem is semantically
equivalent to mprotect RW + write + mprotect RO). Making the vma
immutable is completely worthless if I can simply wipe anon pages. There
has to be some end goal here (make contents immutable? make sure VMA
protection can't be changed? both?) which seems to be unclear from the
kernel mmap-side.

If you insist on providing half-baked APIs (and waving off any
concerns), I'm sure this would've been better implemented as a random
BPF program for Chrome. Maybe we could revert this whole thing and give
eBPF one or two bits of vma flags for their own uses :)

>
> For writing to /proc/pid/mem, it can be disabled via [1]. SELINUX and
> Landlock can achieve the same protection too.

I'm not blocking /proc/pid/mem, and my distro doesn't run any of those
security modules :/

-- 
Pedro
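
For reference, the file-backed MAP_PRIVATE scenario above can be
sketched roughly as below. This is only an illustration under stated
assumptions, not code from the patch or the kernel tree: the function
name and the temp-file path are made up, and the mseal() call itself is
left out so the sketch also builds and runs on kernels without it. The
comment marks where the seal would go. Whether a given kernel then
refuses the MADV_DONTNEED is exactly the question in this thread, so
the sketch makes no claim about that.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is illustrative only).
 * Returns 1 if data written to a MAP_PRIVATE file mapping was lost
 * after MADV_DONTNEED, 0 if it survived (or the madvise was refused),
 * and -1 if the setup failed.
 */
int dontneed_discards_private_write(void)
{
    char path[] = "/tmp/mseal-demo-XXXXXX";   /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    int lost;
    char *p;

    if (fd < 0)
        return -1;
    unlink(path);

    /* The file starts with "file"; the rest of the page is zero-filled. */
    if (page <= 0 || pwrite(fd, "file", 4, 0) != 4 ||
        ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* The write triggers COW: this page is now a private anon page. */
    memcpy(p, "anon", 4);

    /* "RO-it": drop write permission on the mapping. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* An mseal(p, page, 0) call would go here on a kernel that has it. */

    /* Discard the private page; the next read faults the file data back in. */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0)
        lost = 0;
    else
        lost = (memcmp(p, "file", 4) == 0);

    munmap(p, (size_t)page);
    return lost;
}
```

Calling this from a small main() and printing the result is enough to
see the effect: the "anon" bytes written through the private mapping
are replaced by the original file contents once the page is discarded,
even though the mapping was read-only at that point.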