From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pedro Falcato
Date: Tue, 17 Oct 2023 23:35:30 +0100
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Jeff Xu
Cc: Matthew Wilcox, jeffxu@chromium.org, akpm@linux-foundation.org,
 keescook@chromium.org, sroettger@google.com, jorgelo@chromium.org,
 groeck@chromium.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
 surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
 aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk,
 catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk,
 ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com, corbet@lwn.net,
 wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com,
 torvalds@linux-foundation.org, lstoakes@gmail.com, mawupeng1@huawei.com,
 linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com,
 peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io,
 vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com,
 zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org,
 linux-hardening@vger.kernel.org
References: <20231016143828.647848-1-jeffxu@chromium.org>

On Tue, Oct 17, 2023 at 10:34 PM Jeff Xu wrote:
>
> On Tue, Oct 17,
> 2023 at 8:30 AM Pedro Falcato wrote:
> >
> > On Mon, Oct 16, 2023 at 4:18 PM Matthew Wilcox wrote:
> > >
> > > On Mon, Oct 16, 2023 at 02:38:19PM +0000, jeffxu@chromium.org wrote:
> > > > Modern CPUs support memory permissions such as RW and NX bits. Linux has
> > > > supported NX since the release of kernel version 2.6.8 in August 2004 [1].
> > >
> > > This seems like a confusing way to introduce the subject.  Here, you're
> > > talking about page permissions, whereas (as far as I can tell), mseal() is
> > > about making _virtual_ addresses immutable, for some value of immutable.
> > >
> > > > Memory sealing additionally protects the mapping itself against
> > > > modifications. This is useful to mitigate memory corruption issues where
> > > > a corrupted pointer is passed to a memory management syscall. For example,
> > > > such an attacker primitive can break control-flow integrity guarantees
> > > > since read-only memory that is supposed to be trusted can become writable
> > > > or .text pages can get remapped. Memory sealing can automatically be
> > > > applied by the runtime loader to seal .text and .rodata pages and
> > > > applications can additionally seal security critical data at runtime.
> > > > A similar feature already exists in the XNU kernel with the
> > > > VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall [4].
> > > > Also, Chrome wants to adopt this feature for their CFI work [2] and this
> > > > patchset has been designed to be compatible with the Chrome use case.
> > >
> > > This [2] seems very generic and wide-ranging, not helpful.  [5] was more
> > > useful to understand what you're trying to do.
> > >
> > > > The new mseal() is an architecture independent syscall, and with
> > > > following signature:
> > > >
> > > > mseal(void addr, size_t len, unsigned int types, unsigned int flags)
> > > >
> > > > addr/len: memory range.
> > > > Must be continuous/allocated memory, or else
> > > > mseal() will fail and no VMA is updated. For details on acceptable
> > > > arguments, please refer to comments in mseal.c. Those are also fully
> > > > covered by the selftest.
> > >
> > > Mmm.  So when you say "continuous/allocated" what you really mean is
> > > "Must have contiguous VMAs" rather than "All pages in this range must
> > > be populated", yes?
> > >
> > > > types: bit mask to specify which syscall to seal, currently they are:
> > > > MM_SEAL_MSEAL 0x1
> > > > MM_SEAL_MPROTECT 0x2
> > > > MM_SEAL_MUNMAP 0x4
> > > > MM_SEAL_MMAP 0x8
> > > > MM_SEAL_MREMAP 0x10
> > >
> > > I don't understand why we want this level of granularity.  The OpenBSD
> > > and XNU examples just say "This must be immutable*".  For values of
> > > immutable that allow downgrading access (eg RW to RO or RX to RO),
> > > but not upgrading access (RW->RX, RO->*, RX->RW).
> > >
> > > > Each bit represents sealing for one specific syscall type, e.g.
> > > > MM_SEAL_MPROTECT will deny mprotect syscall. The consideration of bitmask
> > > > is that the API is extendable, i.e. when needed, the sealing can be
> > > > extended to madvise, mlock, etc. Backward compatibility is also easy.
> > >
> > > Honestly, it feels too flexible.  Why not just two flags to mprotect()
> > > -- PROT_IMMUTABLE and PROT_DOWNGRADABLE.  I can see a use for that --
> > > maybe for some things we want to be able to downgrade and for other
> > > things, we don't.
> >
> > I think it's worth pointing out that this suggestion (with PROT_*)
> > could easily integrate with mmap() and as such allow for one-shot
> > mmap() + mseal().
> > If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> > definitely sounds like a performance win as we halve the number of
> > syscalls for a sealed mapping. And if we trivially look at e.g. OpenBSD
> > ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> > like common patterns.
> >
> Yes.
> mmap() can support sealing as well, and memory is allocated as
> immutable from beginning.
> This is orthogonal to mseal() though.

I don't see how this can be orthogonal to mseal().
In the case we opt for adding PROT_ bits, we should more or less only
need to adapt calc_vm_prot_bits(), and the rest should work without
issues.
VMA merging won't merge VMAs with different prots. The current
interfaces (mmap and mprotect) would work just fine.
In this case, mseal() or mimmutable() would only be needed if you need
to set immutability over a range of VMAs with different permissions.

Note: modifications should look kinda like this: https://godbolt.org/z/Tbjjd14Pe
The only annoying wrench in my plans here is that we have effectively
run out of vm_flags bits in 32-bit architectures, so this approach as
I described is not compatible with 32-bit.

> In case of ld.so, iiuc, memory can be first allocated as W, then later
> changed to RO, for example, during symbol resolution.
> The important point is that the application can decide what type of
> sealing it wants, and when to apply it. There needs to be an api(),
> that can be mseal() or mprotect2() or mimmutable(), the naming is not
> important to me.
>
> mprotect() in Linux has the following signature:
> int mprotect(void addr[.len], size_t len, int prot);
> the prot bitmasks are all taken here.
> I have not checked the prot field in mmap(), there might be bits left,
> even if not, we could have mmap2(), so that is not an issue.

I don't see what you mean. We have plenty of prot bits left (32-bits,
and we seem to have around 8 different bits used).
And even if we didn't, prot is the same in mprotect and mmap and mmap2 :)

The only issue seems to be that 32-bit ran out of vm_flags, but that
can probably be worked around if need be.

-- 
Pedro
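
For illustration only, here is a rough user-space model of the PROT_ bit
idea discussed above. It is not the godbolt patch and not kernel code: it
just sketches how a calc_vm_prot_bits()-style translation might carry a
hypothetical PROT_IMMUTABLE bit into a hypothetical VM_IMMUTABLE flag, and
why a flag placed above bit 31 is lost when vm_flags is only 32 bits wide.
Every MODEL_* name, both "immutable" bit values, and the helper functions
are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the existing Linux PROT_READ/PROT_WRITE/PROT_EXEC bits. */
#define MODEL_PROT_READ       0x1u
#define MODEL_PROT_WRITE      0x2u
#define MODEL_PROT_EXEC       0x4u
/* HYPOTHETICAL: no such prot bit exists upstream; value chosen arbitrarily. */
#define MODEL_PROT_IMMUTABLE  0x40u

/* Values of the existing VM_READ/VM_WRITE/VM_EXEC flags. */
#define MODEL_VM_READ         UINT64_C(0x1)
#define MODEL_VM_WRITE        UINT64_C(0x2)
#define MODEL_VM_EXEC         UINT64_C(0x4)
/* HYPOTHETICAL: placed above bit 31 on purpose, to model the 32-bit problem. */
#define MODEL_VM_IMMUTABLE    (UINT64_C(1) << 37)

/* Translate prot bits into vm_flags bits, one bit at a time. */
static inline uint64_t model_calc_vm_prot_bits(unsigned int prot)
{
    uint64_t vm_flags = 0;

    if (prot & MODEL_PROT_READ)
        vm_flags |= MODEL_VM_READ;
    if (prot & MODEL_PROT_WRITE)
        vm_flags |= MODEL_VM_WRITE;
    if (prot & MODEL_PROT_EXEC)
        vm_flags |= MODEL_VM_EXEC;
    if (prot & MODEL_PROT_IMMUTABLE)
        vm_flags |= MODEL_VM_IMMUTABLE;
    return vm_flags;
}

/* A later permission change is refused once the immutable flag is set. */
static inline int model_mprotect_allowed(uint64_t vm_flags)
{
    return (vm_flags & MODEL_VM_IMMUTABLE) == 0;
}

/* With a 32-bit vm_flags word, everything above bit 31 is dropped. */
static inline uint32_t model_vm_flags_32bit(uint64_t vm_flags)
{
    return (uint32_t)vm_flags;
}

/* Small self-check tying the three helpers together. */
static inline void model_self_check(void)
{
    uint64_t sealed =
        model_calc_vm_prot_bits(MODEL_PROT_READ | MODEL_PROT_IMMUTABLE);

    assert(sealed == (MODEL_VM_READ | MODEL_VM_IMMUTABLE));
    assert(!model_mprotect_allowed(sealed));
    /* The immutable flag does not survive truncation to 32 bits. */
    assert(model_vm_flags_32bit(sealed) == MODEL_VM_READ);
}
```

In this model a single call carrying the extra prot bit yields a sealed
mapping, which is the "one-shot" case from the thread; the truncation
helper shows why the same layout would not work unchanged on 32-bit.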