From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83610CDB465 for ; Mon, 16 Oct 2023 17:23:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AA1848E000A; Mon, 16 Oct 2023 13:23:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A537C8D0001; Mon, 16 Oct 2023 13:23:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 919498E000A; Mon, 16 Oct 2023 13:23:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8256D8D0001 for ; Mon, 16 Oct 2023 13:23:51 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 42F62120A64 for ; Mon, 16 Oct 2023 17:23:51 +0000 (UTC) X-FDA: 81351997062.15.2EA3D01 Received: from mail-ed1-f47.google.com (mail-ed1-f47.google.com [209.85.208.47]) by imf20.hostedemail.com (Postfix) with ESMTP id 3B1791C0011 for ; Mon, 16 Oct 2023 17:23:49 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=google header.b=DavGQCMP; spf=pass (imf20.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.208.47 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1697477029; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ziSfOVrFPUUVHeu10UoweKHdhwL95UR6sSsnqfpXK6Q=; b=SlOBu31WjWy0+3vfw3zKFyja3keQ95e3ayxzk3PxPWEF6pSUrZbDNXc6TmEmiuw7b59af7 d+nEiOuhm/OXoq46HVsnOQPSRbxkt0eJCcZEVpuu+fmFtV5/uTXufSP8TX+FtaIZRX0gLA 4Lh7vnGc7tHjlH6DnoMeK8fi4lj69eg= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=google header.b=DavGQCMP; spf=pass (imf20.hostedemail.com: domain of torvalds@linuxfoundation.org designates 209.85.208.47 as permitted sender) smtp.mailfrom=torvalds@linuxfoundation.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1697477029; a=rsa-sha256; cv=none; b=4SJY5gf2gPK+61IyIIVvS0vHSaLCWQOPvKiHyXaSAy22Msn0iYZtrpOC+Ko7gtKhVhCLna UDQNgI2BnLG+DNCM1ix9JXLjgAOnecFMyMm2ILOYgIkONeAoVT8ephgnvoKLZRDDazfwoW O/bq/9+Xm6Ne3NDQxSCejRSzB4axqhY= Received: by mail-ed1-f47.google.com with SMTP id 4fb4d7f45d1cf-523100882f2so8038915a12.2 for ; Mon, 16 Oct 2023 10:23:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; t=1697477027; x=1698081827; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ziSfOVrFPUUVHeu10UoweKHdhwL95UR6sSsnqfpXK6Q=; b=DavGQCMPDpt0wJb0R2p30q3ylLBA+bmabEi7pQlBjNJVKkYEVzT6Z/9LFZ4Ft7Krx3 KrxqdZx9YmjDrRZ1O6exZlOr2PumGa9Z6gXV79+ggM+5bokYCMcGsNwFuXxp4qsbUfrv UC19YHtLKpOTpCUHoT38PDGF8OsoyW/8A2uwQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697477027; x=1698081827; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ziSfOVrFPUUVHeu10UoweKHdhwL95UR6sSsnqfpXK6Q=; b=PjMkK3+MzcaBsX6JAi2dtLhDPd6F3GYs70o9N2R6HMoCg3Iky8UOWckzuEKy9pXql2 5BrKPPOtClsyoYgBVOkd8bTghDxNpfdYjjcRLviBEcCqdIwFjCA8mwqVGihhkSSnwnpF QSBuwMudgeMZ6Eo6unJWH+rFHEah5BBUdyKRfptpViKp5vuiNzcN4olraonic5GSd5bJ i01TyZmWI5MUq+5sQRmUjWtqtgfCVLbYGP1CD5/voce045czgzkDTrnlBaoR0aRXxDFn 4IZmUCIYIm0GuxHMji8b/5OJ62IzBTH63KqAvOYLYkJT5xIWeV1lKziKalABKuXDTFOF zpTw== X-Gm-Message-State: AOJu0YyXUeDY6wchAppTK32fzGt5wZTT9TXUEM6n5SfkOpJIGp5lX7SB DOw/Qk3erl8IEPa1xnWUxQQZjuJpLrw/qCg6omTvgA== X-Google-Smtp-Source: AGHT+IF868OjfIz8/J+XQP44hswdWLD+icRfwlhK9m5/ZAF/gHj15QguI9oAyo19/0IbfIL3tcNK4w== X-Received: by 2002:a50:d709:0:b0:53e:9387:40f7 with SMTP id t9-20020a50d709000000b0053e938740f7mr4559340edi.19.1697477027605; Mon, 16 Oct 2023 10:23:47 -0700 (PDT) Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com. [209.85.218.41]) by smtp.gmail.com with ESMTPSA id l23-20020a056402231700b0053e89721d4esm171908eda.68.2023.10.16.10.23.47 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 16 Oct 2023 10:23:47 -0700 (PDT) Received: by mail-ej1-f41.google.com with SMTP id a640c23a62f3a-9ad8a822508so779347066b.0 for ; Mon, 16 Oct 2023 10:23:47 -0700 (PDT) X-Received: by 2002:a17:907:78a:b0:9bd:a662:c066 with SMTP id xd10-20020a170907078a00b009bda662c066mr9387285ejb.76.1697477026848; Mon, 16 Oct 2023 10:23:46 -0700 (PDT) MIME-Version: 1.0 References: <20231016143828.647848-1-jeffxu@chromium.org> In-Reply-To: <20231016143828.647848-1-jeffxu@chromium.org> From: Linus Torvalds Date: Mon, 16 Oct 2023 10:23:29 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall To: jeffxu@chromium.org Cc: akpm@linux-foundation.org, keescook@chromium.org, sroettger@google.com, jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com, surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com, aneesh.kumar@linux.ibm.com, axelrasmussen@google.com, ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com, dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com, joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com, Liam.Howlett@oracle.com, lstoakes@gmail.com, willy@infradead.org, mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com, peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com, yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com, luto@kernel.org, linux-hardening@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 3B1791C0011 X-Rspam-User: X-Stat-Signature: mbuo4sp7cwbxajh1m1ibui7dw9ookep3 X-Rspamd-Server: rspam01 X-HE-Tag: 1697477028-697518 X-HE-Meta: U2FsdGVkX18h+N2VLSMt8/gFCF8KgePaSUSXSNV0Y8rGtrQ2FHrlGWEg5k42P4B9KLFpuHOEZJEKyfiybzbaEd4XVn1jhZsM21OXL+phrM+2xpQ4a2vxr+UtrTcSqvonrO4QX6CzBB7QCVnP7reRZW3HCSMxIP4earC0idP2Bvy11MX2lUW+3Bqke/N/shEzV67otPUGrpBr5jtEzWJE6/Y6ibtyAIzL8Yfmmebc6JGPDCewHdRsQEs5RufxHKqkXJ58ynnHwdmpmXm+hNpehWan6tEvqC93YkbeBBrAkc10Zx+HW39KJpGQqijip/4DZvA1GtGA9/vqutvS2gB5LIccqHWloqjWCBlHkOoIQ8HqoNtKMjkFE/KINbxp0X/dIUH8ZRdCEYVVt8HwXXM49LAU6gUO048DjjFhUDE+NhHOv72eOyudmpjSixp7Ted2ZuuhwojIMjKa21nQEKQFcuGd08ZYLEa185yShfE+3AvK92K4bSbhbGkROo+Xj7kpJI+wbGM8CAwS+/jlYgnPanNvLrgj9iPFUgDcikwTBnjQPdI9xeisPDnT9sw2mQJeMVKpl78Y8jar7BtVWmWZkCKbF13qsx+sjFVEBDGOa828F86maZ3xCSAYzT6soJvUBTPgnF3bJj/vJVgcqaeULn1Dt/4jgW2L4lR0mLaxmcz1sQtXge1tbJu2Oqb8Ix19aTIFkCJjqGrdOV90ERlxF11OyBJZSYzJuK2ahwu+s0yE+e4OYLMEXkn8SmIg0v18VvS7lN3ErFnwof0EeACCobQ3o8EaNIl30GXzK16fYh2GQNp4f1bnhjTh6kpc/djXi/8uUxZt8XRvogag4CGjSVVzlBlEvdocdOi9YkgZR/96F9GT4R9awi6UB+LMWop9HwwESKA/djODW9fD3hUkeIDS997UGC5moboyhpY25DBT37XHNe9KORWDotjmbRU1z1YZ42ExkcJuWI74Zfc v08eLlZj iLT2pysY/cWLyMOnxdwNEaPpf/gW9tj7cqBKJSGoLdXu2SUsvY2QDaoCPEP2iR8TTZgWH5Y8pj1Wj7xyk/ae46HbfbLJowrKPbmUQEVXcCcwOdHE/Gt1SsHIDfwbf4Btodg07lP7cdhq3sHMiUNJwrvs4DvTQbcpJTr/OF6BzLpKf5Ze7XnOf1AnKOL/Yu709ODvohTK/taYkYwAj4psm/J+3Nw5Q8OKQhp65ZO4jSURDqh1l7dg/6GXL++463eTLvKqfPYouvF3EAAFWKO8S8yaRoKrk7oIpooWvbZu2b36Ahf1Ywcd1JFaHJmQY/pDXZ7xM+9Yykr9TG8ZhJj2W2TvfujRo4MWbJfPIdE5S2JnShXy1bsHR8n4pXPRDAFT0BM3t8yKK8C1J2uNLY5mA9vg9KOIxTE3656QsqrrAAhZhF+MeBmJlqz5ZJQa86WHHvbAv39vXjIdIZGS6zLHmNqaX0YJgsuVfcHDOAOZK90Do7zR3hEmjvS1ol2q0n7F/5Or1R8Y43F5lIQLcmgNckPsY8ZCsRUvVEHRQoeCu/ziOzLWgCcUjw2edyA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 16 Oct 2023 at 07:38, wrote: > > This patchset proposes a new mseal() syscall for the Linux kernel. So I have no objections to adding some kind of "lock down memory mappings" model, but this isn't it. First off, the simple stuff: the commit messages are worthless. Having check seal for mmap(2) as the commit message is not even remotely acceptable, to pick one random example from the series (7/8). But that doesn't matter much, since I think the more fundamental problems are much worse: - the whole "ON_BEHALF_OF_KERNEL" and "ON_BEHALF_OF_USERSPACE" is just complete noise and totally illogical. The whole concept needs to be redone. Example of complete nonsense (again, picking 7/8, but that's again just a random example): > @@ -3017,8 +3022,8 @@ SYSCALL_DEFINE5(remap_file_pages, > flags |= MAP_LOCKED; > > file = get_file(vma->vm_file); > - ret = do_mmap(vma->vm_file, start, size, > - prot, flags, pgoff, &populate, NULL); > + ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff, > + &populate, NULL, ON_BEHALF_OF_KERNEL); > fput(file); > out: > mmap_write_unlock(mm); Christ. That's *literally* the remap_file_pages() system call definition. No way in hell does "ON_BEHALF_OF_KERNEL" make any sense in this context. It's not the only situation. "mremap() as the same thing. vm_munmap() has the same thing. vm_brk_flags() has the same thing. None of them make any sense. But even if those obvious kinds of complete mis-designs were to be individually fixed, the whole naming and concept is bogus. The "ON_BEHALF_OF_KERNEL" thing seems to actually just mean "don't check sealing". Naming should describe what the thing *means*, not some random policy thing that may or may not be at all relevant. - the whole MM_SEAL_xx vs VM_SEAL_xx artificial distinction needs to go away. Not only is it unacceptable to pointlessly create two different name spaces for no obvious reason, code like this (from 1/8) should not exist: > + if (types & MM_SEAL_MSEAL) > + newtypes |= VM_SEAL_MSEAL; > + > + if (types & MM_SEAL_MPROTECT) > + newtypes |= VM_SEAL_MPROTECT; > + > + if (types & MM_SEAL_MUNMAP) > + newtypes |= VM_SEAL_MUNMAP; > + > + if (types & MM_SEAL_MMAP) > + newtypes |= VM_SEAL_MMAP; > + > + if (types & MM_SEAL_MREMAP) > + newtypes |= VM_SEAL_MREMAP; Sure, we have that in some cases when there was a *reason* for namespacing imposed on us from an API standpoint (ie the "open()" flags that get turned into FMODE_xyz flags, or the MS_xyz mount flags being turned into MNT_xyz flags), but those tend to be signs of problems in the system call API where some mixup made it impossible to use the flags directly (most commonly because there is some extended internal representation of said things). For example, the MS_xyz namespace is a combination of "these are flags for the new mount" (like MS_RDONLY) and "this is how you should mount it" (like MS_REMOUNT), and so the user interface has that odd mixing of different things, and the MNT_xyz namespace is a distillation of the former. But we certainly should not strive to introduce *new* interfaces that start out with this kind of mixup and pointless "translate from one bit to another" code. - And finally (for now), I hate that MM_ACTION_xyz thing too! Why do we have MM_ACTION_MREMAP, and pointless like this (from 3/8): > + switch (action) { > + case MM_ACTION_MPROTECT: > + if (vma->vm_seals & VM_SEAL_MPROTECT) > + return false; > + break; when the sane thing to do is to use the *same* MM_SEAL_xyz bitmask and sealing bitmask and just say if (vma->vm_seal & vm_action) return -EPERM; IOW, you have pointlessly introduced not *two* different namespaces, but *three*. All doing the exact same thing, and all just causing pointless and ugly code to "translate" the actions of one into the model of another. So get rid of the "MM_ACTION_xyz" thing. Get rid of ther "VM_SEAL_xyz" thing. Use *one* single "these are the sealing bits". And get rid of "enum caller_origin" entirely. I don't know what the right model for that thing is, but that isn't it. *Maybe* the right model is some MM_SEAL_OVERRIDE bit that user space cannot set, but that the kernel can use internally - and if that is the right model, then dammit, the *uses* should be very very obvious why the override is fine, because that remap_file_pages() use sure as hell was *not* obvious. In fact, it's not at all obvious why anything should override the sealing bits - EVER. So I'm not convinced that the right model is "replace ON_BEHALF_OF_KERNEL with MM_SEAL_OVERRIDE". Why would we *ever* want to override sealing? It sounds like complete garbage. Most of the users seem to be things like "execve()", which is nonsensical, since the VM wouldn't have been sealed at that point _anyway_, so having a "don't bother checking sealing bits" flag seems entirely useless. Anyway, this is all a resounding NAK on this series in this form. My complaints are not some kind of small "fix this up". These are fundamental issues. Linus