From: Jeff Xu
Date: Tue, 17 Oct 2023 20:18:47 -0700
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
To: Theo de Raadt
Cc: Linus Torvalds, jeffxu@chromium.org, akpm@linux-foundation.org,
	keescook@chromium.org, sroettger@google.com, jorgelo@chromium.org,
	groeck@chromium.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
	surenb@google.com, alex.sierra@amd.com, apopple@nvidia.com,
	aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
	ben@decadent.org.uk, catalin.marinas@arm.com, david@redhat.com,
	dwmw@amazon.co.uk, ying.huang@intel.com, hughd@google.com,
	joey.gouly@arm.com, corbet@lwn.net, wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, lstoakes@gmail.com, willy@infradead.org,
	mawupeng1@huawei.com, linmiaohe@huawei.com, namit@vmware.com,
	peterx@redhat.com, peterz@infradead.org, ryan.roberts@arm.com,
	shr@devkernel.io, vbabka@suse.cz, xiujianfeng@huawei.com,
	yu.ma@intel.com, zhangpeng362@huawei.com, dave.hansen@intel.com,
	luto@kernel.org, linux-hardening@vger.kernel.org
References: <20231016143828.647848-1-jeffxu@chromium.org>
	<55960.1697566804@cvs.openbsd.org>
	<95482.1697587015@cvs.openbsd.org>
In-Reply-To: <95482.1697587015@cvs.openbsd.org>

On Tue, Oct 17, 2023 at 4:57 PM Theo de Raadt wrote:
>
> Jeff Xu wrote:
>
> > May I ask, for
> > BSD's implementation of immutable(), do you cover
> > things such as mlock(),
> > madvise() ? or just the protection bit (WRX) + remap() + unmap().
>
> It only prevents removal of the mapping, placement of a replacement
> mapping, or changing the existing permissions.  If one page in the
> existing sub-region is marked immutable, the whole operation fails with
> EPERM.
>
> Those are the only user-visible aspects that an attacker cares about to
> utilize in this area.
>
> mlock() and madvise() deal with the physical memory handling underneath
> the VA.  They have nothing to do with how attack code might manipulate
> the VA address space inside a program to convert a series of dead-end
> approaches into a successful escalation strategy.
>
> [It would be a very long conversation to explain where and how this has
> been utilized to make an attack successful]
>
> > In other words:
> > Is BSD's definition of immutable equivalent to
> > MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP, of this patch set ?
>
> I can't compare it to your subsystem, because I completely fail to
> understand the cause or benefit of all the complexity.
>
> I think I've explained what mimmutable() is in extremely simple terms.
>
Thanks for the explanation. Based on that, this is exactly what the
current patch set does.
In practice, libc could do the following:

    #define MM_IMMUTABLE \
        (MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP)
    mseal(addr, len, MM_IMMUTABLE);

which would be equivalent to BSD's immutable().

> And I don't understand why you are trying to do anything beyond what
> mimmutable() offers.  It seems like this is inventing additional
> solutions without proof that any of them are necessary to solve the
> specific problem that is known.
>
> > I hesitate to introduce the concept of immutable into linux because I don't know
> > all the scenarios present in linux where VMAs's metadata can be
> > modified.
>
> Good grief.
> It seems obvious if you want to lock the change-behaviour
> of an object (the object in this case being a VA sub-region, there is a
> data structure for that, in OpenBSD it is called an "entry"), then you
> put a flag in that object's data structure and you simply check the flag
> every time a change-operation is attempted.  It is a flag which gets set,
> and checked.  Nothing ever clears it (except address space teardown).
>
> This flag must be put on the data structure that manages VA sub-ranges.
>
> In our case when a prot/mapping operation reaches low-level code that
> will want to change an "entry", we notice it is not allowed and simply
> percolate EPERM up through the layers.
>
> > There could be quite a few things we still need to deal with, to
> > completely block the possibility,
> > e.g. malicious code attempting to write to a RO memory
>
> What?!  writes to RO memory are blocked by the permission bits.
>
> > or change RW memory to RWX.
>
> In our case that is blocked by W^X policy.
>
> But if the region is marked mimmutable, then that's another reason you cannot
> change RW to RWX.  It seems so off-topic, to talk about writes to RO memory.
> I get a feeling you are a bit lost.
>
> immutable() is not about permissions, but about locking permissions.
> - You can't change the permissions of the address space region.
> - You cannot map a replacement object at the location instead (especially
>   with different permission).
> - You cannot unmap at that location (which you would do if you wanted to
>   map a new object, with a different permission).
>
> All 3 of these scenarios are identical.  No regular code performs these 3
> operations on regions of the address space which we mark immutable.
>
> There is nothing more to mimmutable in the VM layer.  The hard work is
> writing code in execve() and ld.so which will decide which objects can
> be marked immutable automatically, so that programs don't do this to
> themselves.
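
For readers following along, the flag-and-check scheme described in the
quoted text above can be sketched in C roughly as below. This is only an
illustration under stated assumptions: struct va_entry, the PROT_R/W/X
constants, and the function names are all hypothetical, and this is neither
OpenBSD nor Linux kernel code. It also changes whole entries instead of
splitting them at the range boundaries, which a real VM layer would do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure that manages one VA sub-range
 * (an "entry" in the quoted description). Not real kernel code. */
struct va_entry {
    size_t start;      /* first byte of the range */
    size_t end;        /* one past the last byte */
    int    prot;       /* current permission bits */
    bool   immutable;  /* set once, never cleared */
};

/* Illustrative permission bits, chosen for this sketch only. */
enum { PROT_R = 1, PROT_W = 2, PROT_X = 4 };

/* Set the flag. There is deliberately no function that clears it. */
static void entry_make_immutable(struct va_entry *e)
{
    e->immutable = true;
}

/* True if [start, end) overlaps the entry. */
static bool entry_overlaps(const struct va_entry *e, size_t start, size_t end)
{
    return start < e->end && e->start < end;
}

/* Change permissions on every entry overlapping [start, end).
 * If any overlapping entry is immutable, nothing is changed and
 * -EPERM is returned to the caller. */
static int range_protect(struct va_entry *tab, size_t n,
                         size_t start, size_t end, int prot)
{
    /* First pass: check only, so a failure leaves every entry untouched. */
    for (size_t i = 0; i < n; i++)
        if (entry_overlaps(&tab[i], start, end) && tab[i].immutable)
            return -EPERM;

    /* Second pass: apply the change to whole entries (no splitting). */
    for (size_t i = 0; i < n; i++)
        if (entry_overlaps(&tab[i], start, end))
            tab[i].prot = prot;

    return 0;
}
```

Unmap and map-replacement paths would presumably run the same check before
touching an entry, which is why the three scenarios collapse into one test.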
>
> I'm aware of where this simple piece fits in.  It does not solve all
> problems, it is a very narrow change to impact a problem which only
> high-value targets will ever face (like chrome).
>
> But I think you don't understand the purpose of this mechanism.
>
In Linux's case, I think mseal() will eventually have a bigger scope than
BSD's mimmutable(). A VMA's metadata (vm_area_struct) contains a lot
of control information; depending on the application's needs, mseal() can
be expanded to seal individual pieces of it.

For example, in the madvise(2) case, as Jann points out in [1], and I quote:
"you'd probably also want to block destructive madvise() operations
that can effectively alter region contents by discarding pages and
such, ..."

Another example: if an application wants to keep a memory region always
present in RAM, for whatever reason, it can seal mlock().

To handle those two new cases, mseal() could add two more bits:
MM_SEAL_MADVICE, MM_SEAL_MLOCK.

It is practical to keep the syscall extensible when the business logic is
the same.

I think I explained the logic of using bitmasks in the mseal()
interface clearly with the example of madvise() and mlock().

-Jeff

[1] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com/

> > If, as part of immutable, I also block madvise(), mlock(), which also updates
> > VMA's metadata, so by definition, I could. What if the user wants the
> > features in
> > madvise() and at the same time, also wants their .text protected ?
>
> I have no idea what you are talking about.  None of those things relate
> to the access permission of the memory the user sees, and therefore none
> of them are in the attack surface profile which is being prevented.
>
> Meaning, we allow madvise() and mlock() and mphysicalquantummemory() because
> those relate to the physical storage and not the VA permission model.
>
> > Also, if linux introduces a new syscall that depends on a new metadata of VMA,
> > say msecret(), (for discussion purpose), should immutable
> > automatically support that ?
>
> How about the future makingexcuses() system call?
>
> I don't think you understand the problem space well enough to come up with
> your own solution for it.  I spent a year on this, and ship a complete system
> using it.  You are asking such simplistic questions above it shocks me.
>
> Maybe read the LWN article;
>
>     https://lwn.net/Articles/915640/
>
> > Without those questions answered, I couldn't choose the route of
> > immutable() yet.
>
> "... so I can clearly not choose the wine in front of you."
>
> If you don't understand what this thing is for, and cannot minimize the
> complexity of this thing, then Linux doesn't need it at all.
>
> I should warn everyone the hard work is not in the VM layer, but in
> ld.so -- deciding which parts of the image to make immutable, and when.
> It is also possible to make some segments immutable directly in execve()
> -- but in both cases you better have a really good grasp on RELRO
> executable layout or will make too many pieces immutable...
>
> I am pretty sure Linux will never get as far as we got.  Even our main
> stacks are marked immutable, but in Linux that would conflict with glibc
> ld.so mprotecting RWX the stack if you dlopen() a shared library with
> GNUSTACK, a very bad idea which needs a different fight...
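
To make the bitmask argument from this message concrete, here is a small
user-space model of how per-operation seal bits could compose. The
MM_SEAL_* names are the ones used in this thread; the bit values, struct
toy_region, toy_mseal(), and toy_check() are assumptions made up for
illustration and are not the interface of the patch set.

```c
#include <errno.h>

/* Flag names follow the discussion; the bit values are assumed. */
#define MM_SEAL_MPROTECT (1UL << 0)
#define MM_SEAL_MUNMAP   (1UL << 1)
#define MM_SEAL_MREMAP   (1UL << 2)
#define MM_SEAL_MMAP     (1UL << 3)
#define MM_SEAL_MADVICE  (1UL << 4)  /* possible later extension */
#define MM_SEAL_MLOCK    (1UL << 5)  /* possible later extension */

/* The composite a libc could expose to mirror the BSD behaviour. */
#define MM_IMMUTABLE \
    (MM_SEAL_MPROTECT | MM_SEAL_MUNMAP | MM_SEAL_MREMAP | MM_SEAL_MMAP)

/* Toy stand-in for a sealed address range (hypothetical). */
struct toy_region {
    unsigned long seals;  /* accumulated MM_SEAL_* bits */
};

/* Add seals to a region. Seals only accumulate; nothing removes them. */
static void toy_mseal(struct toy_region *r, unsigned long types)
{
    r->seals |= types;
}

/* Return 0 if the operation named by one MM_SEAL_* bit is still
 * permitted on the region, or -EPERM if that operation is sealed. */
static int toy_check(const struct toy_region *r, unsigned long op)
{
    return (r->seals & op) ? -EPERM : 0;
}
```

In this model, sealing a region with MM_IMMUTABLE makes
toy_check(&r, MM_SEAL_MUNMAP) return -EPERM while
toy_check(&r, MM_SEAL_MADVICE) still returns 0; a caller that also wants
madvise() blocked would add MM_SEAL_MADVICE with a second toy_mseal() call.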