From: Jeff Xu
Date: Wed, 16 Aug 2023 15:46:41 -0700
Subject: Re: [PATCH v2 4/5] memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
To: Dominique Martinet
Cc: Aleksa Sarai, Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp,
 Christian Brauner, stable@vger.kernel.org, linux-api@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org
References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com>
 <20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com>

On Tue, Aug 15, 2023 at 10:44 PM Dominique Martinet wrote:
>
> Jeff Xu wrote on Tue, Aug 15, 2023 at 10:13:18PM -0700:
> > > Given that it is possible for CAP_SYS_ADMIN users to create executable
> > > binaries without memfd_create(2) and without touching the host
> > > filesystem (not to mention the many other things a CAP_SYS_ADMIN process
> > > would be able to do that would be equivalent or worse), it seems strange
> > > to cause a fair amount of headache to admins when there doesn't appear
> > > to be an actual security benefit to blocking this. There appear to be
> > > concerns about confused-deputy-esque attacks[2] but a confused deputy that
> > > can write to arbitrary sysctls is a bigger security issue than
> > > executable memfds.
> > >
> > Something to point out: The demo code might be enough to prove your
> > case in other distributions, however, in ChromeOS, you can't run this
> > code. The executable in ChromeOS are all from known sources and
> > verified at boot.
> > If an attacker could run this code in ChromeOS, that means the
> > attacker already acquired arbitrary code execution through other ways,
> > at that point, the attacker no longer needs to create/find an
> > executable memfd, they already have the vehicle. You can't use an
> > example of an attacker already running arbitrary code to prove that
> > disable downgrading is useless.
> > I agree it is a big problem that an attacker already can modify a
> > sysctl. Assuming this can happen by controlling arguments passed into
> > sysctl, at the time, the attacker might not have full arbitrary code
> > execution yet, that is the reason the original design is so
> > restrictive.
>
> I don't understand how you can say an attacker cannot run arbitrary code
> within a process here, yet assert that they'd somehow run memfd_create +
> execveat on it if this sysctl is lowered -- the two look equivalent to
> me?
>
This attack might require multiple steps. One possible scenario:
1> Control a write primitive in a CAP_SYS_ADMIN process's memory, change
the arguments of a sysctl call, and downgrade the setting for memfd,
e.g. set it to 0 to revert to the old behavior (creating executable
memfds by default).
2> Control a non-privileged process that creates and writes to a memfd,
and fill it with the binary that the attacker wants. This process only
needs a non-executable memfd, but hasn't been updated yet.
3> Confuse a non-privileged process into executing the memfd the
attacker wrote in step 2.

In ChromeOS, because all the executables are from verified sources,
attackers typically can't easily use step 3 alone (without step 2),
and memfd was a hole that enabled an unverified executable.

In the original design, downgrading is not allowed, so the attack chain
of steps 2 and 3 is completely blocked. With this new approach,
attackers will try to find an additional step (step 1) to make the old
attack (steps 2 and 3) work again. It is difficult, but I can't say it
is impossible.

> CAP_SYS_ADMIN is a kludge of a capability that pretty much gives root as
> soon as you can run arbitrary code (just have a look at the various
> container escape example when the capability is given); I see little
> point in trying to harden just this here.

I'm not an expert in containers. If the industry is giving up on
privileged containers, then the reasoning makes sense.
From the ChromeOS point of view, we don't use runc currently, so I think
it makes more sense for runc users to drive these features. The
original design was made with runc in mind, and even privileged
containers can't downgrade their own setting.

> It'd make more sense to limit all sysctl modifications in the context
> you're thinking of through e.g. selinux or another LSM.
>
I agree, now that I think more about this.
Security features fit an LSM better: an LSM can do additional
"allow/deny" checks on otherwise allowed behavior from user space code.
Based on that, "disallow downgrading" fits an LSM better. By the same
reasoning, I have second thoughts on the "=2" setting. Originally
MFD_EXEC was left out of it on the thinking that if user code explicitly
sets MFD_EXEC, the sysctl should not block it; that is the work of an
LSM. However, "=2" has evolved to block MFD_EXEC completely ... alas
.. it might be too late to revert this. If this is what devs want, it
can be that way.
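
For readers following the thread, the sysctl levels discussed above can be sketched as a small model. This is only an illustration, not kernel code: the function names resolve_memfd_flags and ratchet_write are hypothetical, the behavior of level 1 (non-executable by default) is taken from my reading of the kernel's sysctl documentation rather than from this mail, and the model ignores invalid flag combinations. The two flag values are the ones defined in linux/memfd.h.

```python
# Flag values as defined in <linux/memfd.h>.
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


def resolve_memfd_flags(sysctl_level: int, flags: int) -> int:
    """Hypothetical model of how vm.memfd_noexec shapes memfd_create() flags.

    Level 0: a caller that passes neither flag gets an executable memfd.
    Level 1: a caller that passes neither flag gets a non-executable memfd
             (assumption based on the sysctl documentation).
    Level 2: as level 1, and an explicit MFD_EXEC is refused, which is the
             "=2 blocks MFD_EXEC completely" behavior discussed in the thread.
    """
    if not flags & (MFD_EXEC | MFD_NOEXEC_SEAL):
        # Caller did not choose; the sysctl picks the default.
        flags |= MFD_NOEXEC_SEAL if sysctl_level >= 1 else MFD_EXEC
    if sysctl_level >= 2 and not flags & MFD_NOEXEC_SEAL:
        raise PermissionError("executable memfd refused at level 2")
    return flags


def ratchet_write(current_level: int, new_level: int) -> int:
    """Hypothetical model of the original ratchet: the level may only go up."""
    if new_level < current_level:
        raise PermissionError("downgrading vm.memfd_noexec is not allowed")
    return new_level
```

In this model, step 1 of the scenario corresponds to a call such as ratchet_write(2, 0), which the ratchet refuses. Without the ratchet, a privileged write of 0 would succeed, and the not-yet-updated process from step 2 would get MFD_EXEC by default again.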

Thanks
Best regards,
-Jeff

> (in the context of users making their own containers, my suggestion is
> always to never use CAP_SYS_ADMIN, or if they must give it to a separate
> minimal container where they can limit user interaction)
>
>
> FWIW, I also think the proposed =2 behaviour makes more sense, but this
> is something we already discussed last month so I won't come back to it
> as not really involved here.
>
> --
> Dominique Martinet | Asmadeus