Date: Fri, 31 Mar 2023 23:56:10 +0000
In-Reply-To: <20230322111951.vfrm2xf4o5kmtte6@wittgenstein> (message from
 Christian Brauner on Wed, 22 Mar 2023 12:19:51 +0100)
Subject: Re: [RFC PATCH v2 1/2] mm: restrictedmem: Allow userspace to
 specify mount for memfd_restricted
From: Ackerley Tng
To: Christian Brauner
Cc: kvm@vger.kernel.org, linux-api@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, qemu-devel@nongnu.org, aarcange@redhat.com,
 ak@linux.intel.com, akpm@linux-foundation.org, arnd@arndb.de,
 bfields@fieldses.org, bp@alien8.de, chao.p.peng@linux.intel.com,
 corbet@lwn.net, dave.hansen@intel.com, david@redhat.com,
 ddutile@redhat.com, dhildenb@redhat.com, hpa@zytor.com, hughd@google.com,
 jlayton@kernel.org, jmattson@google.com, joro@8bytes.org,
 jun.nakajima@intel.com, kirill.shutemov@linux.intel.com,
 linmiaohe@huawei.com, luto@kernel.org, mail@maciej.szmigiero.name,
 mhocko@suse.com, michael.roth@amd.com, mingo@redhat.com,
 naoya.horiguchi@nec.com, pbonzini@redhat.com, qperret@google.com,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, tabba@google.com, tglx@linutronix.de,
 vannapurve@google.com, vbabka@suse.cz, vkuznets@redhat.com,
 wanpengli@tencent.com, wei.w.wang@intel.com, x86@kernel.org,
 yu.c.zhang@linux.intel.com

Christian Brauner writes:

> On Tue, Mar 21, 2023 at 08:15:32PM +0000, Ackerley Tng wrote:
>> By default, the backing shmem file for a restrictedmem fd is created
>> on shmem's kernel space mount.

>> ...

Thanks for reviewing this patch!

> This looks like you can just pass in some tmpfs fd and you just use it
> to identify the mnt and then you create a restricted memfd area in that
> instance. So if I did:
>
>     mount -t tmpfs tmpfs /mnt
>     mknod /mnt/bla c 0 0
>     fd = open("/mnt/bla")
>     memfd_restricted(fd)
>
> then it would create a memfd restricted entry in the tmpfs instance
> using the arbitrary dummy device node to infer the tmpfs instance.
>
> Looking at the older thread briefly and the cover letter. Afaict, the
> new mount api shouldn't figure into the design of this. fsopen() returns
> fds referencing a VFS-internal fs_context object. They can't be used to
> create or lookup files or identify mounts. The mount doesn't exist at
> that time. Not even a superblock might exist at the time before
> fsconfig(FSCONFIG_CMD_CREATE).
> When fsmount() is called after superblock setup then it's similar to any
> other fd from open() or open_tree() or whatever (glossing over some
> details that are irrelevant here). Difference is that open_tree() and
> fsmount() would refer to the root of a mount.

That is correct: memfd_restricted() needs an fd returned from fsmount(),
not one returned from fsopen(). Usage examples of this new parameter to
memfd_restricted() are available in the selftests.

> At first I wondered why this doesn't just use standard *at() semantics
> but I guess the restricted memfd is unlinked and doesn't show up in the
> tmpfs instance.
>
> So if you go down that route then I would suggest to enforce that the
> provided fd refer to the root of a tmpfs mount. IOW, it can't just be an
> arbitrary file descriptor in a tmpfs instance. That seems cleaner to me:
>
>     sb = f_path->mnt->mnt_sb;
>     sb->s_magic == TMPFS_MAGIC && f_path->mnt->mnt_root == sb->s_root
>
> and has much tighter semantics than just allowing any kind of fd.

Thanks for the suggestion; I've tightened the semantics accordingly.
memfd_restricted() now only accepts fds representing the root of the
mount.

> Another wrinkle I find odd but that's for you to judge is that this
> bypasses the permission model of the tmpfs instance. IOW, as long as you
> have a handle to the root of a tmpfs mount you can just create
> restricted memfds in there. So if I provided a completely sandboxed
> service - running in a user namespace or whatever - with an fd to the
> host's tmpfs instance they can just create restricted memfds in there no
> questions asked.
>
> Maybe that's fine but it's certainly something to spell out and think
> about the implications.

Thanks for pointing this out! I added a permissions check in RFC v3 and
clarified the permissions model (please see patch 1 of 2):

https://lore.kernel.org/lkml/cover.1680306489.git.ackerleytng@google.com/
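
For anyone following the thread, the tightened rule (the fd must refer to
the root of a tmpfs mount) can be approximated from userspace. The
following is only a minimal sketch and is not the code from the patch:
is_tmpfs_mount_root() is a hypothetical helper name, and it relies on
fstatfs() plus the STATX_ATTR_MOUNT_ROOT attribute reported by statx()
(Linux 5.8 and later). It is also weaker than the in-kernel comparison of
mnt_root against sb->s_root, because a bind mount of a tmpfs subdirectory
reports itself as a mount root too.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_EMPTY_PATH, open() */
#include <linux/magic.h>  /* TMPFS_MAGIC */
#include <sys/stat.h>     /* statx(), struct statx */
#include <sys/vfs.h>      /* fstatfs(), struct statfs */

/* Older headers may lack this attribute bit (added in Linux 5.8). */
#ifndef STATX_ATTR_MOUNT_ROOT
#define STATX_ATTR_MOUNT_ROOT 0x00002000
#endif

/*
 * Hypothetical helper, userspace approximation only.
 * Returns 1 if fd refers to the root of a mount backed by tmpfs,
 * 0 if it does not (or the kernel cannot tell us), -1 on error.
 */
static int is_tmpfs_mount_root(int fd)
{
	struct statfs sfs;
	struct statx stx;

	/* Roughly corresponds to: sb->s_magic == TMPFS_MAGIC */
	if (fstatfs(fd, &sfs) < 0)
		return -1;
	if (sfs.f_type != TMPFS_MAGIC)
		return 0;

	/* Ask about the fd itself rather than a path relative to it. */
	if (statx(fd, "", AT_EMPTY_PATH, 0, &stx) < 0)
		return -1;

	/* If the kernel does not report the attribute, refuse to guess. */
	if (!(stx.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
		return 0;

	return (stx.stx_attributes & STATX_ATTR_MOUNT_ROOT) ? 1 : 0;
}
```

With the commands from the quoted example, an fd opened on /mnt after
"mount -t tmpfs tmpfs /mnt" should be accepted by this helper, while an
O_PATH fd for /mnt/bla should be rejected, since the device node lives in
the tmpfs instance but is not the root of the mount.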