Date: Sat, 8 Mar 2025 12:09:53 +0100
From: Christian Brauner
To: Jason Gunthorpe
Cc: Pratyush Yadav, Linus Torvalds, linux-kernel@vger.kernel.org,
	Jonathan Corbet, Eric Biederman, Arnd Bergmann, Greg Kroah-Hartman,
	Alexander Viro, Jan Kara, Hugh Dickins, Alexander Graf,
	Benjamin Herrenschmidt, David Woodhouse, James Gowans, Mike Rapoport,
	Paolo Bonzini, Pasha Tatashin, Anthony Yznaga, Dave Hansen,
	David Hildenbrand, Matthew Wilcox, Wei Yang, Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Message-ID: <20250308-wutanfall-ersetzbar-2aedc820d80d@brauner>
References: <20250307005830.65293-1-ptyadav@amazon.de>
	<20250307005830.65293-2-ptyadav@amazon.de>
	<20250307-sachte-stolz-18d43ffea782@brauner>
	<20250307151417.GQ354511@nvidia.com>
In-Reply-To: <20250307151417.GQ354511@nvidia.com>

On Fri, Mar 07, 2025 at 11:14:17AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 07, 2025 at 10:31:39AM +0100, Christian Brauner wrote:
> > On Fri, Mar 07, 2025 at 12:57:35AM +0000, Pratyush Yadav wrote:
> > > The File Descriptor Box (FDBox) is a mechanism for userspace to name
> > > file descriptors and give them over to the kernel to hold. They can
> > > later be retrieved by passing in the same name.
> > >
> > > The primary purpose of FDBox is to be used with Kexec Handover (KHO).
> > > There are many kinds of anonymous file descriptors in the kernel like
> > > memfd, guest_memfd, iommufd, etc. that would be useful to be preserved
> > > using KHO. To be able to do that, there needs to be a mechanism to label
> > > FDs that allows userspace to set the label before doing KHO and to use
> > > the label to map them back after KHO. FDBox achieves that purpose by
> > > exposing a miscdevice which exposes ioctls to label and transfer FDs
> > > between the kernel and userspace. FDBox is not intended to work with any
> > > generic file descriptor. Support for each kind of FD must be explicitly
> > > enabled.
> >
> > This makes no sense as a generic concept. If you want to restore shmem
> > and possibly anonymous inodes files via KHO then tailor the solution to
> > shmem and anon inodes but don't make this generic infrastructure. This
> > has zero chance to cover generic files.
>
> We need it to cover a range of FD types in the kernel like iommufd and

anonymous inode

> vfio.

anonymous inode

>
> It is not "generic" in the sense every FD in the kernel magically works
> with fdbox, but that any driver/subsystem providing a FD could be
> enlightened to support it.
>
> Very much do not want the infrastructure tied to just shmem and memfd.

Anything you can reasonably want will either be an internal shmem mount,
devtmpfs, or anonymous inodes. Anything else isn't going to work.

>
> > As soon as you're dealing with non-kernel internal mounts that are not
> > guaranteed to always be there or something that depends on superblock or
> > mount specific information that can change you're already screwed.
>
> This is really targeted at anonymous or character device file
> descriptors that don't have issues with mounts.
>
> Same remark about inode permissions and what not. The successor
> kernel would be responsible to secure the FDBOX and when it takes
> anything out it has to relabel it if required.
>
> inode #s and things can change because this is not something like CRIU
> that would have state linked to inode numbers. The applications in the
> successor kernels are already very special, they will need to cope with
> inode number changes along with all the other special stuff they do.
>
> > And struct file should have zero to do with this KHO stuff. It doesn't
> > need to carry new operations and it doesn't need to waste precious space
> > for any of this.
>
> Yeah, it should go through file_operations in some way.

I'm fine with a new method. There's not going to be three new methods
just for the sake of this special-purpose thing.

And I want this to be part of fs/ and co-maintained by fs people.

I'm not yet sold that this needs to be a character device, because
that's fundamentally limiting in how useful this can be.

It might be way more useful if this ended up being a separate tiny
filesystem where such preserved files are simply shown as named entries
that you can open instead of ioctl()ing your way through character
devices. But I need to think about that.