Date: Tue, 18 Mar 2025 15:25:25 +0100
From: Christian Brauner
To: Jason Gunthorpe
Cc: Pratyush Yadav, Linus Torvalds, linux-kernel@vger.kernel.org,
 Jonathan Corbet, Eric Biederman, Arnd Bergmann, Greg Kroah-Hartman,
 Alexander Viro, Jan Kara, Hugh Dickins, Alexander Graf,
 Benjamin Herrenschmidt, David Woodhouse, James Gowans, Mike Rapoport,
 Paolo Bonzini, Pasha Tatashin, Anthony Yznaga, Dave Hansen,
 David Hildenbrand, Matthew Wilcox, Wei Yang, Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Message-ID: <20250318-toppen-elfmal-968565e93e69@brauner>
References: <20250307005830.65293-1-ptyadav@amazon.de>
 <20250307005830.65293-2-ptyadav@amazon.de>
 <20250307-sachte-stolz-18d43ffea782@brauner>
 <20250309-unerwartet-alufolie-96aae4d20e38@brauner>
 <20250317165905.GN9311@nvidia.com>
In-Reply-To: <20250317165905.GN9311@nvidia.com>

On Mon, Mar 17, 2025 at 01:59:05PM -0300, Jason Gunthorpe wrote:
> On Sun, Mar 09, 2025 at 01:03:31PM +0100, Christian Brauner wrote:
>
> > So either that work is done right from the start or that stashing files
> > goes out the window and instead that KHO part is implemented in a way
> > where during a KHO dump relevant userspace is notified that they must
> > now serialize their state into the serialization stash. And no files are
> > actually kept in there at all.
>
> Let's ignore memfd/shmem for a moment..
>
> It is not userspace state that is being serialized, it is *kernel*
> state inside device drivers like VFIO/iommufd/kvm/etc that is being
> serialized to the KHO.
>
> The file descriptor is simply the handle to the kernel state. It is
> not a "file" in any normal filesystem sense, it is just an uAPI handle
> for a char dev that is used with IOCTL.
>
> When KHO is triggered whatever is contained inside the FD is
> serialized into the KHO.
>
> So we need:
>  1) A way to register FDs to be serialized. For instance, not every
>     VFIO FD should be retained.
>  2) A way for the kexecing kernel to make callbacks to the char dev
>     owner (probably via struct file operations) to perform the
>     serialization
>  3) A way for the new kernel to ask the char dev owner to create a new
>     struct file out of the serialized data. Probably allowed to happen
>     only once, ie you can't clone these things. This is not the same
>     as just opening an empty char device, it would also fill the char
>     device with whatever data was serialized.
>  4) A way to get the struct file into a process fd number so userspace
>     can route it to the right place.
>
> It is not really a stash, it is not keeping files, it is hardwired to

Right now, as written, it is keeping references to files in these fdboxes
and thus functioning both as a crippled high-privileged fdstore and a
serialization mechanism. Please get rid of the fdstore bits and
implement it in a way that it serializes files without stashing
references to live files that can, at arbitrary points in time before the
fdbox is "sealed", be pulled out and installed into the caller's fdtable
again.

> KHO to drive its serialize/deserialize mechanism around char devs in
> a very limited way.
>
> If you have that then feeding an anonymous memfd/guestmemfd through
> the same machinery is a fairly small and logical step.
>
> Jason
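
To make the four requirements quoted above concrete, here is a toy
userspace model in Python. It is only a sketch under loud assumptions:
every name in it (CharDevOwner, Handover, NewKernel, register,
serialize_all, restore_fd) is made up for illustration, JSON stands in for
whatever format KHO would really use, and none of this is an existing
kernel or KHO interface. The one property it tries to show is the one asked
for in this mail: the registry offers serialization only, with no operation
that hands a live object back to the caller before handover.

```python
import json


class CharDevOwner:
    """Hypothetical stand-in for a driver that owns the kernel state."""

    def __init__(self, name):
        self.name = name

    def serialize(self, state):
        # Requirement 2: the owner turns its own state into a blob.
        return json.dumps(state).encode()

    def restore(self, blob):
        # Requirement 3: the owner rebuilds an object from the blob.
        return json.loads(blob.decode())


class Handover:
    """Hypothetical registry on the outgoing side. No 'get' operation."""

    def __init__(self):
        self._registered = {}

    def register(self, key, owner, state):
        # Requirement 1: opt in per object; anything else is not retained.
        self._registered[key] = (owner, state)

    def serialize_all(self):
        # Requirement 2: call back into each owner at handover time.
        return {
            key: (owner.name, owner.serialize(state))
            for key, (owner, state) in self._registered.items()
        }


class NewKernel:
    """Hypothetical incoming side holding only serialized blobs."""

    def __init__(self, blobs, owners):
        self._blobs = dict(blobs)
        self._owners = owners
        self.fdtable = {}
        self._next_fd = 3

    def restore_fd(self, key):
        # Requirement 3: pop() consumes the blob, so a second restore
        # of the same key raises KeyError (no cloning).
        owner_name, blob = self._blobs.pop(key)
        state = self._owners[owner_name].restore(blob)
        # Requirement 4: install the restored object under an fd number.
        fd = self._next_fd
        self._next_fd += 1
        self.fdtable[fd] = state
        return fd
```

In this model the outgoing side would call register() and then
serialize_all(), and the incoming side would call restore_fd() once per
key. What a real interface would look like is for the series to work out.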