From: Jason Gunthorpe <jgg@nvidia.com>
To: Pratyush Yadav <ptyadav@amazon.de>
Cc: Christian Brauner <brauner@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
Eric Biederman <ebiederm@xmission.com>,
Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Hugh Dickins <hughd@google.com>, Alexander Graf <graf@amazon.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Woodhouse <dwmw2@infradead.org>,
James Gowans <jgowans@amazon.com>,
Mike Rapoport <rppt@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Pasha Tatashin <tatashin@google.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Dave Hansen <dave.hansen@intel.com>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Wei Yang <richard.weiyang@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Date: Tue, 18 Mar 2025 20:27:27 -0300
Message-ID: <20250318232727.GF9311@nvidia.com>
In-Reply-To: <mafs0a59i3ptk.fsf@amazon.de>

On Tue, Mar 18, 2025 at 11:02:31PM +0000, Pratyush Yadav wrote:
> I suppose we can serialize all FDs when the box is sealed and get rid of
> the struct file. If kexec fails, userspace can unseal the box, and FDs
> will be deserialized into a new struct file. This way, the behaviour
> from userspace perspective also stays the same regardless of whether
> kexec went through or not. This also helps tie FDBox closer to KHO.

I don't think we can do a proper deserialization without going
through kexec. The new memory preservation work Mike is posting
will not work like that.

I think error recovery will have to work by just restoring access to
the FD and its driver state, which was never actually destroyed.

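To make the recovery point concrete, here is a minimal userspace model in C. All names (struct driver_state, struct box_entry, box_freeze(), box_recover()) are hypothetical and are not FDBox or KHO APIs. The assumption is that freezing only revokes userspace access while the box keeps a pointer to the live object, so a failed kexec needs no deserialization step at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver object behind an FD. */
struct driver_state {
	int generation;   /* live state that survives the whole sequence */
	bool user_access; /* can userspace still reach it through an FD? */
};

/* Hypothetical box entry: it holds the live object, not a serialized copy. */
struct box_entry {
	struct driver_state *state;
	bool frozen;
};

/* Freeze: revoke userspace access but leave the driver state untouched. */
static void box_freeze(struct box_entry *e)
{
	e->state->user_access = false;
	e->frozen = true;
}

/*
 * Recovery after a failed kexec: nothing was torn down, so there is
 * nothing to deserialize. Hand access back and return the same object.
 * Returns NULL if the entry was never frozen.
 */
static struct driver_state *box_recover(struct box_entry *e)
{
	if (!e->frozen)
		return NULL;
	e->frozen = false;
	e->state->user_access = true;
	return e->state;
}
```

Because box_recover() only flips access back on, the driver state (here just `generation`) is exactly what it was before the freeze.
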
> > It sure would be nice if the freezing process could be managed
> > generically somehow.
> >
> > One option for freezing would have the kernel enforce that userspace
> > has closed and idled the FD everywhere (eg check the struct file
> > refcount == 1). If userspace doesn't have access to the FD then it is
> > effectively frozen.
>
> Yes, that is what I want to do in the next revision. FDBox itself will
> not close the file descriptors when you put a FD in the box. It will
> just grab a reference and let the userspace close the FD. Then when the
> box is sealed, the operation can be refused if refcount != 1.
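
The refcount rule discussed above can be modelled roughly as follows. This is a hypothetical userspace sketch: struct model_file stands in for struct file, its plain int counter stands in for the real file reference count, and the locking and races a check against a live struct file would have to handle are ignored. The fixed capacity of eight entries is arbitrary.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal userspace model of a file's reference count. */
struct model_file {
	int refcount;
};

/* Hypothetical box holding references to the files put into it. */
struct model_box {
	struct model_file *files[8];
	size_t nr;
};

/* Adding an FD takes the box's own reference; userspace keeps its own. */
static int box_add(struct model_box *box, struct model_file *f)
{
	if (box->nr == sizeof(box->files) / sizeof(box->files[0]))
		return -ENOSPC;
	f->refcount++;
	box->files[box->nr++] = f;
	return 0;
}

/* Userspace closing its FD drops its reference. */
static void user_close(struct model_file *f)
{
	f->refcount--;
}

/*
 * Freeze check: if the box holds the only reference to every file,
 * userspace can no longer reach them, so they are effectively frozen.
 */
static int box_check_frozen(const struct model_box *box)
{
	for (size_t i = 0; i < box->nr; i++)
		if (box->files[i]->refcount != 1)
			return -EBUSY;
	return 0;
}
```

The check only succeeds once every userspace holder has closed its FD, which is what makes "refcount == 1" usable as a freeze condition.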

I'm not sure about this sealed idea.

One of the design points here was to have different phases for the KHO
process, and we want to shift a lot of work to the earlier phases. Some
of that work should be putting things into the fdbox, freezing them,
and writing out the serialization, as that may be quite time consuming.

The same is true for the deserialize step, where we don't want to bulk
deserialize but do it in an ordered way to minimize the critical
downtime.
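
One possible reading of "ordered" deserialization, sketched under assumptions: each preserved item carries a hypothetical priority, items needed on the critical path are restored first during the downtime window, and the rest are restored later from where the first pass stopped. struct preserved_item, sort_by_priority() and restore_in_order() are invented for illustration and are not part of KHO or FDBox.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical preserved object with a restore priority. */
struct preserved_item {
	const char *name; /* illustrative label only */
	int priority;     /* lower value = needed sooner after kexec */
	int restored;     /* 1-based restore position, 0 = not restored yet */
};

static int cmp_priority(const void *a, const void *b)
{
	const struct preserved_item *x = a;
	const struct preserved_item *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

static void sort_by_priority(struct preserved_item *items, size_t n)
{
	qsort(items, n, sizeof(*items), cmp_priority);
}

/*
 * Restore sorted items starting at index @start until one with priority
 * above @max_priority is reached. Returns the index to resume from, so
 * the remaining items can be restored after the workload is back.
 */
static size_t restore_in_order(struct preserved_item *items, size_t n,
			       size_t start, int max_priority, int *seq)
{
	size_t i = start;

	for (; i < n && items[i].priority <= max_priority; i++)
		items[i].restored = ++*seq;
	return i;
}
```

The returned index lets a second pass pick up after the critical items, instead of bulk restoring everything before anything can run.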

So I'm not sure a 'seal' operation that goes and bulk serializes
everything makes sense. I still haven't seen a state flow chart, or a
proposal showing where each of the required steps would land, that
would let us get any certainty here.

At least in my head, I imagined you'd open the KHO FD, put it in
serializing mode, and then go through the objects in the right order,
pushing all the work and building the serialization data structure as
you go.

At the very end you'd finalize the KHO serialization, which just
writes out a little bit more to the FDT and gives you back the FDT
blob for the kexec. It should be a very fast operation.

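The phased flow described above can be modelled as a small state machine. This is a hypothetical sketch: struct kho_session, kho_start(), kho_push() and kho_finalize() are not real KHO interfaces, and a ';'-separated string stands in for the FDT. What it illustrates is that the per-object work happens in kho_push(), before the blackout, while finalize only appends a short trailer and returns the blob.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum kho_phase { KHO_IDLE, KHO_SERIALIZING, KHO_FINALIZED };

/* Hypothetical model of a KHO FD; blob is a stand-in for the FDT. */
struct kho_session {
	enum kho_phase phase;
	char blob[256];
	size_t len;
};

/* Append one ';'-terminated record, keeping blob NUL-terminated. */
static int kho_append(struct kho_session *s, const char *rec)
{
	size_t n = strlen(rec);

	if (s->len + n + 2 > sizeof(s->blob)) /* record + ';' + NUL */
		return -ENOSPC;
	memcpy(s->blob + s->len, rec, n);
	s->len += n;
	s->blob[s->len++] = ';';
	s->blob[s->len] = '\0';
	return 0;
}

/* Put the session into serializing mode. */
static int kho_start(struct kho_session *s)
{
	if (s->phase != KHO_IDLE)
		return -EBUSY;
	s->phase = KHO_SERIALIZING;
	s->len = 0;
	s->blob[0] = '\0';
	return 0;
}

/* Expensive per-object work happens here, one object at a time. */
static int kho_push(struct kho_session *s, const char *record)
{
	if (s->phase != KHO_SERIALIZING)
		return -EINVAL;
	return kho_append(s, record);
}

/* Finalize only writes a small trailer and hands back the blob. */
static const char *kho_finalize(struct kho_session *s)
{
	if (s->phase != KHO_SERIALIZING || kho_append(s, "end") < 0)
		return NULL;
	s->phase = KHO_FINALIZED;
	return s->blob;
}
```

Pushing before entering serializing mode, or after finalizing, is rejected with -EINVAL, so ordering mistakes surface early rather than at kexec time.
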
Jason