From: Jason Gunthorpe <jgg@nvidia.com>
To: Pratyush Yadav <ptyadav@amazon.de>
Cc: Christian Brauner <brauner@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Eric Biederman <ebiederm@xmission.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Hugh Dickins <hughd@google.com>, Alexander Graf <graf@amazon.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	David Woodhouse <dwmw2@infradead.org>,
	James Gowans <jgowans@amazon.com>,
	Mike Rapoport <rppt@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Pasha Tatashin <tatashin@google.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Dave Hansen <dave.hansen@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Wei Yang <richard.weiyang@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Date: Mon, 31 Mar 2025 12:38:22 -0300
Message-ID: <20250331153822.GG10839@nvidia.com>
In-Reply-To: <mafs05xjvs9eq.fsf@amazon.de>

On Wed, Mar 26, 2025 at 10:40:29PM +0000, Pratyush Yadav wrote:
> Ideally, kho_preserve_folio() should be similar to freeing the folio,
> except that it doesn't go to buddy for re-allocation. In that case,
> re-using those pages should not be a problem as long as the driver made
> sure the page was properly "freed", and there are no stale references to
> it. They should be doing that anyway since they should make sure the
> file doesn't change after it has been serialized.

I don't know if this is a good idea; it seems to make error recovery
much more complex.

> > Then you have the issue that I don't actually imagine shutting down
> > something like iommufd, I was intending to leave it frozen in place
> > with all its allocations and so on. If you try to de-serialize you
> > can't de-serialize into the thing that is frozen, you'd create a new
> > one from empty. Now you have two things pointing at the same stuff,
> > what a mess.
> 
> What do you mean by "frozen in place"? Isn't that the same as being
> serialized? 

I mean all the memory and internal state is still there, it is just
not changing. It is not the same as being serialized, as the
de-serialized versions of everything would still exist in parallel.

> Considering that we want to make sure a file is not opened by any
> process before we serialize it, what do we get by keeping the struct
> file around (assuming we can safely deserialize it without going
> through kexec)?

We do a lot less work.

Having serialize reliably put the entire system into a fully
post-live-update state, including dependent things like the
iommufd/vfio attachment and iommu driver, is very hard. This stuff is
quite complex.

I imagine instead we have three data states:
 - Fully operating
 - Frozen, with all preserved memory logged in KHO
 - Post-live-update, where there are hints scattered around the drivers
   about what is in the KHO

From an error perspective, going from frozen back to fully operating
should just be throwing away the KHO record and allowing use of the FD
again. That is super simple and makes error recovery during
micro-steps of the KHO simple and safe.

If you imagine that KHO is destructive then every failure point needs
to unwind the partial destruction which is a total nightmare to code :\
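
To make the three states concrete, here is a rough sketch of the
non-destructive approach. Every name in it (lu_state, lu_record,
lu_freeze, lu_thaw) is made up for illustration and is not an existing
KHO or kernel API. The only point is that freezing merely appends to a
side record, so any failure part-way through unwinds by discarding that
record:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
enum lu_state { LU_OPERATING, LU_FROZEN, LU_POST_UPDATE };

#define LU_MAX_RANGES 8

/* Stand-in for "preserved memory logged in KHO". */
struct lu_record {
	size_t nr_ranges;
	unsigned long pfn[LU_MAX_RANGES];
};

struct lu_object {
	enum lu_state state;
	struct lu_record rec;
	int live_data;	/* stand-in for internal state; freeze never touches it */
};

/* Frozen -> operating: throw away the record and allow use again. */
void lu_thaw(struct lu_object *obj)
{
	/* Thawing only makes sense before the live update has happened. */
	assert(obj->state != LU_POST_UPDATE);
	obj->rec.nr_ranges = 0;
	obj->state = LU_OPERATING;
}

/*
 * Operating -> frozen: log each range in the record. The object itself
 * is left intact, so a failure at any step is undone by lu_thaw().
 */
bool lu_freeze(struct lu_object *obj, const unsigned long *pfns, size_t n)
{
	if (obj->state != LU_OPERATING)
		return false;

	obj->state = LU_FROZEN;
	for (size_t i = 0; i < n; i++) {
		if (obj->rec.nr_ranges == LU_MAX_RANGES) {
			lu_thaw(obj);	/* the whole unwind path */
			return false;
		}
		obj->rec.pfn[obj->rec.nr_ranges++] = pfns[i];
	}
	return true;
}
```

In a destructive scheme each loop iteration would need its own undo
step; here the error path is the same single call no matter how far the
freeze got.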

> Main idea is for logical grouping and dependency management. If some FDs
> have a dependency between them, grouping them in different boxes makes
> it easy to let userspace choose the order of operations, but still have
> a way to make sure all dependencies are met when the FDs are serialized.
> Similarly, on the deserialize side, this ensures that all dependent FDs
> are deserialized together.

That seems overcomplicated to me. Userspace should write the FDs in
the required order, and that order should be a topological sort of the
required dependencies. The kernel should just validate that this was done.
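
As a sketch of what that validation could look like: the kernel walks
the FDs in the order userspace wrote them and rejects the sequence if
any FD names a dependency that has not already appeared. The fd_entry
layout and fd_order_is_valid() are hypothetical, made up for this
example; how dependencies would really be discovered is not something
the series defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical description of one FD handed over for preservation. */
struct fd_entry {
	int id;
	size_t nr_deps;
	int deps[MAX_DEPS];	/* ids that must have been written earlier */
};

/* Did an entry with this id appear among the first 'upto' entries? */
static bool seen_before(const struct fd_entry *order, size_t upto, int id)
{
	for (size_t i = 0; i < upto; i++)
		if (order[i].id == id)
			return true;
	return false;
}

/*
 * Return true if order[] is a valid topological order, i.e. every
 * dependency of every entry appears strictly before that entry.
 * Quadratic, which is fine for a sketch with a handful of FDs.
 */
bool fd_order_is_valid(const struct fd_entry *order, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(order[i].nr_deps <= MAX_DEPS);
		for (size_t d = 0; d < order[i].nr_deps; d++)
			if (!seen_before(order, i, order[i].deps[d]))
				return false;
	}
	return true;
}
```

The kernel never has to compute an order itself; it only checks the one
it was given and fails the operation if the check does not pass.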

Jason

