From: Pranjal Shrivastava <praan@google.com>
To: David Matlack <dmatlack@google.com>
Cc: "Alex Williamson" <alex@shazbot.org>,
	"Adithya Jayachandran" <ajayachandra@nvidia.com>,
	"Alexander Graf" <graf@amazon.com>,
	"Alex Mastro" <amastro@fb.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ankit Agrawal" <ankita@nvidia.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Chris Li" <chrisl@kernel.org>,
	"David Rientjes" <rientjes@google.com>,
	"Jacob Pan" <jacob.pan@linux.microsoft.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Hilke" <jrhilke@google.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	kexec@lists.infradead.org, kvm@vger.kernel.org,
	"Leon Romanovsky" <leon@kernel.org>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org, "Lukas Wunner" <lukas@wunner.de>,
	"Michał Winiarski" <michal.winiarski@intel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Parav Pandit" <parav@nvidia.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Pratyush Yadav" <pratyush@kernel.org>,
	"Raghavendra Rao Ananta" <rananta@google.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Saeed Mahameed" <saeedm@nvidia.com>,
	"Samiullah Khawaja" <skhawaja@google.com>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Tomita Moeko" <tomitamoeko@gmail.com>,
	"Vipin Sharma" <vipinsh@google.com>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"William Tu" <witu@nvidia.com>, "Yi Liu" <yi.l.liu@intel.com>,
	"Zhu Yanjun" <yanjun.zhu@linux.dev>
Subject: Re: [PATCH v2 02/22] PCI: Add API to track PCI devices preserved across Live Update
Date: Tue, 24 Feb 2026 19:02:56 +0000
Message-ID: <aZ314HSRnYtGinTU@google.com>
In-Reply-To: <CALzav=fSpd6H5pQNtJoFHdNtWVO11vffhWQFsMFkM+osGuE0wQ@mail.gmail.com>

On Tue, Feb 24, 2026 at 09:33:28AM -0800, David Matlack wrote:
> On Tue, Feb 24, 2026 at 1:18 AM Pranjal Shrivastava <praan@google.com> wrote:
> > On Thu, Jan 29, 2026 at 09:24:49PM +0000, David Matlack wrote:
> > > + * Copyright (c) 2025, Google LLC.
> >
> > Nit: Should these be 2026 now?
> 
> Yes! Thanks for catching that.
> 
> > > +int pci_liveupdate_outgoing_preserve(struct pci_dev *dev)
> > > +{
> > > +     struct pci_dev_ser new = INIT_PCI_DEV_SER(dev);
> > > +     struct pci_ser *ser;
> > > +     int i, ret;
> > > +
> > > +     /* Preserving VFs is not supported yet. */
> > > +     if (dev->is_virtfn)
> > > +             return -EINVAL;
> > > +
> > > +     guard(mutex)(&pci_flb_outgoing_lock);
> > > +
> > > +     if (dev->liveupdate_outgoing)
> > > +             return -EBUSY;
> > > +
> > > +     ret = liveupdate_flb_get_outgoing(&pci_liveupdate_flb, (void **)&ser);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     if (ser->nr_devices == ser->max_nr_devices)
> > > +             return -E2BIG;
> >
> > I'm wondering how (or if) this handles hot-plugged devices.
> > max_nr_devices is calculated based on for_each_pci_dev at the time of
> > the first preservation. What happens if a device is hot-plugged after
> > the first device is preserved but before the second one is? Does
> > max_nr_devices become stale? ser->max_nr_devices would no longer
> > reflect the actual possible device count, potentially leading to an
> > unnecessary -E2BIG failure.
> 
> Yes, it's possible to run out of space to preserve devices if devices
> are hot-plugged and then preserved. But I think it's better to defer
> handling this until such a use-case exists (unless you see an obvious
> simple solution). So far I am not seeing preserving hot-plugged devices
> across Live Update as a high priority use-case to support.
> 

Ack. If we aren't supporting preservation of hot-plugged devices at this
point, let's mention that somewhere? Maybe just a short comment or a note
in the kdoc?
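
To make the scenario concrete, here is a minimal userspace sketch of the
capacity check. It is not the patch's code: struct ser_model and
model_preserve() are hypothetical names, and the only behaviour assumed
from the patch is that max_nr_devices is sized once, at the first
preservation, and that a full table returns -E2BIG.

```c
#include <errno.h>

/* Hypothetical userspace model; not the kernel's struct pci_ser. */
struct ser_model {
	unsigned int nr_devices;	/* devices preserved so far */
	unsigned int max_nr_devices;	/* capacity, fixed at first preservation */
	int sized;			/* non-zero once capacity has been fixed */
};

/*
 * Preserve one device. nr_present is the number of devices present on the
 * bus at the time of the call; it is only consulted on the first call.
 */
static int model_preserve(struct ser_model *ser, unsigned int nr_present)
{
	if (!ser->sized) {
		ser->max_nr_devices = nr_present;
		ser->sized = 1;
	}

	if (ser->nr_devices == ser->max_nr_devices)
		return -E2BIG;

	ser->nr_devices++;
	return 0;
}
```

With two devices present at the first call, a third device that is
hot-plugged afterwards cannot be preserved: the first two calls succeed
and the third returns -E2BIG even though nr_present is now 3.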

> > > +u32 pci_liveupdate_incoming_nr_devices(void)
> > > +{
> > > +     struct pci_ser *ser;
> > > +     int ret;
> > > +
> > > +     ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&ser);
> > > +     if (ret)
> > > +             return 0;
> >
> > Masking this error looks troublesome. In the following patch, I see
> > that the retval 0 is treated as a fresh boot, but the IOMMU mappings for
> > that BDF might still be preserved, which could lead to DMA aliasing
> > issues without a hint of what happened, since we don't even log anything.
> 
> All of the non-0 errors indicate there are 0 incoming devices at the
> time of the call, so I think returning 0 is appropriate.
> 
>  - EOPNOTSUPP: Live Update is not enabled.
>  - ENODATA: Live Update is finished (all incoming devices have been restored).
>  - ENOENT: No PCI data was preserved across the Live Update.
> 
> None of these cover the case where an IOMMU mapping for BDF X is
> preserved, but device X is not preserved. This is a case we should
> handle in some way... but here is not that place.
> 
> >
> > Maybe we could have something like the following:
> >
> > int pci_liveupdate_incoming_nr_devices(void)
> > {
> >         struct pci_ser *ser;
> >         int ret;
> >
> >         ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&ser);
> >         if (ret) {
> >                 if (ret != -ENOENT)
> >                         pr_warn("PCI: Failed to retrieve preservation list: %d\n", ret);
> 
> This would cause this warning to get printed if Live Update was
> disabled, or if no PCI devices were preserved. But both of those are
> not error scenarios.
> 

I agree, the snippet was just an example. What I'm trying to say is: what
if the retval is -ENOMEM / -ENODATA? The existing code will treat it as a
fresh boot because it believes there are no incoming devices. However,
since this was an incoming device that failed to be retrieved, there's a
chance that its IOMMU mapping was preserved too. By returning 0, the PCI
core will feel free to rebalance bus numbers or reassign BARs. For
instance, if the IOMMU already inherited mappings for BDF 02:00.0, but the
PCI core (due to this masked error) reassigns a different device to that
BDF, we face DMA aliasing or IOMMU faults. Am I missing some context here?
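
A rough sketch of the distinction, as plain C rather than a proposed
patch. incoming_err_is_expected() is a hypothetical helper, and it
assumes that -EOPNOTSUPP and -ENOENT are the only codes that
unambiguously mean "nothing was preserved"; whether -ENODATA belongs in
that set is part of the question above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch. Returns true for error codes
 * assumed to mean "no incoming devices exist" and false for anything that
 * might hide a device whose state was preserved but could not be retrieved.
 * Only meaningful for a non-zero err.
 */
static bool incoming_err_is_expected(int err)
{
	switch (err) {
	case -EOPNOTSUPP:	/* Live Update is not enabled */
	case -ENOENT:		/* no PCI data was preserved */
		return true;
	default:		/* e.g. -ENOMEM, -ENODATA */
		return false;
	}
}
```

A caller could keep returning 0 for the expected codes and warn (or
refuse to reassign resources) only when this returns false.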

> > > +void pci_liveupdate_setup_device(struct pci_dev *dev)
> > > +{
> > > +     struct pci_ser *ser;
> > > +     int ret;
> > > +
> > > +     ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&ser);
> > > +     if (ret)
> > > +             return;
> >
> > We should log something here either at info / debug level since the
> > error isn't bubbled up and the luo_core doesn't scream about it either.
> 
> Any error from liveupdate_flb_get_incoming() simply means there are no
> incoming devices. So I don't think there's any error to report in
> dmesg.
> 
> > > +     dev->liveupdate_incoming = !!pci_ser_find(ser, dev);
> >
> > This feels a little hacky, shall we go for something like:
> >
> > dev->liveupdate_incoming = (pci_ser_find(ser, dev) != NULL); ?
> 
> In my experience in the kernel (mostly from KVM), explicit comparison
> to NULL is less preferred than treating a pointer as a boolean. But I'm
> ok with following whatever is the locally preferred style for this
> kind of check.
> 

No strong feelings there, I see both being used in drivers/pci.

> > > @@ -582,6 +583,10 @@ struct pci_dev {
> > >       u8              tph_mode;       /* TPH mode */
> > >       u8              tph_req_type;   /* TPH requester type */
> > >  #endif
> > > +#ifdef CONFIG_LIVEUPDATE
> > > +     unsigned int    liveupdate_incoming:1;  /* Preserved by previous kernel */
> > > +     unsigned int    liveupdate_outgoing:1;  /* Preserved for next kernel */
> > > +#endif
> > >  };
> >
> > This would start another anon bitfield container, should we move this
> > above within the existing bitfield? If we've run pahole and found this
> > to be better, then this should be fine.
> 
> Yeah I simply appended these new fields to the very end of the struct.
> If we care about optimizing the packing of struct pci_dev I can find a
> better place to put it.

If you have pahole handy, it would be great to see if these can slide
into an existing hole. If not, no big deal for v3; we can keep it as is.
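
For illustration only, the packing concern looks like this with two toy
structs. These are hypothetical and are not struct pci_dev; the pointer
member just stands in for whatever non-bitfield members sit between the
existing flags and the new ones.

```c
/* Hypothetical structs for illustration; not struct pci_dev. */
struct flags_split {
	unsigned int a:1;
	void *p;		/* non-bitfield member between the flag groups */
	unsigned int b:1;	/* cannot share a's storage unit */
};

struct flags_grouped {
	unsigned int a:1;
	unsigned int b:1;	/* packs into the same storage unit as a */
	void *p;
};
```

Comparing sizeof() for the two, or running pahole on the real struct,
shows whether the placement matters; the exact numbers depend on the ABI.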

Thanks,
Praan


