linux-mm.kvack.org archive mirror
From: Alex Williamson <alex@shazbot.org>
To: <ankita@nvidia.com>
Cc: <aniketa@nvidia.com>, <vsethi@nvidia.com>, <jgg@nvidia.com>,
	<mochs@nvidia.com>, <skolothumtho@nvidia.com>,
	<linmiaohe@huawei.com>, <nao.horiguchi@gmail.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<lorenzo.stoakes@oracle.com>, <Liam.Howlett@oracle.com>,
	<vbabka@suse.cz>, <rppt@kernel.org>, <surenb@google.com>,
	<mhocko@suse.com>, <tony.luck@intel.com>, <bp@alien8.de>,
	<rafael@kernel.org>, <guohanjun@huawei.com>, <mchehab@kernel.org>,
	<lenb@kernel.org>, <kevin.tian@intel.com>, <cjia@nvidia.com>,
	<kwankhede@nvidia.com>, <targupta@nvidia.com>, <zhiw@nvidia.com>,
	<dnigam@nvidia.com>, <kjaju@nvidia.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-edac@vger.kernel.org>, <Jonathan.Cameron@huawei.com>,
	<ira.weiny@intel.com>, <Smita.KoralahalliChannabasappa@amd.com>,
	<u.kleine-koenig@baylibre.com>, <peterz@infradead.org>,
	<linux-acpi@vger.kernel.org>, <kvm@vger.kernel.org>
Subject: Re: [PATCH v5 3/3] vfio/nvgrace-gpu: register device memory for poison handling
Date: Thu, 6 Nov 2025 14:56:22 -0700
Message-ID: <20251106145622.1610d306.alex@shazbot.org>
In-Reply-To: <20251102184434.2406-4-ankita@nvidia.com>

On Sun, 2 Nov 2025 18:44:34 +0000
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
> (Qemu) using remap_pfn_range() without adding the memory to the kernel.
> The device memory pages are not backed by struct page. The previous
> patch implements the mechanism to handle ECC/poison on memory page without
> struct page. This new mechanism is being used here.
> 
> The module registers its memory region and the address_space with the
> kernel MM for ECC handling using the register_pfn_address_space()
> registration API exposed by the kernel.
> 
> Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [1]
> 
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 45 ++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)

LGTM.  I see Andrew has already picked this up in mm-new; if he
refreshes, here's another ack.

Acked-by: Alex Williamson <alex@shazbot.org>

Thanks,
Alex

> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index d95761dcdd58..80b3ed63c682 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -8,6 +8,10 @@
>  #include <linux/delay.h>
>  #include <linux/jiffies.h>
>  
> +#ifdef CONFIG_MEMORY_FAILURE
> +#include <linux/memory-failure.h>
> +#endif
> +
>  /*
>   * The device memory usable to the workloads running in the VM is cached
>   * and showcased as a 64b device BAR (comprising of BAR4 and BAR5 region)
> @@ -47,6 +51,9 @@ struct mem_region {
>  		void *memaddr;
>  		void __iomem *ioaddr;
>  	};                      /* Base virtual address of the region */
> +#ifdef CONFIG_MEMORY_FAILURE
> +	struct pfn_address_space pfn_address_space;
> +#endif
>  };
>  
>  struct nvgrace_gpu_pci_core_device {
> @@ -60,6 +67,28 @@ struct nvgrace_gpu_pci_core_device {
>  	bool has_mig_hw_bug;
>  };
>  
> +#ifdef CONFIG_MEMORY_FAILURE
> +
> +static int
> +nvgrace_gpu_vfio_pci_register_pfn_range(struct mem_region *region,
> +					struct vm_area_struct *vma)
> +{
> +	unsigned long nr_pages;
> +	int ret = 0;
> +
> +	nr_pages = region->memlength >> PAGE_SHIFT;
> +
> +	region->pfn_address_space.node.start = vma->vm_pgoff;
> +	region->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
> +	region->pfn_address_space.mapping = vma->vm_file->f_mapping;
> +
> +	ret = register_pfn_address_space(&region->pfn_address_space);
> +
> +	return ret;
> +}
> +
> +#endif
> +
>  static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
>  {
>  	struct nvgrace_gpu_pci_core_device *nvdev =
> @@ -127,6 +156,13 @@ static void nvgrace_gpu_close_device(struct vfio_device *core_vdev)
>  
>  	mutex_destroy(&nvdev->remap_lock);
>  
> +#ifdef CONFIG_MEMORY_FAILURE
> +	if (nvdev->resmem.memlength)
> +		unregister_pfn_address_space(&nvdev->resmem.pfn_address_space);
> +
> +	unregister_pfn_address_space(&nvdev->usemem.pfn_address_space);
> +#endif
> +
>  	vfio_pci_core_close_device(core_vdev);
>  }
>  
> @@ -202,7 +238,14 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
>  
>  	vma->vm_pgoff = start_pfn;
>  
> -	return 0;
> +#ifdef CONFIG_MEMORY_FAILURE
> +	if (nvdev->resmem.memlength && index == VFIO_PCI_BAR2_REGION_INDEX)
> +		ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->resmem, vma);
> +	else if (index == VFIO_PCI_BAR4_REGION_INDEX)
> +		ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->usemem, vma);
> +#endif
> +
> +	return ret;
>  }
>  
>  static long
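
For readers following the range arithmetic in
nvgrace_gpu_vfio_pci_register_pfn_range() above, here is a minimal
userspace sketch of how the inclusive [start, last] PFN interval is
derived from the mmap offset and the region length. It is not the driver
code: the struct, the helper name, and the fixed 4 KiB page shift are
hypothetical stand-ins for illustration, and the zero-length guard is an
addition that the quoted patch does not contain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB pages. The kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the node.start/node.last pair in the patch. */
struct pfn_range_sketch {
	uint64_t start; /* first PFN covered by the region */
	uint64_t last;  /* last PFN covered by the region (inclusive) */
};

/*
 * Mirror the arithmetic of the quoted helper: the region covers
 * memlength >> PAGE_SHIFT pages starting at pgoff, and the interval
 * is stored with an inclusive upper bound.
 *
 * Returns 0 on success, -1 if the region is shorter than one page
 * (this guard is not in the quoted patch; without it "last" would
 * wrap around to pgoff - 1).
 */
static int pfn_range_from_region(uint64_t pgoff, uint64_t memlength,
				 struct pfn_range_sketch *out)
{
	uint64_t nr_pages = memlength >> SKETCH_PAGE_SHIFT;

	assert(out);

	if (nr_pages == 0)
		return -1;

	out->start = pgoff;
	out->last = pgoff + nr_pages - 1;
	return 0;
}
```

With these assumptions, a 16-page region mapped at PFN 0x1000 yields
start = 0x1000 and last = 0x100f.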




Thread overview: 8+ messages
2025-11-02 18:44 [PATCH v5 0/3] mm: Implement ECC handling for pfn with no struct page ankita
2025-11-02 18:44 ` [PATCH v5 1/3] mm: Change ghes code to allow poison of non-struct pfn ankita
2025-11-02 18:44 ` [PATCH v5 2/3] mm: handle poisoning of pfn without struct pages ankita
2025-11-04  2:49   ` Andrew Morton
2025-11-02 18:44 ` [PATCH v5 3/3] vfio/nvgrace-gpu: register device memory for poison handling ankita
2025-11-06 21:56   ` Alex Williamson [this message]
2025-11-04  2:47 ` [PATCH v5 0/3] mm: Implement ECC handling for pfn with no struct page Andrew Morton
2025-11-04 17:15   ` Ankit Agrawal
