From: <ankita@nvidia.com>
To: <ankita@nvidia.com>, <aniketa@nvidia.com>, <vsethi@nvidia.com>,
<jgg@nvidia.com>, <mochs@nvidia.com>, <skolothumtho@nvidia.com>,
<linmiaohe@huawei.com>, <nao.horiguchi@gmail.com>,
<akpm@linux-foundation.org>, <david@redhat.com>,
<lorenzo.stoakes@oracle.com>, <Liam.Howlett@oracle.com>,
<vbabka@suse.cz>, <rppt@kernel.org>, <surenb@google.com>,
<mhocko@suse.com>, <tony.luck@intel.com>, <bp@alien8.de>,
<rafael@kernel.org>, <guohanjun@huawei.com>, <mchehab@kernel.org>,
<lenb@kernel.org>, <kevin.tian@intel.com>, <alex@shazbot.org>
Cc: <cjia@nvidia.com>, <kwankhede@nvidia.com>, <targupta@nvidia.com>,
<zhiw@nvidia.com>, <dnigam@nvidia.com>, <kjaju@nvidia.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<linux-edac@vger.kernel.org>, <Jonathan.Cameron@huawei.com>,
<ira.weiny@intel.com>, <Smita.KoralahalliChannabasappa@amd.com>,
<u.kleine-koenig@baylibre.com>, <peterz@infradead.org>,
<linux-acpi@vger.kernel.org>, <kvm@vger.kernel.org>
Subject: [PATCH v3 3/3] vfio/nvgrace-gpu: register device memory for poison handling
Date: Tue, 21 Oct 2025 10:23:27 +0000
Message-ID: <20251021102327.199099-4-ankita@nvidia.com>
In-Reply-To: <20251021102327.199099-1-ankita@nvidia.com>
From: Ankit Agrawal <ankita@nvidia.com>

The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
(QEMU) using remap_pfn_range() without adding the memory to the kernel,
so the device memory pages are not backed by struct page. Patches 1-2 of
this series implement a mechanism to handle ECC/poison errors on memory
pages that have no struct page; this patch puts that mechanism to use.

When a device memory region is mmapped, the module registers the
region's pfn range and the address_space of the mapped file with the
kernel MM for ECC handling, using the register_pfn_address_space() API.
The ranges are unregistered when the device is closed.

Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [1]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
drivers/vfio/pci/nvgrace-gpu/main.c | 45 ++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)
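The registration described in the commit message amounts to recording an inclusive pfn interval, [node.start, node.last], for each mapped region, and later asking which registered interval contains a poisoned pfn. The program below is a minimal user-space model of that idea, not the kernel implementation. Everything in it is an assumption for illustration: the model_* names are hypothetical, a fixed-size array stands in for the kernel's interval tree, a string tag stands in for struct address_space, the page size is fixed at 4 KiB, and the -EBUSY/-EINVAL return values are chosen for the model rather than taken from patch 1.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical page size for the model; the kernel uses PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_SPACES 8

/*
 * Simplified stand-in for struct pfn_address_space: an inclusive pfn
 * interval plus an owner tag in place of the struct address_space.
 */
struct model_pfn_space {
	unsigned long start;	/* first pfn, like node.start */
	unsigned long last;	/* last pfn (inclusive), like node.last */
	const char *owner;	/* stands in for ->mapping */
};

static struct model_pfn_space *model_registry[MODEL_MAX_SPACES];
static size_t model_count;

/*
 * Fill the interval the way the patch does: begin at the first pfn
 * (nvgrace_gpu_mmap() stores it in vma->vm_pgoff) and cover the whole
 * region length, rounded down to whole pages.
 */
static void model_fill_range(struct model_pfn_space *s,
			     unsigned long first_pfn,
			     unsigned long memlength, const char *owner)
{
	unsigned long nr_pages = memlength >> MODEL_PAGE_SHIFT;

	s->start = first_pfn;
	s->last = first_pfn + nr_pages - 1;	/* wraps below start if 0 pages */
	s->owner = owner;
}

/* Assumed semantics: reject empty ranges and any overlap. */
static int model_register(struct model_pfn_space *s)
{
	size_t i;

	if (s->last < s->start)
		return -EINVAL;
	if (model_count == MODEL_MAX_SPACES)
		return -ENOMEM;
	for (i = 0; i < model_count; i++) {
		const struct model_pfn_space *o = model_registry[i];

		/* Two inclusive intervals overlap unless one ends first. */
		if (s->start <= o->last && o->start <= s->last)
			return -EBUSY;
	}
	model_registry[model_count++] = s;
	return 0;
}

/* Remove a registered range; unknown ranges are ignored. */
static void model_unregister(struct model_pfn_space *s)
{
	size_t i;

	for (i = 0; i < model_count; i++) {
		if (model_registry[i] == s) {
			model_registry[i] = model_registry[--model_count];
			return;
		}
	}
}

/* What a poison handler needs first: which range owns a failed pfn. */
static struct model_pfn_space *model_lookup(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < model_count; i++) {
		struct model_pfn_space *s = model_registry[i];

		if (pfn >= s->start && pfn <= s->last)
			return s;
	}
	return NULL;
}
```

As in the patch, the interval starts at the pfn carried in vma->vm_pgoff and its length comes from memlength. In this model, registering the same region a second time overlaps itself and fails with -EBUSY; whether register_pfn_address_space() behaves the same way is defined by patch 1 and is not shown here.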
diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index d95761dcdd58..80b3ed63c682 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -8,6 +8,10 @@
#include <linux/delay.h>
#include <linux/jiffies.h>
+#ifdef CONFIG_MEMORY_FAILURE
+#include <linux/memory-failure.h>
+#endif
+
/*
* The device memory usable to the workloads running in the VM is cached
* and showcased as a 64b device BAR (comprising of BAR4 and BAR5 region)
@@ -47,6 +51,9 @@ struct mem_region {
void *memaddr;
void __iomem *ioaddr;
}; /* Base virtual address of the region */
+#ifdef CONFIG_MEMORY_FAILURE
+ struct pfn_address_space pfn_address_space;
+#endif
};
struct nvgrace_gpu_pci_core_device {
@@ -60,6 +67,28 @@ struct nvgrace_gpu_pci_core_device {
bool has_mig_hw_bug;
};
+#ifdef CONFIG_MEMORY_FAILURE
+
+static int
+nvgrace_gpu_vfio_pci_register_pfn_range(struct mem_region *region,
+ struct vm_area_struct *vma)
+{
+ unsigned long nr_pages;
+ int ret = 0;
+
+ nr_pages = region->memlength >> PAGE_SHIFT;
+
+ region->pfn_address_space.node.start = vma->vm_pgoff;
+ region->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
+ region->pfn_address_space.mapping = vma->vm_file->f_mapping;
+
+ ret = register_pfn_address_space(&region->pfn_address_space);
+
+ return ret;
+}
+
+#endif
+
static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
{
struct nvgrace_gpu_pci_core_device *nvdev =
@@ -127,6 +156,13 @@ static void nvgrace_gpu_close_device(struct vfio_device *core_vdev)
mutex_destroy(&nvdev->remap_lock);
+#ifdef CONFIG_MEMORY_FAILURE
+ if (nvdev->resmem.memlength)
+ unregister_pfn_address_space(&nvdev->resmem.pfn_address_space);
+
+ unregister_pfn_address_space(&nvdev->usemem.pfn_address_space);
+#endif
+
vfio_pci_core_close_device(core_vdev);
}
@@ -202,7 +238,14 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
vma->vm_pgoff = start_pfn;
- return 0;
+#ifdef CONFIG_MEMORY_FAILURE
+ if (nvdev->resmem.memlength && index == VFIO_PCI_BAR2_REGION_INDEX)
+ ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->resmem, vma);
+ else if (index == VFIO_PCI_BAR4_REGION_INDEX)
+ ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->usemem, vma);
+#endif
+
+ return ret;
}
static long
--
2.34.1
Thread overview: 21+ messages
2025-10-21 10:23 [PATCH v3 0/3] mm: Implement ECC handling for pfn with no struct page ankita
2025-10-21 10:23 ` [PATCH v3 1/3] mm: handle poisoning of pfn without struct pages ankita
2025-10-21 17:05 ` Ira Weiny
2025-10-22 16:00 ` Jiaqi Yan
2025-10-24 6:34 ` Miaohe Lin
2025-10-24 9:45 ` Shuai Xue
2025-10-24 11:52 ` Jason Gunthorpe
2025-10-24 11:59 ` Ankit Agrawal
2025-10-21 10:23 ` [PATCH v3 2/3] mm: Change ghes code to allow poison of non-struct pfn ankita
2025-10-21 17:13 ` Ira Weiny
2025-10-21 17:19 ` Luck, Tony
2025-10-22 6:53 ` Shuai Xue
2025-10-22 15:03 ` Ira Weiny
2025-10-24 10:03 ` Shuai Xue
2025-10-24 11:26 ` Ankit Agrawal
2025-10-21 10:23 ` ankita [this message]
2025-10-21 16:30 ` [PATCH v3 0/3] mm: Implement ECC handling for pfn with no struct page Liam R. Howlett
2025-10-21 16:44 ` Jason Gunthorpe
2025-10-21 18:54 ` Liam R. Howlett
2025-10-21 22:38 ` Jason Gunthorpe
2025-10-24 11:16 ` Ankit Agrawal