* [PATCH v4 0/3] mm: Implement ECC handling for pfn with no struct page
@ 2025-10-26 14:19 ankita
2025-10-26 14:19 ` [PATCH v4 1/3] mm: Change ghes code to allow poison of non-struct pfn ankita
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: ankita @ 2025-10-26 14:19 UTC (permalink / raw)
To: ankita, aniketa, vsethi, jgg, mochs, skolothumtho, linmiaohe,
nao.horiguchi, akpm, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, tony.luck, bp, rafael, guohanjun,
mchehab, lenb, kevin.tian, alex
Cc: cjia, kwankhede, targupta, zhiw, dnigam, kjaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
From: Ankit Agrawal <ankita@nvidia.com>
Poison (or ECC) errors can be very common on a large cluster.
The kernel MM currently handles ECC errors / poison only on memory pages
backed by struct page. The handling is missing for PFNMAP memory that
does not have struct pages. This series adds such support.
Implement new ECC handling for memory without struct pages. Kernel MM
exposes registration APIs to allow modules that manage a device to
register their device memory regions. MM then tracks such regions using
an interval tree.
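
As a rough illustration of the driver-side usage (a sketch only; the real
registration code is in patch 3, and the region/function names here are
made up), a managing module would do something like:

  #include <linux/memory-failure.h>

  /* One registration per mapped device memory region. */
  static struct pfn_address_space dev_pfn_space;

  static int dev_register_poison_handling(struct vm_area_struct *vma,
                                          unsigned long nr_pages)
  {
          /*
           * The driver sets vma->vm_pgoff to the start PFN of the region,
           * so node.start/node.last span the PFNs backing the mapping.
           */
          dev_pfn_space.node.start = vma->vm_pgoff;
          dev_pfn_space.node.last = vma->vm_pgoff + nr_pages - 1;
          dev_pfn_space.mapping = vma->vm_file->f_mapping;

          /* Fails with -EBUSY if the range overlaps an existing registration. */
          return register_pfn_address_space(&dev_pfn_space);
  }

The region is dropped again with unregister_pfn_address_space() when the
mapping goes away (e.g. on device close).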
The mechanism is largely similar to that of ECC handling on pfns with
struct pages. If there is an ECC error on a pfn, all the mappings to it
are identified and a SIGBUS is sent to the user space processes owning
those mappings.
Note that there is one primary difference versus the handling of poison
on struct pages: the unmapping of the faulty PFN is skipped. This is done
to accommodate the huge PFNMAP support added recently [1], which enables
VM_PFNMAP vmas to map at the PMD or PUD level. Poison on a PFN mapped in
such a way would require breaking the PMD/PUD mapping into PTEs, which
would get mirrored into the stage-2 (S2) page tables. This can greatly
increase the cost of table walks and have a major performance impact.
The nvgrace-gpu-vfio-pci module maps the device memory to the user VA
(QEMU) using remap_pfn_range() without adding it to the kernel [2]. These
device memory PFNs are not backed by struct page. Make the
nvgrace-gpu-vfio-pci module use this mechanism to get poison handling
support for the device memory.
Series rebased to v6.17-rc7.
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
Link: https://lore.kernel.org/all/20251021102327.199099-1-ankita@nvidia.com/ [v3]
v3 -> v4
- Added guards in the memory_failure_pfn, register and unregister functions
to simplify the code. (Thanks Ira Weiny for the suggestion.)
- Collected Reviewed-by from Shuai Xue (Thanks!) on the mm GHES patch. Also
moved it to the front of the series.
- Added a check for interval_tree_iter_first before removing the device
memory region. (Thanks Jiaqi Yan for the suggestion.)
- Return MF_IGNORED if the pfn doesn't belong to any address space
mapping. (Thanks Miaohe Lin for the suggestion.)
- Updated the patch commit message to add more details on the performance
impact on huge PFNMAP. (Thanks Jason Gunthorpe and Tony Luck for the
suggestions.)
v2 -> v3
- Rebased to v6.17-rc7.
- Skipped the unmapping of the PFNMAP on reception of poison. Suggested by
Jason Gunthorpe, Jiaqi Yan and Vikram Sethi (Thanks!).
- Updated the check to prevent multiple registrations of the same PFN
range using interval_tree_iter_first. Thanks Shameer Kolothum for the
suggestion.
- Removed the callback function in nvgrace-gpu that required tracking of
poisoned PFNs, as it isn't needed anymore.
- Introduced a separate collect_procs_pfn() function to collect the list
of processes mapping the poisoned PFN.
v1 -> v2
- Changed poisoned page tracking from a bitmap to a hashtable.
- Addressed miscellaneous comments in v1.
Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [2]
Ankit Agrawal (3):
mm: Change ghes code to allow poison of non-struct pfn
mm: handle poisoning of pfn without struct pages
vfio/nvgrace-gpu: register device memory for poison handling
MAINTAINERS | 1 +
drivers/acpi/apei/ghes.c | 6 --
drivers/vfio/pci/nvgrace-gpu/main.c | 45 ++++++++-
include/linux/memory-failure.h | 17 ++++
include/linux/mm.h | 1 +
include/ras/ras_event.h | 1 +
mm/Kconfig | 1 +
mm/memory-failure.c | 146 +++++++++++++++++++++++++++-
8 files changed, 210 insertions(+), 8 deletions(-)
create mode 100644 include/linux/memory-failure.h
--
2.34.1
* [PATCH v4 1/3] mm: Change ghes code to allow poison of non-struct pfn
2025-10-26 14:19 [PATCH v4 0/3] mm: Implement ECC handling for pfn with no struct page ankita
@ 2025-10-26 14:19 ` ankita
2025-10-26 14:19 ` [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages ankita
2025-10-26 14:19 ` [PATCH v4 3/3] vfio/nvgrace-gpu: register device memory for poison handling ankita
2 siblings, 0 replies; 12+ messages in thread
From: ankita @ 2025-10-26 14:19 UTC (permalink / raw)
To: ankita, aniketa, vsethi, jgg, mochs, skolothumtho, linmiaohe,
nao.horiguchi, akpm, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, tony.luck, bp, rafael, guohanjun,
mchehab, lenb, kevin.tian, alex
Cc: cjia, kwankhede, targupta, zhiw, dnigam, kjaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm, Shuai Xue
From: Ankit Agrawal <ankita@nvidia.com>
The GHES code only calls memory_failure() on PFNs that pass the
pfn_valid() check. Remapped PFNs fail that check, so
ghes_do_memory_failure() returns without triggering memory_failure().
Update the code to allow memory_failure() to be called on PFNs that fail
pfn_valid().
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
drivers/acpi/apei/ghes.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a0d54993edb3..bc4d0f2b3e9d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -505,12 +505,6 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
return false;
pfn = PHYS_PFN(physical_addr);
- if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
- pr_warn_ratelimited(FW_WARN GHES_PFX
- "Invalid address in generic error data: %#llx\n",
- physical_addr);
- return false;
- }
if (flags == MF_ACTION_REQUIRED && current->mm) {
twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb));
--
2.34.1
* [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-10-26 14:19 [PATCH v4 0/3] mm: Implement ECC handling for pfn with no struct page ankita
2025-10-26 14:19 ` [PATCH v4 1/3] mm: Change ghes code to allow poison of non-struct pfn ankita
@ 2025-10-26 14:19 ` ankita
2025-10-28 0:26 ` Andrew Morton
2025-10-26 14:19 ` [PATCH v4 3/3] vfio/nvgrace-gpu: register device memory for poison handling ankita
2 siblings, 1 reply; 12+ messages in thread
From: ankita @ 2025-10-26 14:19 UTC (permalink / raw)
To: ankita, aniketa, vsethi, jgg, mochs, skolothumtho, linmiaohe,
nao.horiguchi, akpm, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, tony.luck, bp, rafael, guohanjun,
mchehab, lenb, kevin.tian, alex
Cc: cjia, kwankhede, targupta, zhiw, dnigam, kjaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
From: Ankit Agrawal <ankita@nvidia.com>
Poison (or ECC) errors can be very common on a large cluster.
The kernel MM currently does not handle ECC errors / poison on a memory
region that is not backed by struct pages. If a memory region is mapped
using remap_pfn_range(), for example, but not added to the kernel, MM
will not have associated struct pages. Add a new mechanism to handle
memory failure on such memory.
Make kernel MM expose a function to allow modules managing the device
memory to register the device memory SPA and the address space associated
with it. MM maintains this information in an interval tree. On poison, MM
can search for the range that the poisoned PFN belongs to and use the
address_space to determine the mapping VMAs.
In this implementation, kernel MM follows a sequence that is largely
similar to the memory_failure() handler for struct page backed memory:
1. memory_failure() is triggered on reception of a poison error. The
absence of a struct page is detected and consequently memory_failure_pfn()
is executed.
2. memory_failure_pfn() collects the processes mapping the PFN.
3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
faulty PFN using kill_procs().
Note that there is one primary difference versus the handling of poison
on struct pages: the unmapping of the faulty PFN is skipped. This is done
to accommodate the huge PFNMAP support added recently [1], which enables
VM_PFNMAP vmas to map at the PMD or PUD level. Poison on a PFN mapped in
such a way would require breaking the PMD/PUD mapping into PTEs, which
would get mirrored into the stage-2 (S2) page tables. This can greatly
increase the cost of table walks and have a major performance impact.
Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
MAINTAINERS | 1 +
include/linux/memory-failure.h | 17 ++++
include/linux/mm.h | 1 +
include/ras/ras_event.h | 1 +
mm/Kconfig | 1 +
mm/memory-failure.c | 146 ++++++++++++++++++++++++++++++++-
6 files changed, 166 insertions(+), 1 deletion(-)
create mode 100644 include/linux/memory-failure.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 520fb4e379a3..463d062d0386 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11359,6 +11359,7 @@ M: Miaohe Lin <linmiaohe@huawei.com>
R: Naoya Horiguchi <nao.horiguchi@gmail.com>
L: linux-mm@kvack.org
S: Maintained
+F: include/linux/memory-failure.h
F: mm/hwpoison-inject.c
F: mm/memory-failure.c
diff --git a/include/linux/memory-failure.h b/include/linux/memory-failure.h
new file mode 100644
index 000000000000..bc326503d2d2
--- /dev/null
+++ b/include/linux/memory-failure.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMORY_FAILURE_H
+#define _LINUX_MEMORY_FAILURE_H
+
+#include <linux/interval_tree.h>
+
+struct pfn_address_space;
+
+struct pfn_address_space {
+ struct interval_tree_node node;
+ struct address_space *mapping;
+};
+
+int register_pfn_address_space(struct pfn_address_space *pfn_space);
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space);
+
+#endif /* _LINUX_MEMORY_FAILURE_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..0ab4ea82ce9e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4006,6 +4006,7 @@ enum mf_action_page_type {
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
MF_MSG_ALREADY_POISONED,
+ MF_MSG_PFN_MAP,
MF_MSG_UNKNOWN,
};
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index c8cd0f00c845..fecfeb7c8be7 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -375,6 +375,7 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
+ EM ( MF_MSG_PFN_MAP, "non struct page pfn" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
/*
diff --git a/mm/Kconfig b/mm/Kconfig
index e443fe8cd6cf..0b07219390b9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -777,6 +777,7 @@ config MEMORY_FAILURE
depends on ARCH_SUPPORTS_MEMORY_FAILURE
bool "Enable recovery from hardware memory errors"
select MEMORY_ISOLATION
+ select INTERVAL_TREE
select RAS
help
Enables code to recover from some memory failures on systems
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index df6ee59527dd..afac4ed2694e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -38,6 +38,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
+#include <linux/memory-failure.h>
#include <linux/page-flags.h>
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
@@ -154,6 +155,10 @@ static const struct ctl_table memory_failure_table[] = {
}
};
+static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
+
+static DEFINE_MUTEX(pfn_space_lock);
+
/*
* Return values:
* 1: the page is dissolved (if needed) and taken off from buddy,
@@ -957,6 +962,7 @@ static const char * const action_page_types[] = {
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
[MF_MSG_ALREADY_POISONED] = "already poisoned page",
+ [MF_MSG_PFN_MAP] = "non struct page pfn",
[MF_MSG_UNKNOWN] = "unknown page",
};
@@ -1349,7 +1355,7 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
{
trace_memory_failure_event(pfn, type, result);
- if (type != MF_MSG_ALREADY_POISONED) {
+ if (type != MF_MSG_ALREADY_POISONED && type != MF_MSG_PFN_MAP) {
num_poisoned_pages_inc(pfn);
update_per_node_mf_stats(pfn, result);
}
@@ -2216,6 +2222,136 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
kill_procs(&tokill, true, pfn, flags);
}
+int register_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+ if (!pfn_space)
+ return -EINVAL;
+
+ scoped_guard(mutex, &pfn_space_lock) {
+ if (interval_tree_iter_first(&pfn_space_itree,
+ pfn_space->node.start,
+ pfn_space->node.last))
+ return -EBUSY;
+
+ interval_tree_insert(&pfn_space->node, &pfn_space_itree);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_pfn_address_space);
+
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+ guard(mutex)(&pfn_space_lock);
+
+ if (pfn_space &&
+ interval_tree_iter_first(&pfn_space_itree,
+ pfn_space->node.start,
+ pfn_space->node.last))
+ interval_tree_remove(&pfn_space->node, &pfn_space_itree);
+}
+EXPORT_SYMBOL_GPL(unregister_pfn_address_space);
+
+static void add_to_kill_pfn(struct task_struct *tsk,
+ struct vm_area_struct *vma,
+ struct list_head *to_kill,
+ unsigned long pfn)
+{
+ struct to_kill *tk;
+
+ tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
+ if (!tk)
+ return;
+
+ /* Check for pgoff not backed by struct page */
+ tk->addr = vma_address(vma, pfn, 1);
+ tk->size_shift = PAGE_SHIFT;
+
+ if (tk->addr == -EFAULT)
+ pr_info("Unable to find address %lx in %s\n",
+ pfn, tsk->comm);
+
+ get_task_struct(tsk);
+ tk->tsk = tsk;
+ list_add_tail(&tk->nd, to_kill);
+}
+
+/*
+ * Collect processes when the error hit a PFN not backed by struct page.
+ */
+static void collect_procs_pfn(struct address_space *mapping,
+ unsigned long pfn, struct list_head *to_kill)
+{
+ struct vm_area_struct *vma;
+ struct task_struct *tsk;
+
+ i_mmap_lock_read(mapping);
+ rcu_read_lock();
+ for_each_process(tsk) {
+ struct task_struct *t = tsk;
+
+ t = task_early_kill(tsk, true);
+ if (!t)
+ continue;
+ vma_interval_tree_foreach(vma, &mapping->i_mmap, pfn, pfn) {
+ if (vma->vm_mm == t->mm)
+ add_to_kill_pfn(t, vma, to_kill, pfn);
+ }
+ }
+ rcu_read_unlock();
+ i_mmap_unlock_read(mapping);
+}
+
+/**
+ * memory_failure_pfn - Handle memory failure on a page not backed by
+ * struct page.
+ * @pfn: Page Number of the corrupted page
+ * @flags: fine tune action taken
+ *
+ * Return:
+ * 0 - success,
+ * -EBUSY - Page PFN does not belong to any address space mapping.
+ */
+static int memory_failure_pfn(unsigned long pfn, int flags)
+{
+ struct interval_tree_node *node;
+ LIST_HEAD(tokill);
+
+ scoped_guard(mutex, &pfn_space_lock) {
+ bool mf_handled = false;
+
+ /*
+ * Modules registers with MM the address space mapping to the device memory they
+ * manage. Iterate to identify exactly which address space has mapped to this
+ * failing PFN.
+ */
+ for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
+ node = interval_tree_iter_next(node, pfn, pfn)) {
+ struct pfn_address_space *pfn_space =
+ container_of(node, struct pfn_address_space, node);
+
+ collect_procs_pfn(pfn_space->mapping, pfn, &tokill);
+
+ mf_handled = true;
+ }
+
+ if (!mf_handled)
+ return action_result(pfn, MF_MSG_PFN_MAP, MF_IGNORED);
+ }
+
+ /*
+ * Unlike System-RAM there is no possibility to swap in a different
+ * physical page at a given virtual address, so all userspace
+ * consumption of direct PFN memory necessitates SIGBUS (i.e.
+ * MF_MUST_KILL)
+ */
+ flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+
+ kill_procs(&tokill, true, pfn, flags);
+
+ return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
+}
+
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -2265,6 +2401,14 @@ int memory_failure(unsigned long pfn, int flags)
if (res == 0)
goto unlock_mutex;
+ if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {
+ /*
+ * The PFN is not backed by struct page.
+ */
+ res = memory_failure_pfn(pfn, flags);
+ goto unlock_mutex;
+ }
+
if (pfn_valid(pfn)) {
pgmap = get_dev_pagemap(pfn, NULL);
put_ref_page(pfn, flags);
--
2.34.1
* [PATCH v4 3/3] vfio/nvgrace-gpu: register device memory for poison handling
2025-10-26 14:19 [PATCH v4 0/3] mm: Implement ECC handling for pfn with no struct page ankita
2025-10-26 14:19 ` [PATCH v4 1/3] mm: Change ghes code to allow poison of non-struct pfn ankita
2025-10-26 14:19 ` [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages ankita
@ 2025-10-26 14:19 ` ankita
2 siblings, 0 replies; 12+ messages in thread
From: ankita @ 2025-10-26 14:19 UTC (permalink / raw)
To: ankita, aniketa, vsethi, jgg, mochs, skolothumtho, linmiaohe,
nao.horiguchi, akpm, david, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, tony.luck, bp, rafael, guohanjun,
mchehab, lenb, kevin.tian, alex
Cc: cjia, kwankhede, targupta, zhiw, dnigam, kjaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
From: Ankit Agrawal <ankita@nvidia.com>
The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
(QEMU) using remap_pfn_range() without adding the memory to the kernel.
The device memory pages are not backed by struct page. The previous patch
implements the mechanism to handle ECC/poison on memory pages without
struct page; use that mechanism here.
The module registers its memory regions and the associated address_space
with the kernel MM for ECC handling using the register_pfn_address_space()
API exposed by the kernel.
Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [1]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
drivers/vfio/pci/nvgrace-gpu/main.c | 45 ++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index d95761dcdd58..80b3ed63c682 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -8,6 +8,10 @@
#include <linux/delay.h>
#include <linux/jiffies.h>
+#ifdef CONFIG_MEMORY_FAILURE
+#include <linux/memory-failure.h>
+#endif
+
/*
* The device memory usable to the workloads running in the VM is cached
* and showcased as a 64b device BAR (comprising of BAR4 and BAR5 region)
@@ -47,6 +51,9 @@ struct mem_region {
void *memaddr;
void __iomem *ioaddr;
}; /* Base virtual address of the region */
+#ifdef CONFIG_MEMORY_FAILURE
+ struct pfn_address_space pfn_address_space;
+#endif
};
struct nvgrace_gpu_pci_core_device {
@@ -60,6 +67,28 @@ struct nvgrace_gpu_pci_core_device {
bool has_mig_hw_bug;
};
+#ifdef CONFIG_MEMORY_FAILURE
+
+static int
+nvgrace_gpu_vfio_pci_register_pfn_range(struct mem_region *region,
+ struct vm_area_struct *vma)
+{
+ unsigned long nr_pages;
+ int ret = 0;
+
+ nr_pages = region->memlength >> PAGE_SHIFT;
+
+ region->pfn_address_space.node.start = vma->vm_pgoff;
+ region->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
+ region->pfn_address_space.mapping = vma->vm_file->f_mapping;
+
+ ret = register_pfn_address_space(®ion->pfn_address_space);
+
+ return ret;
+}
+
+#endif
+
static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
{
struct nvgrace_gpu_pci_core_device *nvdev =
@@ -127,6 +156,13 @@ static void nvgrace_gpu_close_device(struct vfio_device *core_vdev)
mutex_destroy(&nvdev->remap_lock);
+#ifdef CONFIG_MEMORY_FAILURE
+ if (nvdev->resmem.memlength)
+ unregister_pfn_address_space(&nvdev->resmem.pfn_address_space);
+
+ unregister_pfn_address_space(&nvdev->usemem.pfn_address_space);
+#endif
+
vfio_pci_core_close_device(core_vdev);
}
@@ -202,7 +238,14 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
vma->vm_pgoff = start_pfn;
- return 0;
+#ifdef CONFIG_MEMORY_FAILURE
+ if (nvdev->resmem.memlength && index == VFIO_PCI_BAR2_REGION_INDEX)
+ ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->resmem, vma);
+ else if (index == VFIO_PCI_BAR4_REGION_INDEX)
+ ret = nvgrace_gpu_vfio_pci_register_pfn_range(&nvdev->usemem, vma);
+#endif
+
+ return ret;
}
static long
--
2.34.1
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-10-26 14:19 ` [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages ankita
@ 2025-10-28 0:26 ` Andrew Morton
2025-10-29 3:15 ` Ankit Agrawal
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-10-28 0:26 UTC (permalink / raw)
To: ankita
Cc: aniketa, vsethi, jgg, mochs, skolothumtho, linmiaohe,
nao.horiguchi, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, tony.luck, bp, rafael, guohanjun, mchehab,
lenb, kevin.tian, alex, cjia, kwankhede, targupta, zhiw, dnigam,
kjaju, linux-kernel, linux-mm, linux-edac, Jonathan.Cameron,
ira.weiny, Smita.KoralahalliChannabasappa, u.kleine-koenig,
peterz, linux-acpi, kvm
On Sun, 26 Oct 2025 14:19:18 +0000 <ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> Poison (or ECC) errors can be very common on a large size cluster.
> The kernel MM currently does not handle ECC errors / poison on a memory
> region that is not backed by struct pages. If a memory region mapped
> using remap_pfn_range() for example, but not added to the kernel, MM
> will not have associated struct pages. Add a new mechanism to handle
> memory failure on such memory.
>
> Make kernel MM expose a function to allow modules managing the device
> memory to register the device memory SPA and the address space associated
> it. MM maintains this information as an interval tree. On poison, MM can
> search for the range that the poisoned PFN belong and use the address_space
> to determine the mapping VMA.
>
> In this implementation, kernel MM follows the following sequence that is
> largely similar to the memory_failure() handler for struct page backed
> memory:
> 1. memory_failure() is triggered on reception of a poison error. An
> absence of struct page is detected and consequently memory_failure_pfn()
> is executed.
> 2. memory_failure_pfn() collects the processes mapped to the PFN.
> 3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
> faulty PFN using kill_procs().
>
> Note that there is one primary difference versus the handling of the
> poison on struct pages, which is to skip unmapping to the faulty PFN.
> This is done to handle the huge PFNMAP support added recently [1] that
> enables VM_PFNMAP vmas to map at PMD or PUD level. A poison to a PFN
> mapped in such as way would need breaking the PMD/PUD mapping into PTEs
> that will get mirrored into the S2. This can greatly increase the cost
> of table walks and have a major performance impact.
>
> ...
>
> @@ -2216,6 +2222,136 @@ static void kill_procs_now(struct page *p, unsigned long pfn, int flags,
> kill_procs(&tokill, true, pfn, flags);
> }
>
> +int register_pfn_address_space(struct pfn_address_space *pfn_space)
> +{
> + if (!pfn_space)
> + return -EINVAL;
I suggest this be removed - make register_pfn_address_space(NULL)
illegal and let the punishment be an oops.
> + scoped_guard(mutex, &pfn_space_lock) {
> + if (interval_tree_iter_first(&pfn_space_itree,
> + pfn_space->node.start,
> + pfn_space->node.last))
> + return -EBUSY;
> +
> + interval_tree_insert(&pfn_space->node, &pfn_space_itree);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(register_pfn_address_space);
> +
> +void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
> +{
> + guard(mutex)(&pfn_space_lock);
> +
> + if (pfn_space &&
> + interval_tree_iter_first(&pfn_space_itree,
> + pfn_space->node.start,
> + pfn_space->node.last))
> + interval_tree_remove(&pfn_space->node, &pfn_space_itree);
> +}
> +EXPORT_SYMBOL_GPL(unregister_pfn_address_space);
> +
> +static void add_to_kill_pfn(struct task_struct *tsk,
> + struct vm_area_struct *vma,
> + struct list_head *to_kill,
> + unsigned long pfn)
> +{
> + struct to_kill *tk;
> +
> + tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
> + if (!tk)
> + return;
This is unfortunate. GFP_ATOMIC is unreliable and we silently behave
as if it worked OK.
> + /* Check for pgoff not backed by struct page */
> + tk->addr = vma_address(vma, pfn, 1);
> + tk->size_shift = PAGE_SHIFT;
> +
> + if (tk->addr == -EFAULT)
> + pr_info("Unable to find address %lx in %s\n",
> + pfn, tsk->comm);
> +
> + get_task_struct(tsk);
> + tk->tsk = tsk;
> + list_add_tail(&tk->nd, to_kill);
> +}
> +
> +/*
> + * Collect processes when the error hit a PFN not backed by struct page.
> + */
> +static void collect_procs_pfn(struct address_space *mapping,
> + unsigned long pfn, struct list_head *to_kill)
> +{
> + struct vm_area_struct *vma;
> + struct task_struct *tsk;
> +
> + i_mmap_lock_read(mapping);
> + rcu_read_lock();
> + for_each_process(tsk) {
> + struct task_struct *t = tsk;
> +
> + t = task_early_kill(tsk, true);
> + if (!t)
> + continue;
> + vma_interval_tree_foreach(vma, &mapping->i_mmap, pfn, pfn) {
> + if (vma->vm_mm == t->mm)
> + add_to_kill_pfn(t, vma, to_kill, pfn);
> + }
> + }
> + rcu_read_unlock();
We could play games here to make the GFP_ATOMIC allocation unnecessary,
but nasty. Allocate the to_kill* outside the rcu_read_lock, pass that
pointer into add_to_kill_pfn(). If add_to_kill_pfn()'s
kmalloc(GFP_ATOMIC) failed, add_to_kill_pfn() can then consume the
caller's to_kill*. Then the caller can drop the lock, allocate a new
to_kill* then restart the scan. And teach add_to_kill_pfn() to not
re-add tasks which are already on the list. Ugh.
At the very very least we should tell the user that the kernel goofed
and that one of their processes won't be getting killed.
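
A very rough sketch of that restart shape (untested; the @spare parameter
and the boolean return of add_to_kill_pfn() are invented here purely for
illustration):

  static bool task_on_kill_list(struct task_struct *tsk,
                                struct list_head *to_kill)
  {
          struct to_kill *tk;

          list_for_each_entry(tk, to_kill, nd)
                  if (tk->tsk == tsk)
                          return true;
          return false;
  }

  static void collect_procs_pfn(struct address_space *mapping,
                                unsigned long pfn, struct list_head *to_kill)
  {
          struct vm_area_struct *vma;
          struct task_struct *tsk;
          struct to_kill *spare;
          bool restart;

  retry:
          restart = false;
          /* Sleeping allocation, done before taking the locks below. */
          spare = kmalloc(sizeof(*spare), GFP_KERNEL);

          i_mmap_lock_read(mapping);
          rcu_read_lock();
          for_each_process(tsk) {
                  struct task_struct *t = task_early_kill(tsk, true);

                  if (!t || task_on_kill_list(t, to_kill))
                          continue;
                  vma_interval_tree_foreach(vma, &mapping->i_mmap, pfn, pfn) {
                          if (vma->vm_mm != t->mm)
                                  continue;
                          /* Consumes @spare if the GFP_ATOMIC attempt fails. */
                          if (!add_to_kill_pfn(t, vma, to_kill, pfn, &spare)) {
                                  restart = true;
                                  goto out_unlock;
                          }
                  }
          }
  out_unlock:
          rcu_read_unlock();
          i_mmap_unlock_read(mapping);
          kfree(spare);           /* NULL or an unused spare */
          if (restart)
                  goto retry;
  }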
> + i_mmap_unlock_read(mapping);
> +}
> +
> +/**
> + * memory_failure_pfn - Handle memory failure on a page not backed by
> + * struct page.
> + * @pfn: Page Number of the corrupted page
> + * @flags: fine tune action taken
> + *
> + * Return:
> + * 0 - success,
> + * -EBUSY - Page PFN does not belong to any address space mapping.
> + */
> +static int memory_failure_pfn(unsigned long pfn, int flags)
> +{
> + struct interval_tree_node *node;
> + LIST_HEAD(tokill);
> +
> + scoped_guard(mutex, &pfn_space_lock) {
> + bool mf_handled = false;
> +
> + /*
> + * Modules registers with MM the address space mapping to the device memory they
> + * manage. Iterate to identify exactly which address space has mapped to this
> + * failing PFN.
We're quite lenient about >80 columns nowadays, but overflowing 80 for
a block comment is rather needless.
> + for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
> + node = interval_tree_iter_next(node, pfn, pfn)) {
> + struct pfn_address_space *pfn_space =
> + container_of(node, struct pfn_address_space, node);
>
> + collect_procs_pfn(pfn_space->mapping, pfn, &tokill);
> +
> + mf_handled = true;
> + }
> +
> + if (!mf_handled)
> + return action_result(pfn, MF_MSG_PFN_MAP, MF_IGNORED);
> + }
> +
> + /*
> + * Unlike System-RAM there is no possibility to swap in a different
> + * physical page at a given virtual address, so all userspace
> + * consumption of direct PFN memory necessitates SIGBUS (i.e.
> + * MF_MUST_KILL)
> + */
> + flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> +
> + kill_procs(&tokill, true, pfn, flags);
> +
> + return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
> +}
> +
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-10-28 0:26 ` Andrew Morton
@ 2025-10-29 3:15 ` Ankit Agrawal
2025-10-31 8:27 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Ankit Agrawal @ 2025-10-29 3:15 UTC (permalink / raw)
To: Andrew Morton
Cc: Aniket Agashe, Vikram Sethi, Jason Gunthorpe, Matt Ochs,
Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
tony.luck, bp, rafael, guohanjun, mchehab, lenb, kevin.tian,
alex, Neo Jia, Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
Thanks Andrew for the comments.
>> +int register_pfn_address_space(struct pfn_address_space *pfn_space)
>> +{
>> + if (!pfn_space)
>> + return -EINVAL;
>
> I suggest this be removed - make register_pfn_address_space(NULL)
> illegal and let the punishment be an oops.
Yes, will remove it.
>> +static void add_to_kill_pfn(struct task_struct *tsk,
>> + struct vm_area_struct *vma,
>> + struct list_head *to_kill,
>> + unsigned long pfn)
>> +{
>> + struct to_kill *tk;
>> +
>> + tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
>> + if (!tk)
>> + return;
>
> This is unfortunate. GFP_ATOMIC is unreliable and we silently behave
> as if it worked OK.
Got it. I'll mark this as a failure case.
> We could play games here to make the GFP_ATOMIC allocation unnecessary,
> but nasty. Allocate the to_kill* outside the rcu_read_lock, pass that
> pointer into add_to_kill_pfn(). If add_to_kill_pfn()'s
> kmalloc(GFP_ATOMIC) failed, add_to_kill_pfn() can then consume the
> caller's to_kill*. Then the caller can drop the lock, allocate a new
> to_kill* then restart the scan. And teach add_to_kill_pfn() to not
> re-add tasks which are already on the list. Ugh.
>
> At the very very least we should tell the user that the kernel goofed
> and that one of their processes won't be getting killed.
Thanks for the suggestion. As mentioned above, I'll mark the kmalloc
allocation error as a failure and add a log message there.
>> + scoped_guard(mutex, &pfn_space_lock) {
>> + bool mf_handled = false;
>> +
>> + /*
>> + * Modules registers with MM the address space mapping to the device memory they
>> + * manage. Iterate to identify exactly which address space has mapped to this
>> + * failing PFN.
>
> We're quite lenient about >80 columns nowadays, but overflowing 80 for
> a block comment is rather needless.
Yes. Since it passed through the strict checkpatch.pl check, I didn't notice.
I'll fix it.
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-10-29 3:15 ` Ankit Agrawal
@ 2025-10-31 8:27 ` Michal Hocko
2025-11-02 11:55 ` Ankit Agrawal
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2025-10-31 8:27 UTC (permalink / raw)
To: Ankit Agrawal
Cc: Andrew Morton, Aniket Agashe, Vikram Sethi, Jason Gunthorpe,
Matt Ochs, Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
On Wed 29-10-25 03:15:08, Ankit Agrawal wrote:
> >> +static void add_to_kill_pfn(struct task_struct *tsk,
> >> + struct vm_area_struct *vma,
> >> + struct list_head *to_kill,
> >> + unsigned long pfn)
> >> +{
> >> + struct to_kill *tk;
> >> +
> >> + tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
> >> + if (!tk)
> >> + return;
> >
> > This is unfortunate. GFP_ATOMIC is unreliable and we silently behave
> > as if it worked OK.
>
> Got it. I'll mark this as a failure case.
why do you need to batch all processes and kill them at once? Can you
just kill one by one?
--
Michal Hocko
SUSE Labs
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-10-31 8:27 ` Michal Hocko
@ 2025-11-02 11:55 ` Ankit Agrawal
2025-11-03 18:22 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Ankit Agrawal @ 2025-11-02 11:55 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Aniket Agashe, Vikram Sethi, Jason Gunthorpe,
Matt Ochs, Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
>> >> +static void add_to_kill_pfn(struct task_struct *tsk,
>> >> + struct vm_area_struct *vma,
>> >> + struct list_head *to_kill,
>> >> + unsigned long pfn)
>> >> +{
>> >> + struct to_kill *tk;
>> >> +
>> >> + tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
>> >> + if (!tk)
>> >> + return;
>> >
>> > This is unfortunate. GFP_ATOMIC is unreliable and we silently behave
>> > as if it worked OK.
>>
>> Got it. I'll mark this as a failure case.
>
> why do you need to batch all processes and kill them at once? Can you
> just kill one by one?
Hi Michal, I am trying to replicate what is being done today for non-PFNMAP
memory failure in __add_to_kill
(https://github.com/torvalds/linux/blob/master/mm/memory-failure.c#L376).
For this series, I am inclined to keep it uniform.
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-11-02 11:55 ` Ankit Agrawal
@ 2025-11-03 18:22 ` Michal Hocko
2025-11-04 2:52 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2025-11-03 18:22 UTC (permalink / raw)
To: Ankit Agrawal
Cc: Andrew Morton, Aniket Agashe, Vikram Sethi, Jason Gunthorpe,
Matt Ochs, Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
On Sun 02-11-25 11:55:56, Ankit Agrawal wrote:
> >> >> +static void add_to_kill_pfn(struct task_struct *tsk,
> >> >> + struct vm_area_struct *vma,
> >> >> + struct list_head *to_kill,
> >> >> + unsigned long pfn)
> >> >> +{
> >> >> + struct to_kill *tk;
> >> >> +
> >> >> + tk = kmalloc(sizeof(*tk), GFP_ATOMIC);
> >> >> + if (!tk)
> >> >> + return;
> >> >
> >> > This is unfortunate. GFP_ATOMIC is unreliable and we silently behave
> >> > as if it worked OK.
> >>
> >> Got it. I'll mark this as a failure case.
> >
> > why do you need to batch all processes and kill them at once? Can you
> > just kill one by one?
>
> Hi Michal, I am trying to replicate what is being done today for non-PFNMAP
> memory failure in __add_to_kill
> (https://github.com/torvalds/linux/blob/master/mm/memory-failure.c#L376).
> For this series, I am inclined to keep it uniform.
Unless there is a very good reason for this code, I would rather not
rely on an atomic allocation. This just makes the behavior hard to
predict.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-11-03 18:22 ` Michal Hocko
@ 2025-11-04 2:52 ` Andrew Morton
2025-11-04 10:37 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2025-11-04 2:52 UTC (permalink / raw)
To: Michal Hocko
Cc: Ankit Agrawal, Aniket Agashe, Vikram Sethi, Jason Gunthorpe,
Matt Ochs, Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
On Mon, 3 Nov 2025 19:22:09 +0100 Michal Hocko <mhocko@suse.com> wrote:
> > Hi Michal, I am trying to replicate what is being done today for non-PFNMAP
> > memory failure in __add_to_kill
> > (https://github.com/torvalds/linux/blob/master/mm/memory-failure.c#L376).
> > For this series, I am inclined to keep it uniform.
>
> Unless there is a very good reason for this code then I would rather not
> rely on an atomic allocation. This just makes the behavior hard to
> predict
I don't think this was addressed in the v5 series.
Yes please, anything we can do to avoid GFP_ATOMIC makes the kernel
more reliable.
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-11-04 2:52 ` Andrew Morton
@ 2025-11-04 10:37 ` Michal Hocko
2025-11-04 17:21 ` Ankit Agrawal
0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2025-11-04 10:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Ankit Agrawal, Aniket Agashe, Vikram Sethi, Jason Gunthorpe,
Matt Ochs, Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
On Mon 03-11-25 18:52:26, Andrew Morton wrote:
> On Mon, 3 Nov 2025 19:22:09 +0100 Michal Hocko <mhocko@suse.com> wrote:
>
> > > Hi Michal, I am trying to replicate what is being done today for non-PFNMAP
> > > memory failure in __add_to_kill
> > > (https://github.com/torvalds/linux/blob/master/mm/memory-failure.c#L376).
> > > For this series, I am inclined to keep it uniform.
> >
> > Unless there is a very good reason for this code then I would rather not
> > rely on an atomic allocation. This just makes the behavior hard to
> > predict
>
> I don't think this was addressed in the v5 series.
>
> Yes please, anything we can do to avoid GFP_ATOMIC makes the kernel
> more reliable.
This could be done on top of the series because as such this is not a
blocker, but it would be really great if we could stop copying bad code
and rather get rid of it in the other poisoning code as well.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v4 2/3] mm: handle poisoning of pfn without struct pages
2025-11-04 10:37 ` Michal Hocko
@ 2025-11-04 17:21 ` Ankit Agrawal
0 siblings, 0 replies; 12+ messages in thread
From: Ankit Agrawal @ 2025-11-04 17:21 UTC (permalink / raw)
To: Michal Hocko, Andrew Morton
Cc: Aniket Agashe, Vikram Sethi, Jason Gunthorpe, Matt Ochs,
Shameer Kolothum, linmiaohe, nao.horiguchi, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, tony.luck,
bp, rafael, guohanjun, mchehab, lenb, kevin.tian, alex, Neo Jia,
Kirti Wankhede, Tarun Gupta (SW-GPU),
Zhi Wang, Dheeraj Nigam, Krishnakant Jaju, linux-kernel,
linux-mm, linux-edac, Jonathan.Cameron, ira.weiny,
Smita.KoralahalliChannabasappa, u.kleine-koenig, peterz,
linux-acpi, kvm
>> > > Hi Michal, I am trying to replicate what is being done today for non-PFNMAP
>> > > memory failure in __add_to_kill
>> > > (https://github.com/torvalds/linux/blob/master/mm/memory-failure.c#L376).
>> > > For this series, I am inclined to keep it uniform.
>> >
>> > Unless there is a very good reason for this code then I would rather not
>> > rely on an atomic allocation. This just makes the behavior hard to
>> > predict
>>
>> I don't think this was addressed in the v5 series.
>>
>> Yes please, anything we can do to avoid GFP_ATOMIC makes the kernel
>> more reliable.
>
> This could be done on top of the series because as such this is not a
> blocker but it would be really great if we can stop copying a bad code
> and rather get rid of it also in other poisoning code.
Ok sure, I'll create a separate patch to cover that and do the one-by-one
process kill. I think the separation might also have the advantage of
isolating regressions, if any, during verification.