linux-mm.kvack.org archive mirror
From: <ankita@nvidia.com>
To: <ankita@nvidia.com>, <vsethi@nvidia.com>, <jgg@nvidia.com>,
	<mochs@nvidia.com>, <jgg@ziepe.ca>, <skolothumtho@nvidia.com>,
	<alex@shazbot.org>, <akpm@linux-foundation.org>,
	<linmiaohe@huawei.com>, <nao.horiguchi@gmail.com>
Cc: <cjia@nvidia.com>, <zhiw@nvidia.com>, <kjaju@nvidia.com>,
	<yishaih@nvidia.com>, <kevin.tian@intel.com>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>
Subject: [PATCH v2 0/3] mm: fixup pfnmap memory failure handling
Date: Sat, 13 Dec 2025 04:47:05 +0000
Message-ID: <20251213044708.3610-1-ankita@nvidia.com>

From: Ankit Agrawal <ankita@nvidia.com>

It was noticed during the 6.19 merge window that the patch series [1],
which introduces memory failure handling for PFNMAP memory, is broken.

The expected behaviour of the series is to allow a driver (such as
nvgrace-gpu) to register its device memory with the mm. The mm would
then handle the poison on that registered memory region.

However, the following issues were identified in the patch series:
1. The PFN, rather than the page offset within the mapping file, was
used to derive the user-mode process VA that maps the PFN.
2. The nvgrace-gpu code performed the registration at mmap time,
exposing it to corruption when mmap is called multiple times on the
same BAR. This issue was also noticed by Linus Torvalds, who reverted
the patch [2].
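
To illustrate issue 1, here is a minimal userspace sketch of the intended
translation. It is not the kernel code from this series: the poisoned PFN
is first converted to a page offset within the registered region, and the
user VA is then derived from that offset and the offset at which the VMA
starts. All demo_* names, the struct layouts, and the 4 KiB page size are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a registered device memory region. */
struct demo_region {
	uint64_t start_pfn;   /* first PFN of the device memory */
	uint64_t nr_pages;    /* size of the region in pages */
	uint64_t pgoff_start; /* file page offset that maps start_pfn */
};

/* Hypothetical stand-in for the few VMA fields used here. */
struct demo_vma {
	uint64_t vm_start; /* first user VA of the mapping */
	uint64_t vm_end;   /* one past the last user VA */
	uint64_t vm_pgoff; /* file page offset mapped at vm_start */
};

/* Translate a PFN to a file page offset; -1 if the PFN is outside the region. */
static int demo_pfn_to_pgoff(const struct demo_region *r, uint64_t pfn,
			     uint64_t *pgoff)
{
	if (pfn < r->start_pfn || pfn - r->start_pfn >= r->nr_pages)
		return -1;
	*pgoff = r->pgoff_start + (pfn - r->start_pfn);
	return 0;
}

/* Derive the user VA for a page offset; -1 if the VMA does not cover it. */
static int demo_pgoff_to_va(const struct demo_vma *v, uint64_t pgoff,
			    uint64_t *va)
{
	uint64_t nr_pages;

	assert(v->vm_end > v->vm_start);
	nr_pages = (v->vm_end - v->vm_start) >> DEMO_PAGE_SHIFT;
	if (pgoff < v->vm_pgoff || pgoff - v->vm_pgoff >= nr_pages)
		return -1;
	*va = v->vm_start + ((pgoff - v->vm_pgoff) << DEMO_PAGE_SHIFT);
	return 0;
}
```

The point of the two-step form is that the VA computation only ever sees
page offsets, so a raw PFN can never be compared against vm_pgoff by
mistake.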

This patch series addresses those issues.

Patch 1/3 fixes the first issue by translating PFN to page offset
and using that information to send the SIGBUS to the mapping process.
Patch 2/3 adds stubs for builds with CONFIG_MEMORY_FAILURE disabled.
Patch 3/3 is a resend of the reverted change to register the device
memory at the time of open instead of mmap.
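
For readers unfamiliar with the stub idiom that 2/3 relies on, the sketch
below shows the general shape in plain C. It is only an illustration under
stated assumptions: DEMO_MEMORY_FAILURE stands in for the Kconfig option,
the demo_* names are hypothetical, and both the -EOPNOTSUPP return value
and the caller treating it as non-fatal are choices made for this example,
not taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical descriptor for a registered PFN range. */
struct demo_pfn_space {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

#ifdef DEMO_MEMORY_FAILURE
/* Feature enabled: real implementations live in another file. */
int demo_register_pfn_space(struct demo_pfn_space *s);
void demo_unregister_pfn_space(struct demo_pfn_space *s);
#else
/* Feature disabled: inline stubs keep callers free of #ifdefs. */
static inline int demo_register_pfn_space(struct demo_pfn_space *s)
{
	(void)s;
	return -EOPNOTSUPP; /* assumption: report "not supported" */
}

static inline void demo_unregister_pfn_space(struct demo_pfn_space *s)
{
	(void)s;
}
#endif

/* Hypothetical caller: a missing feature does not fail the open path. */
static int demo_open_device(struct demo_pfn_space *s)
{
	int ret;

	assert(s != NULL);
	ret = demo_register_pfn_space(s);
	if (ret && ret != -EOPNOTSUPP)
		return ret;
	return 0;
}
```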

Many thanks to Jason Gunthorpe (jgg@nvidia.com) and Alex Williamson
(alex@shazbot.org) for identifying the issue and suggesting the fix.
Thanks to Andrew Morton (akpm@linux-foundation.org) for picking up
1/3 for mm-unstable. Please consider the entire series for 6.19, as
3/3 is a resend-with-fix of the only user, which was reverted from the
original series [2].

Link: https://lore.kernel.org/all/20251102184434.2406-1-ankita@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20251102184434.2406-4-ankita@nvidia.com/ [2]

Changelog:
v2:
* 1/3 added to the mm-unstable branch (Thanks Andrew Morton!)
* Fixed return types in 3/3 based on Alex Williamson's suggestions.
* s/u64/pgoff_t/ for offsets in 3/3 (Thanks Alex Williamson)
* Removed inline in pfn_memregion_offset in 3/3 (Thanks Alex Williamson)
* No change in 1/3, 2/3.

Link:
https://lore.kernel.org/all/20251211070603.338701-1-ankita@nvidia.com/ [v1]

Ankit Agrawal (3):
  mm: fixup pfnmap memory failure handling to use pgoff
  mm: add stubs for PFNMAP memory failure registration functions
  vfio/nvgrace-gpu: register device memory for poison handling

 drivers/vfio/pci/nvgrace-gpu/main.c | 116 +++++++++++++++++++++++++++-
 include/linux/memory-failure.h      |  15 +++-
 mm/memory-failure.c                 |  29 ++++---
 3 files changed, 143 insertions(+), 17 deletions(-)

-- 
2.34.1




Thread overview: 8+ messages
2025-12-13  4:47 ankita [this message]
2025-12-13  4:47 ` [PATCH v2 1/3] mm: fixup pfnmap memory failure handling to use pgoff ankita
2025-12-17  3:10   ` Miaohe Lin
2025-12-17 18:10     ` Ankit Agrawal
2025-12-18  2:18       ` Miaohe Lin
2025-12-13  4:47 ` [PATCH v2 2/3] mm: add stubs for PFNMAP memory failure registration functions ankita
2025-12-13  4:47 ` [PATCH v2 3/3] vfio/nvgrace-gpu: register device memory for poison handling ankita
2025-12-13  8:00   ` Alex Williamson
