linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] mm: Fix MF_DELAYED handling on memory failure
@ 2025-10-15 18:35 Lisa Wang
  2025-10-15 18:35 ` [RFC PATCH 1/3] mm: memory_failure: Fix MF_DELAYED handling on truncation during failure Lisa Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Lisa Wang @ 2025-10-15 18:35 UTC (permalink / raw)
  To: linmiaohe, nao.horiguchi, akpm, pbonzini, shuah, linux-mm,
	linux-kernel, kvm, linux-kselftest
  Cc: david, rientjes, seanjc, ackerleytng, vannapurve, michael.roth,
	jiaqiyan, tabba, dave.hansen, Lisa Wang

Hello,

This patch series addresses an issue in the memory failure handling path
where MF_DELAYED is incorrectly treated as an error. This issue was
revealed because guest_memfd’s .error_remove_folio() callback returns
MF_DELAYED.

Currently, when the .error_remove_folio() callback for guest_memfd returns
MF_DELAYED, there are a few issues.

1. truncate_error_folio() maps this to MF_FAILED. This causes
   memory_failure() to return -EBUSY, which unconditionally triggers a
   SIGBUS. The process’ configured memory corruption kill policy is ignored
   - even if PR_MCE_KILL_LATE is set, the process will still get a SIGBUS
   on deferred memory failures.

2. “Failed to punch page” is printed, even though MF_DELAYED indicates that
   it was intentionally not punched.

The first patch corrects this by updating truncate_error_folio() to
propagate MF_DELAYED to its caller. This allows memory_failure() to return
0, indicating success, and lets the delayed handling proceed as designed.
This patch also updates me_pagecache_clean() to account for the folio's
refcount, which remains elevated during delayed handling, aligning its
logic with me_swapcache_dirty().

The subsequent two patches add KVM selftests to validate the fix and the
expected behavior of guest_memfd memory failure:

The first test patch verifies that memory_failure() now returns 0 in the
delayed case and confirms that SIGBUS signaling logic remains correct for
other scenarios (e.g., madvise injection or PR_MCE_KILL_EARLY).

The second test patch confirms that after a memory failure, the poisoned
page is correctly unmapped from the KVM guest's stage 2 page tables and
that a subsequent access by the guest correctly notifies the userspace VMM
with EHWPOISON.

This patch series is built upon kvm/next. In addition, to align with the
change of INIT_SHARED and to use the macro wrapper in guest_memfd
selftests, we put these patches behind Sean’s patches [1].

For ease of testing, this series is also available, stitched together, at
https://github.com/googleprodkernel/linux-cc/tree/memory-failure-mf-delayed-fix-rfc-v1 

[1]: https://lore.kernel.org/all/20251003232606.4070510-1-seanjc@google.com/T/

Thank you,


Lisa Wang (3):
  mm: memory_failure: Fix MF_DELAYED handling on truncation during
    failure
  KVM: selftests: Add memory failure tests in guest_memfd_test
  KVM: selftests: Test guest_memfd behavior with respect to stage 2 page
    tables

 mm/memory-failure.c                           |  24 +-
 .../testing/selftests/kvm/guest_memfd_test.c  | 233 ++++++++++++++++++
 2 files changed, 248 insertions(+), 9 deletions(-)

-- 
2.51.0.788.g6d19910ace-goog



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-10-20 12:37 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-15 18:35 [RFC PATCH 0/3] mm: Fix MF_DELAYED handling on memory failure Lisa Wang
2025-10-15 18:35 ` [RFC PATCH 1/3] mm: memory_failure: Fix MF_DELAYED handling on truncation during failure Lisa Wang
2025-10-15 18:58   ` [RFC PATCH RESEND " Lisa Wang
2025-10-16 20:18   ` David Hildenbrand
2025-10-17 17:30     ` Lisa Wang
2025-10-20 12:37   ` David Hildenbrand
2025-10-15 18:35 ` [RFC PATCH 2/3] KVM: selftests: Add memory failure tests in guest_memfd_test Lisa Wang
2025-10-15 18:58   ` [RFC PATCH RESEND " Lisa Wang
2025-10-15 18:35 ` [RFC PATCH 3/3] KVM: selftests: Test guest_memfd behavior with respect to stage 2 page tables Lisa Wang
2025-10-15 18:58   ` [RFC PATCH RESEND " Lisa Wang
2025-10-15 18:58 ` [RFC PATCH RESEND 0/3] mm: Fix MF_DELAYED handling on memory failure Lisa Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox