From: Lisa Wang <wyihan@google.com>
To: linmiaohe@huawei.com, nao.horiguchi@gmail.com,
akpm@linux-foundation.org, pbonzini@redhat.com,
shuah@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org
Cc: david@redhat.com, rientjes@google.com, seanjc@google.com,
ackerleytng@google.com, vannapurve@google.com,
michael.roth@amd.com, jiaqiyan@google.com, tabba@google.com,
dave.hansen@linux.intel.com, Lisa Wang <wyihan@google.com>
Subject: [RFC PATCH RESEND 0/3] mm: Fix MF_DELAYED handling on memory failure
Date: Wed, 15 Oct 2025 18:58:54 +0000
Message-ID: <cover.1760551864.git.wyihan@google.com>

[resend to correct the mailing list address]

Hello,

This patch series addresses an issue in the memory failure handling path
where MF_DELAYED is incorrectly treated as an error. This issue was
revealed because guest_memfd’s .error_remove_folio() callback returns
MF_DELAYED.

Currently, when the .error_remove_folio() callback for guest_memfd returns
MF_DELAYED, two things go wrong:

1. truncate_error_folio() maps this to MF_FAILED. This causes
   memory_failure() to return -EBUSY, which unconditionally triggers a
   SIGBUS. The process's configured memory corruption kill policy is
   ignored: even if PR_MCE_KILL_LATE is set, the process still gets a
   SIGBUS on deferred memory failures.

2. “Failed to punch page” is printed, even though MF_DELAYED indicates
   that the page was intentionally not punched.
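
For reference, the kill policy mentioned in item 1 is set per thread from
userspace with prctl(2). A minimal sketch, assuming a Linux system whose
<sys/prctl.h> exposes the PR_MCE_KILL constants; the helper names
set_mce_kill_late() and get_mce_kill_policy() are hypothetical and are not
taken from this series or its selftests:

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: ask the kernel to defer the SIGBUS until the
 * corrupted page is actually accessed (late kill) for the calling thread.
 * Returns 0 on success, -1 on error with errno set.
 */
int set_mce_kill_late(void)
{
	/* Unused trailing arguments must be zero for PR_MCE_KILL. */
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_LATE, 0, 0);
}

/*
 * Hypothetical helper: read back the calling thread's current policy.
 * Returns PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT,
 * or -1 on error with errno set.
 */
int get_mce_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

With this policy in place, a thread would expect not to be signalled at
the time the failure is recorded, which is the expectation that item 1
describes as being violated.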

The first patch corrects this by updating truncate_error_folio() to
propagate MF_DELAYED to its caller. This allows memory_failure() to return
0, indicating success, and lets the delayed handling proceed as designed.
The same patch also updates me_pagecache_clean() to account for the folio's
refcount, which remains elevated during delayed handling, aligning its
logic with me_swapcache_dirty().
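
The effect of propagating MF_DELAYED can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not the diff
from patch 1: the enum is redefined locally to mirror the kernel's
enum mf_result, the model_*() functions are hypothetical names, and the
buffer-release and refcount steps of the real code are left out.

```c
#include <errno.h>

/* Local copy mirroring the kernel's enum mf_result, for userspace builds. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/*
 * Hypothetical model of the old behaviour: any nonzero return from
 * .error_remove_folio() is treated as a failure, including MF_DELAYED.
 */
enum mf_result model_truncate_before(int cb_ret)
{
	return cb_ret == 0 ? MF_RECOVERED : MF_FAILED;
}

/*
 * Hypothetical model of the fixed behaviour: MF_DELAYED is passed
 * through to the caller instead of being folded into MF_FAILED.
 */
enum mf_result model_truncate_after(int cb_ret)
{
	if (cb_ret == MF_DELAYED)
		return MF_DELAYED;
	return cb_ret == 0 ? MF_RECOVERED : MF_FAILED;
}

/*
 * Model of how a result becomes the memory_failure() return value:
 * recovered and delayed both count as success, anything else is -EBUSY.
 */
int model_mf_return(enum mf_result res)
{
	return (res == MF_RECOVERED || res == MF_DELAYED) ? 0 : -EBUSY;
}
```

In this model, feeding MF_DELAYED through model_truncate_before() ends in
-EBUSY, while model_truncate_after() ends in 0, matching the before and
after behaviour described above.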

The subsequent two patches add KVM selftests to validate the fix and the
expected behavior of guest_memfd memory failure:

The first test patch verifies that memory_failure() now returns 0 in the
delayed case and confirms that SIGBUS signaling logic remains correct for
other scenarios (e.g., madvise injection or PR_MCE_KILL_EARLY).

The second test patch confirms that after a memory failure, the poisoned
page is correctly unmapped from the KVM guest's stage 2 page tables and
that a subsequent access by the guest correctly notifies the userspace VMM
with EHWPOISON.

This patch series is based on kvm/next. To align with the INIT_SHARED
change and to use the macro wrapper in the guest_memfd selftests, it is
applied on top of Sean’s patches [1].

For ease of testing, this series is also available, stitched together, at
https://github.com/googleprodkernel/linux-cc/tree/memory-failure-mf-delayed-fix-rfc-v1

[1]: https://lore.kernel.org/all/20251003232606.4070510-1-seanjc@google.com/T/

Thank you,

Lisa Wang (3):
  mm: memory_failure: Fix MF_DELAYED handling on truncation during
    failure
  KVM: selftests: Add memory failure tests in guest_memfd_test
  KVM: selftests: Test guest_memfd behavior with respect to stage 2 page
    tables

 mm/memory-failure.c                          |  24 +-
 .../testing/selftests/kvm/guest_memfd_test.c | 233 ++++++++++++++++++
 2 files changed, 248 insertions(+), 9 deletions(-)

--
2.51.0.788.g6d19910ace-goog