From: David Hildenbrand <david@redhat.com>
To: Lisa Wang <wyihan@google.com>,
linmiaohe@huawei.com, nao.horiguchi@gmail.com,
akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: rientjes@google.com, seanjc@google.com, ackerleytng@google.com,
vannapurve@google.com, michael.roth@amd.com, jiaqiyan@google.com,
tabba@google.com, dave.hansen@linux.intel.com
Subject: Re: [RFC PATCH RESEND 1/3] mm: memory_failure: Fix MF_DELAYED handling on truncation during failure
Date: Thu, 16 Oct 2025 22:18:17 +0200
Message-ID: <91dbea57-d5b0-49b7-8920-3a2d252c46b0@redhat.com>
In-Reply-To: <57ed0bcbcfcec6fda89d60727467d7bd621c95ab.1760551864.git.wyihan@google.com>

On 15.10.25 20:58, Lisa Wang wrote:
> The .error_remove_folio a_ops is used by different filesystems to handle
> folio truncation upon discovery of a memory failure in the memory
> associated with the given folio.
>
> Currently, MF_DELAYED is treated as an error, causing "Failed to punch
> page" to be written to the console. MF_DELAYED is then relayed to the
> caller of truncate_error_folio() as MF_FAILED. This further causes
> memory_failure() to return -EBUSY, which then always causes a SIGBUS.
>
> This also implies that regardless of whether the thread's memory
> corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
> memory failure within guest_memfd memory will always cause a SIGBUS.
>
> Update truncate_error_folio() to return MF_DELAYED to the caller if the
> .error_remove_folio() callback reports MF_DELAYED.
>
> Generalize the comment: MF_DELAYED means memory failure was handled and
> some other part of memory failure will be handled later (e.g. a next
> access will result in the process being killed). Specifically for
> guest_memfd, a next access by the guest will result in an error returned
> to the userspace VMM.
>
> With delayed handling, the filemap continues to hold refcounts on the
> folio. Hence, take that into account when checking for extra refcounts
> in me_pagecache_clean(). This is aligned with the implementation in
> me_swapcache_dirty(), where, if a folio is still in the swap cache,
> extra_pins is set to true.
>
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
> mm/memory-failure.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index df6ee59527dd..77f665c16a73 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -922,9 +922,11 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
> * by the m-f() handler immediately.
> *
> * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
> - * The page is unmapped, and is removed from the LRU or file mapping.
> - * An attempt to access the page again will trigger page fault and the
> - * PF handler will kill the process.
> + * It means memory_failure was handled (e.g. removed from file mapping or the
> + * LRU) and some other part of memory failure will be handled later (e.g. a
> + * next access will result in the process being killed). Specifically for
> + * guest_memfd, a next access by the guest will result in an error returned to
> + * the userspace VMM.
> *
> * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
> * The page has been completely isolated, that is, unmapped, taken out of
> @@ -999,6 +1001,9 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
> if (mapping->a_ops->error_remove_folio) {
> int err = mapping->a_ops->error_remove_folio(mapping, folio);
>
> + if (err == MF_DELAYED)
> + return err;
> +
> if (err != 0)
> pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
> else if (!filemap_release_folio(folio, GFP_NOIO))
> @@ -1108,18 +1113,19 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
> goto out;
> }
>
> - /*
> - * The shmem page is kept in page cache instead of truncating
> - * so is expected to have an extra refcount after error-handling.
> - */
> - extra_pins = shmem_mapping(mapping);
> -
> /*
> * Truncation is a bit tricky. Enable it per file system for now.
> *
> * Open: to take i_rwsem or not for this? Right now we don't.
> */
> ret = truncate_error_folio(folio, page_to_pfn(p), mapping);
> +
> + /*
> + * The shmem page, or any page with MF_DELAYED error handling, is kept in
> + * page cache instead of truncating, so is expected to have an extra
> + * refcount after error-handling.
> + */
> + extra_pins = shmem_mapping(mapping) || ret == MF_DELAYED;
Well, to do it cleanly, shouldn't we let shmem_error_remove_folio() also
return MF_DELAYED and remove this shmem special case?

Or is there a good reason shmem_error_remove_folio() wants to return 0 -- and
maybe guest_memfd would also want to do that?

Just reading the code here, the reason for the inconsistency is unclear.
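
To make the two options concrete, here is a rough userspace model (not
kernel code, and not taken from the patch). Every model_* name is
hypothetical, the enum only mirrors the kernel's result names, and the
filemap_release_folio() step of truncate_error_folio() is left out. It
compares the extra_pins computation as posted with the variant where
shmem's callback would itself report MF_DELAYED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in that mirrors the kernel's mf_result names. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical stand-in for the a_ops->error_remove_folio callback. */
typedef int (*model_remove_fn)(void);

/* Callback that reports 0, as shmem does today. */
static int model_remove_returns_zero(void) { return 0; }

/* Callback that keeps the folio in the page cache and says so. */
static int model_remove_returns_delayed(void) { return MF_DELAYED; }

/* Simplified truncate_error_folio() with the posted change applied. */
static int model_truncate_error_folio(model_remove_fn remove)
{
	int err;

	assert(remove != NULL);
	err = remove();
	if (err == MF_DELAYED)
		return MF_DELAYED;
	return err ? MF_FAILED : MF_RECOVERED;
}

struct model_result {
	int ret;
	bool extra_pins;
};

/* As posted: shmem stays special-cased next to the MF_DELAYED check. */
static struct model_result model_clean_as_posted(model_remove_fn remove,
						 bool is_shmem)
{
	struct model_result r;

	r.ret = model_truncate_error_folio(remove);
	r.extra_pins = is_shmem || r.ret == MF_DELAYED;
	return r;
}

/* Alternative: the callback's MF_DELAYED is the only signal. */
static struct model_result model_clean_suggested(model_remove_fn remove)
{
	struct model_result r;

	r.ret = model_truncate_error_folio(remove);
	r.extra_pins = (r.ret == MF_DELAYED);
	return r;
}
```

In this model both variants expect an extra refcount for a folio that
stays in the page cache. The visible difference is the result code: as
posted, a shmem folio yields MF_RECOVERED, while with the alternative it
would yield MF_DELAYED. Whether that change is acceptable for shmem is
exactly the open question.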
--
Cheers
David / dhildenb