From: David Hildenbrand <david@redhat.com>
To: Lisa Wang <wyihan@google.com>,
	linmiaohe@huawei.com, nao.horiguchi@gmail.com,
	akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: rientjes@google.com, seanjc@google.com, ackerleytng@google.com,
	vannapurve@google.com, michael.roth@amd.com, jiaqiyan@google.com,
	tabba@google.com, dave.hansen@linux.intel.com
Subject: Re: [RFC PATCH RESEND 1/3] mm: memory_failure: Fix MF_DELAYED handling on truncation during failure
Date: Thu, 16 Oct 2025 22:18:17 +0200
Message-ID: <91dbea57-d5b0-49b7-8920-3a2d252c46b0@redhat.com>
In-Reply-To: <57ed0bcbcfcec6fda89d60727467d7bd621c95ab.1760551864.git.wyihan@google.com>

On 15.10.25 20:58, Lisa Wang wrote:
> The .error_remove_folio a_ops is used by different filesystems to handle
> folio truncation upon discovery of a memory failure in the memory
> associated with the given folio.
> 
> Currently, MF_DELAYED is treated as an error, causing "Failed to punch
> page" to be written to the console. MF_DELAYED is then relayed to the
> caller of truncate_error_folio() as MF_FAILED. This further causes
> memory_failure() to return -EBUSY, which then always causes a SIGBUS.
> 
> This also implies that regardless of whether the thread's memory
> corruption kill policy is PR_MCE_KILL_EARLY or PR_MCE_KILL_LATE, a
> memory failure within guest_memfd memory will always cause a SIGBUS.
> 
> Update truncate_error_folio() to return MF_DELAYED to the caller if the
> .error_remove_folio() callback reports MF_DELAYED.
> 
> Generalize the comment: MF_DELAYED means memory failure was handled and
> some other part of memory failure will be handled later (e.g. a next
> access will result in the process being killed). Specifically for
> guest_memfd, a next access by the guest will result in an error returned
> to the userspace VMM.
> 
> With delayed handling, the filemap continues to hold refcounts on the
> folio. Hence, take that into account when checking for extra refcounts
> in me_pagecache_clean(). This is aligned with the implementation in
> me_swapcache_dirty(), where, if a folio is still in the swap cache,
> extra_pins is set to true.
> 
> Signed-off-by: Lisa Wang <wyihan@google.com>
> ---
>   mm/memory-failure.c | 24 +++++++++++++++---------
>   1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index df6ee59527dd..77f665c16a73 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -922,9 +922,11 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
>    * by the m-f() handler immediately.
>    *
>    * MF_DELAYED - The m-f() handler marks the page as PG_hwpoisoned'ed.
> - * The page is unmapped, and is removed from the LRU or file mapping.
> - * An attempt to access the page again will trigger page fault and the
> - * PF handler will kill the process.
> + * It means memory_failure was handled (e.g. removed from file mapping or the
> + * LRU) and some other part of memory failure will be handled later (e.g. a
> + * next access will result in the process being killed). Specifically for
> + * guest_memfd, a next access by the guest will result in an error returned to
> + * the userspace VMM.
>    *
>    * MF_RECOVERED - The m-f() handler marks the page as PG_hwpoisoned'ed.
>    * The page has been completely isolated, that is, unmapped, taken out of
> @@ -999,6 +1001,9 @@ static int truncate_error_folio(struct folio *folio, unsigned long pfn,
>   	if (mapping->a_ops->error_remove_folio) {
>   		int err = mapping->a_ops->error_remove_folio(mapping, folio);
>   
> +		if (err == MF_DELAYED)
> +			return err;
> +
>   		if (err != 0)
>   			pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
>   		else if (!filemap_release_folio(folio, GFP_NOIO))
> @@ -1108,18 +1113,19 @@ static int me_pagecache_clean(struct page_state *ps, struct page *p)
>   		goto out;
>   	}
>   
> -	/*
> -	 * The shmem page is kept in page cache instead of truncating
> -	 * so is expected to have an extra refcount after error-handling.
> -	 */
> -	extra_pins = shmem_mapping(mapping);
> -
>   	/*
>   	 * Truncation is a bit tricky. Enable it per file system for now.
>   	 *
>   	 * Open: to take i_rwsem or not for this? Right now we don't.
>   	 */
>   	ret = truncate_error_folio(folio, page_to_pfn(p), mapping);
> +
> +	/*
> +	 * The shmem page, or any page with MF_DELAYED error handling, is kept in
> +	 * page cache instead of truncating, so is expected to have an extra
> +	 * refcount after error-handling.
> +	 */
> +	extra_pins = shmem_mapping(mapping) || ret == MF_DELAYED;

Well, to do it cleanly shouldn't we let shmem_error_remove_folio() also 
return MF_DELAYED and remove this shmem special case?

Or is there a good reason shmem_error_remove_folio() wants to return 0 --
and maybe guest_memfd would also want to do that?

Just from reading the code here, the reason for the inconsistency is unclear.
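
To make the question concrete, here is a small userspace sketch (not
kernel code) comparing the extra_pins computation from the patch with a
variant that drops the shmem special case. Everything named model_* or
remove_returns_* is hypothetical; the sketch assumes that
filemap_release_folio() succeeds, that the shmem callback returns 0
today, and that the guest_memfd callback returns MF_DELAYED.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names as the kernel's enum mf_result; this is only a model. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical stand-in for the bits of struct address_space used here. */
struct model_mapping {
	bool is_shmem;                   /* models shmem_mapping(mapping) */
	int (*error_remove_folio)(void); /* models a_ops->error_remove_folio */
};

static int remove_returns_zero(void) { return 0; }
static int remove_returns_delayed(void) { return MF_DELAYED; }

/*
 * Models truncate_error_folio() with the patch applied.
 * Assumption: filemap_release_folio() always succeeds.
 */
static int model_truncate(const struct model_mapping *m)
{
	int err = m->error_remove_folio();

	if (err == MF_DELAYED)
		return err;
	return err ? MF_FAILED : MF_RECOVERED;
}

/* extra_pins as computed in the patch. */
static bool extra_pins_patch(const struct model_mapping *m)
{
	return m->is_shmem || model_truncate(m) == MF_DELAYED;
}

/* extra_pins if the shmem special case were removed. */
static bool extra_pins_unified(const struct model_mapping *m)
{
	return model_truncate(m) == MF_DELAYED;
}

static void check_equivalence(void)
{
	struct model_mapping shmem_today   = { true,  remove_returns_zero };
	struct model_mapping shmem_delayed = { true,  remove_returns_delayed };
	struct model_mapping gmem          = { false, remove_returns_delayed };
	struct model_mapping other         = { false, remove_returns_zero };

	/* extra_pins comes out the same under both schemes. */
	assert(extra_pins_patch(&shmem_today));
	assert(extra_pins_unified(&shmem_delayed));
	assert(extra_pins_patch(&gmem) && extra_pins_unified(&gmem));
	assert(!extra_pins_patch(&other) && !extra_pins_unified(&other));

	/* The visible difference: the result reported for shmem changes. */
	assert(model_truncate(&shmem_today) == MF_RECOVERED);
	assert(model_truncate(&shmem_delayed) == MF_DELAYED);
}
```

In this model extra_pins is identical either way; what changes is that
shmem would report MF_DELAYED instead of MF_RECOVERED. Whether that
change is acceptable is the open question above.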

-- 
Cheers

David / dhildenb



