From: Miaohe Lin <linmiaohe@huawei.com>
To: Matthew Wilcox <willy@infradead.org>,
Jane Chu <jane.chu@oracle.com>,
Oscar Salvador <osalvador@suse.de>
Cc: <linux-mm@kvack.org>
Subject: Re: [PATCH v2 07/11] mm/memory-failure: Convert memory_failure() to use a folio
Date: Thu, 11 Apr 2024 17:00:33 +0800 [thread overview]
Message-ID: <b7fbe946-976e-e95c-04d1-09e76eeeaa98@huawei.com> (raw)
In-Reply-To: <ZhdCKXr-e-H2COwk@casper.infradead.org>
On 2024/4/11 9:51, Matthew Wilcox wrote:
> On Wed, Apr 10, 2024 at 06:27:38PM -0700, Jane Chu wrote:
>> On 4/10/2024 4:15 PM, Jane Chu wrote:
>>
>>> On 4/10/2024 3:21 AM, Oscar Salvador wrote:
>>>
>>>> On Mon, Apr 08, 2024 at 05:34:29PM -0700, Jane Chu wrote:
>>>>> At this stage, 'p' is not expected to be a large page since a
>>>>> _refcount has
>>>>> been taken, right? So is it reasonable to replace the "if
>>>>> (retry)" block
>>>>> with a VM_BUG_ON warning? otherwise, if 'p' became part of a different
>>>>> large folio, how to recover from here ?
>>>> We might have split the THP (if it was part of), but AFAICS
>>>> nothing stops the kernel to coallesce this page again into a new THP
>>>> via e.g:
>>>> madvise(MADV_HUGEPAGE) ?
>>>
>>> Good point, thanks!
>>
>> Come to think of it a bit more, maybe the check could be
>>
>> if (folio_test_large(folio) || (folio != page_folio(p))) {
>>
>> ? Because, 'p' could have become part of a THP after shake_folio() and
>> before folio_lock(), right?
This check is originally introduced via commit:
"""
commit f37d4298aa7f8b74395aa13c728677e2ed86fdaf
Author: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Date: Wed Aug 6 16:06:49 2014 -0700
hwpoison: fix race with changing page during offlining
When a hwpoison page is locked it could change state due to parallel
modifications. The original compound page can be torn down and then
this 4k page becomes part of a differently-size compound page is is a
standalone regular page.
Check after the lock if the page is still the same compound page.
We could go back, grab the new head page and try again but it should be
quite rare, so I thought this was safest. A retry loop would be more
difficult to test and may have more side effects.
The hwpoison code by design only tries to handle cases that are
reasonably common in workloads, as visible in page-flags.
I'm not really that concerned about handling this (likely rare case),
just not crashing on it.
Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a013bc94ebbe..44c6bd201d3a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1172,6 +1172,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
lock_page(hpage);
+ /*
+ * The page could have changed compound pages during the locking.
+ * If this happens just bail out.
+ */
+ if (compound_head(p) != hpage) {
+ action_result(pfn, "different compound page after locking", IGNORED);
+ res = -EBUSY;
+ goto out;
+ }
+
/*
* We use page flags to determine what action should be taken, but
* the flags can be modified by the error containment action. One
"""
And the discussion can be found at [1]. The root cause for the issue is indeterminate, but most probably:
[1] https://lore.kernel.org/linux-mm/20140626195036.GA5311@nhori.redhat.com/
"""
And I think that the problem you report is caused by another part of hwpoison
code, because we have PageSlab check at the beginning of hwpoison_user_mappings(),
so if LRU page truned into slab page just before locking the page, we never
reach try_to_unmap().
I think this was caused by the code around lock migration after thp split
in hwpoison_user_mappings(), which was introduced in commit 54b9dd14d09f
("mm/memory-failure.c: shift page lock from head page to tail page after thp split").
I guess the tail page (raw error page) was freed and turned into Slab page
just after thp split and before locking the error page.
So possible solution is to do page status check again after thp split.
"""
IIUC, it's because the page lock is shifted from @hpage to @p. So there's a window
that p is freed and turned into Slab page.
@@ -942,11 +942,14 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
* We pinned the head page for hwpoison handling,
* now we split the thp and we are interested in
* the hwpoisoned raw page, so move the refcount
- * to it.
+ * to it. Similarly, page lock is shifted.
*/
if (hpage != p) {
put_page(hpage);
get_page(p);
+ lock_page(p);
+ unlock_page(hpage);
+ *hpagep = p;
}
/* THP is split, so ppage should be the real poisoned page. */
ppage = p;
>
> What is the mechanism for reassembling a large folio? I don't see that
> code anywhere.
But as code changes, the above page lock shift is gone. And I think below logic can't
trigger now. As we hold extra page refcnt so page can't be coallesced into a new THP or Slab page.
/*
* We're only intended to deal with the non-Compound page here.
* However, the page could have changed compound pages due to
* race window. If this happens, we could try again to hopefully
* handle the page next round.
*/
if (PageCompound(p)) {
if (retry) {
ClearPageHWPoison(p);
unlock_page(p);
put_page(p);
flags &= ~MF_COUNT_INCREASED;
retry = false;
goto try_again;
}
res = action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
goto unlock_page;
}
So it might be better to replace above code block as WARN_ON(PageCompound(p)) and remove MF_MSG_DIFFERENT_COMPOUND case.
Any thoughts?
Thanks.
.
>
> .
>
next prev parent reply other threads:[~2024-04-11 9:00 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-08 19:42 [PATCH v2 00/11] Some cleanups for memory-failure Matthew Wilcox (Oracle)
2024-04-08 19:42 ` [PATCH v2 01/11] mm/memory-failure: Remove fsdax_pgoff argument from __add_to_kill Matthew Wilcox (Oracle)
2024-04-10 9:09 ` Oscar Salvador
2024-04-08 19:42 ` [PATCH v2 02/11] mm/memory-failure: Pass addr to __add_to_kill() Matthew Wilcox (Oracle)
2024-04-08 22:32 ` Jane Chu
2024-04-10 9:11 ` Oscar Salvador
2024-04-08 19:42 ` [PATCH v2 03/11] mm: Return the address from page_mapped_in_vma() Matthew Wilcox (Oracle)
2024-04-08 22:38 ` Jane Chu
2024-04-10 9:38 ` Oscar Salvador
2024-04-11 2:56 ` Miaohe Lin
2024-04-11 17:37 ` Matthew Wilcox
2024-04-08 19:42 ` [PATCH v2 04/11] mm: Make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE Matthew Wilcox (Oracle)
2024-04-08 22:45 ` Jane Chu
2024-04-08 22:52 ` Matthew Wilcox
2024-04-09 6:35 ` Jane Chu
2024-04-10 9:39 ` Oscar Salvador
2024-04-08 19:42 ` [PATCH v2 05/11] mm/memory-failure: Convert shake_page() to shake_folio() Matthew Wilcox (Oracle)
2024-04-08 22:53 ` Jane Chu
2024-04-08 19:42 ` [PATCH v2 06/11] mm: Convert hugetlb_page_mapping_lock_write to folio Matthew Wilcox (Oracle)
2024-04-08 23:09 ` Jane Chu
2024-04-10 9:52 ` Oscar Salvador
2024-04-08 19:42 ` [PATCH v2 07/11] mm/memory-failure: Convert memory_failure() to use a folio Matthew Wilcox (Oracle)
2024-04-09 0:34 ` Jane Chu
2024-04-10 10:21 ` Oscar Salvador
2024-04-10 14:23 ` Matthew Wilcox
2024-04-10 15:30 ` Oscar Salvador
2024-04-10 23:15 ` Jane Chu
2024-04-11 1:27 ` Jane Chu
2024-04-11 1:51 ` Matthew Wilcox
2024-04-11 9:00 ` Miaohe Lin [this message]
2024-04-11 11:23 ` Oscar Salvador
2024-04-11 12:17 ` Matthew Wilcox
2024-04-12 19:48 ` Matthew Wilcox
2024-04-12 22:09 ` Oscar Salvador
2024-04-15 18:47 ` Jane Chu
2024-04-16 9:13 ` Miaohe Lin
2024-04-08 19:42 ` [PATCH v2 08/11] mm/memory-failure: Convert hwpoison_user_mappings to take " Matthew Wilcox (Oracle)
2024-04-09 6:15 ` Jane Chu
2024-04-08 19:42 ` [PATCH v2 09/11] mm/memory-failure: Add some folio conversions to unpoison_memory Matthew Wilcox (Oracle)
2024-04-09 6:17 ` Jane Chu
2024-04-08 19:42 ` [PATCH v2 10/11] mm/memory-failure: Use folio functions throughout collect_procs() Matthew Wilcox (Oracle)
2024-04-09 6:19 ` Jane Chu
2024-04-08 19:42 ` [PATCH v2 11/11] mm/memory-failure: Pass the folio to collect_procs_ksm() Matthew Wilcox (Oracle)
2024-04-09 6:27 ` Jane Chu
2024-04-09 12:11 ` Matthew Wilcox
2024-04-09 15:15 ` Jane Chu
2024-04-09 6:28 ` [PATCH v2 00/11] Some cleanups for memory-failure Jane Chu
2024-04-09 12:12 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b7fbe946-976e-e95c-04d1-09e76eeeaa98@huawei.com \
--to=linmiaohe@huawei.com \
--cc=jane.chu@oracle.com \
--cc=linux-mm@kvack.org \
--cc=osalvador@suse.de \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox