From: Miaohe Lin <linmiaohe@huawei.com>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
"Kefeng Wang" <wangkefeng.wang@huawei.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH -next resend v3] mm: hwposion: support recovery from ksm_might_need_to_copy()
Date: Sat, 17 Dec 2022 10:24:39 +0800
Message-ID: <dbb212bb-fac9-cb38-a32e-b64755a67d29@huawei.com>
In-Reply-To: <20221216014729.GA2116060@hori.linux.bs1.fc.nec.co.jp>
On 2022/12/16 9:47, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
>> When the kernel copies a page in ksm_might_need_to_copy() but runs
>> into an uncorrectable error, it will crash since the poisoned page is
>> consumed by the kernel, this is similar to Copy-on-write poison recovery,
>
> Maybe you mean "this is similar to the issue recently fixed by
> Copy-on-write poison recovery."? And if this sentence ends here,
> please put "." instead of ",".
>
>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>> swapoff, which helps us avoid a system crash. Note that memory failure
>> handling on a KSM page will be skipped, but memory_failure_queue() is
>> still called to be consistent with the general memory failure process.
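The recovery flow described in the quoted commit message can be illustrated with a minimal userspace simulation. This is not kernel code and not the patch itself: struct sim_page, copy_mc_sim(), swapin_copy() and the SIM_FAULT_* codes are hypothetical stand-ins. A poisoned source page is modeled by a flag instead of a real machine check, and SIM_FAULT_HWPOISON plays the role of the VM_FAULT_HWPOISON return from do_swap_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 64 /* tiny page size, for the simulation only */

/* Hypothetical stand-in for a page: contents plus a poison flag. */
struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
    bool hwpoison; /* models an uncorrectable memory error in this page */
};

/* Hypothetical fault codes; SIM_FAULT_HWPOISON mirrors VM_FAULT_HWPOISON. */
enum sim_fault {
    SIM_FAULT_OK = 0,
    SIM_FAULT_HWPOISON,
};

/*
 * Models a machine-check-safe copy: instead of crashing when the source
 * is poisoned, report failure and leave the destination untouched.
 * Returns 0 on success, nonzero if the source could not be read.
 */
static int copy_mc_sim(struct sim_page *dst, const struct sim_page *src)
{
    assert(dst != NULL && src != NULL);
    if (src->hwpoison)
        return 1; /* nonzero models a failed copy */
    memcpy(dst->data, src->data, SIM_PAGE_SIZE);
    dst->hwpoison = false;
    return 0;
}

/*
 * Models the swap-in path: copy the (possibly KSM) page for the faulting
 * process and turn a failed copy into a HWPOISON fault instead of a crash.
 */
static enum sim_fault swapin_copy(struct sim_page *dst,
                                  const struct sim_page *src)
{
    if (copy_mc_sim(dst, src) != 0)
        return SIM_FAULT_HWPOISON;
    return SIM_FAULT_OK;
}
```

The point of the sketch is only the control flow: the copy reports an error to its caller, and the caller converts it into a fault code rather than letting the kernel consume the poisoned data.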
>
> Thank you for the work. I have a few comments below.
Thanks both.
>> - if (unlikely(!PageUptodate(page))) {
>> - pte_t pteval;
>> + if (hwposioned || !PageUptodate(page)) {
>> + swp_entry_t swp_entry;
>>
>> dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>> - pteval = swp_entry_to_pte(make_swapin_error_entry());
>> - set_pte_at(vma->vm_mm, addr, pte, pteval);
>> - swap_free(entry);
>> + if (hwposioned) {
>> + swp_entry = make_hwpoison_entry(swapcache);
>> + page = swapcache;
>
> This might work for the process accessing the broken page, but ksm
> pages are likely to be shared by multiple processes, so it would be
> much nicer if you can convert all mapping entries for the error ksm page
> into hwpoisoned ones. Maybe in this thorough approach,
> hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
> But it's not necessary to do this together with applying mcsafe-memcpy,
> because recovery action and mcsafe-memcpy can be done independently.
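The "thorough approach" above, converting every mapping of a shared, broken KSM page into a hwpoison entry, can be modeled as follows. This is a hedged userspace sketch: struct sim_mm, the SIM_PTE_* markers and poison_all_mappings() are hypothetical, and a flat scan over all processes stands in for the reverse-mapping walk that the kernel would use (the reviewer suggests try_to_unmap() from hwpoison_user_mappings()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PTE model: a non-negative value maps that page id. */
#define SIM_PTE_NONE (-1)     /* nothing mapped */
#define SIM_PTE_HWPOISON (-2) /* models a hwpoison swap entry */

/* One simulated process with a handful of virtual page slots. */
struct sim_mm {
    int pte[4];
};

/*
 * Replace every mapping of the poisoned page, in every process, with a
 * hwpoison marker so later accesses fault cleanly instead of reading the
 * bad memory again. Returns how many mappings were converted.
 */
static size_t poison_all_mappings(struct sim_mm *mms, size_t nr_mms,
                                  int bad_page)
{
    size_t converted = 0;

    assert(bad_page >= 0);
    for (size_t i = 0; i < nr_mms; i++) {
        size_t nr_ptes = sizeof(mms[i].pte) / sizeof(mms[i].pte[0]);

        for (size_t j = 0; j < nr_ptes; j++) {
            if (mms[i].pte[j] == bad_page) {
                mms[i].pte[j] = SIM_PTE_HWPOISON;
                converted++;
            }
        }
    }
    return converted;
}
```

Once all sharers are converted, a second pass finds nothing left to convert, which is what makes this approach robust against repeated accesses from other processes.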
>
I'm afraid leaving the ksm page in the cache will repeatedly trigger uncorrectable errors for the
same page if ksm pages are shared by multiple processes. This might reach the hardware threshold
and result in a fatal uncorrectable error (thus causing the system to panic). So IMHO it might be better
to check whether the page is hwpoisoned before calling ksm_might_need_to_copy() if the above thorough
approach is not implemented. But I can easily be wrong.
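The difference between copying unconditionally and checking the poison state first can be shown with a small counting model. This is a hypothetical userspace sketch, not kernel code: struct sim_ksm_page, sim_copy() and sim_fault() are illustrative names, marked_poisoned stands in for the hwpoison page flag, and it assumes (for the model only) that the first detected error marks the page. "consumed" counts how often the broken memory is actually read, i.e. how many uncorrectable errors the hardware would see.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a shared KSM page whose memory has gone bad. */
struct sim_ksm_page {
    bool bad_memory;       /* the physical memory really is broken */
    bool marked_poisoned;  /* stands in for the hwpoison page flag */
    unsigned int consumed; /* times the bad memory was read */
};

enum sim_result { SIM_OK, SIM_HWPOISON };

/* Models the copy: reading bad memory counts as one consumption. */
static enum sim_result sim_copy(struct sim_ksm_page *p)
{
    if (p->bad_memory) {
        p->consumed++;
        p->marked_poisoned = true; /* model: error handling marks the page */
        return SIM_HWPOISON;
    }
    return SIM_OK;
}

/*
 * Fault path for one sharing process. With check_first, an already
 * marked page is rejected before the copy, so the broken memory is not
 * read again by the next sharer.
 */
static enum sim_result sim_fault(struct sim_ksm_page *p, bool check_first)
{
    if (check_first && p->marked_poisoned)
        return SIM_HWPOISON;
    return sim_copy(p);
}
```

With three sharers faulting on the same broken page, the unconditional path consumes the bad memory three times, while the check-first path consumes it only once.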
Thanks,
Miaohe Lin