From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Jiaqi Yan <jiaqiyan@google.com>,
Andrew Morton <akpm@linux-foundation.org>, <shy828301@gmail.com>
Cc: <kirill.shutemov@linux.intel.com>, <kirill@shutemov.name>,
<tongtiangen@huawei.com>, <tony.luck@intel.com>,
<naoya.horiguchi@nec.com>, <linmiaohe@huawei.com>,
<linux-mm@kvack.org>, <osalvador@suse.de>
Subject: Re: [PATCH v8 1/2] mm/khugepaged: recover from poisoned anonymous memory
Date: Sun, 4 Dec 2022 10:57:58 +0800
Message-ID: <d203f358-c43e-ef0e-ab22-a85e9da989b9@huawei.com>
In-Reply-To: <CACw3F50wgfxDreGdDC+FPRJVGM1QaLB1qWaOvV++15rNZbgiag@mail.gmail.com>
On 2022/12/3 1:29, Jiaqi Yan wrote:
> On Thu, Dec 1, 2022 at 6:25 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>> On 2022/12/2 7:09, Andrew Morton wrote:
>>> On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan <jiaqiyan@google.com> wrote:
>>>
>>>> Make __collapse_huge_page_copy return whether copying anonymous pages
>>>> succeeded, and make collapse_huge_page handle the return status.
>>>>
>>>> Break existing PTE scan loop into two for-loops. The first loop copies
>>>> source pages into target huge page, and can fail gracefully when running
>>>> into memory errors in source pages. If copying all pages succeeds, the
>>>> second loop releases and clears up these normal pages. Otherwise, the
>>>> second loop rolls back the page table and page states by:
>>>> - re-establishing the original PTEs-to-PMD connection.
>>>> - releasing source pages back to their LRU list.
>>>>
>>>> Tested manually:
>>>> 0. Enable khugepaged on system under test.
>>>> 1. Start a two-thread application. Each thread allocates a chunk of
>>>> non-huge anonymous memory buffer.
>>>> 2. Pick 4 random buffer locations (2 in each thread) and inject
>>>> uncorrectable memory errors at corresponding physical addresses.
>>>> 3. Signal both threads to make their memory buffer collapsible, i.e.
>>>> calling madvise(MADV_HUGEPAGE).
>>>> 4. Wait and check kernel log: khugepaged is able to recover from poisoned
>>>> pages and skips collapsing them.
>>>> 5. Signal both threads to inspect their buffer contents and make sure no
>>>> data corruption.
>>> Looks like a nice patchset. I'd like to give it a run in linux-next
>>> but we're at -rc7 and we have no review/ack tags. So it should be a
>>> post-6.2-rc1 thing.
>>>
>>> I have a quibble.
>>>
>>>> --- a/include/linux/highmem.h
>>>> +++ b/include/linux/highmem.h
>>>> @@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
>>>>
>>>> #endif
>>>>
>>>> +/*
>>>> + * Machine check exception handled version of copy_highpage. Return number
>>>> + * of bytes not copied if there was an exception; otherwise 0 for success.
>>>> + * Note handling #MC requires arch opt-in.
>>>> + */
>>>> +static inline int copy_mc_highpage(struct page *to, struct page *from)
>>>> +{
>>>> + char *vfrom, *vto;
>>>> + unsigned long ret;
>>>> +
>>>> + vfrom = kmap_local_page(from);
>>>> + vto = kmap_local_page(to);
>>>> + ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
>>>> + if (ret == 0)
>>>> + kmsan_copy_page_meta(to, from);
>>>> + kunmap_local(vto);
>>>> + kunmap_local(vfrom);
>>>> +
>>>> + return ret;
>>>> +}
>>> Why inlined? It's large, it's slow, it's called only from
>>> khugepaged.c. A regular out-of-line function which is static to
>>> khugepaged.c seems more appropriate?
>> There is a similar function, copy_mc_user_highpage(); could we reuse
>> it? See commit a873dfe1032a ("mm, hwpoison: try to recover from
>> copy-on-write faults").
>>
>>
> To Kefeng: As I explained in v7, besides the `to` and `from` pages,
> copy_mc_user_highpage() takes `struct vm_area_struct *vma` and `vaddr`.
> While that fits __collapse_huge_page_copy, it doesn't fit well for
> collapse_file (needed for the 2nd commit). When Yang Shi reviewed
> my patches, we agreed we should take this opportunity to unify
> the copying routines in khugepaged.c (for both file-backed and anon
> memory), and copy_highpage fits both (alternatively we could use
> copy_user_highpage and pass vaddr=NULL and vma=NULL, but I don't
> like that). So I chose to make copy_highpage MC-recoverable.
> Does this make sense to you?
Yes, thanks for your explanation.
>
> To Andrew: I think it is a reasonable "quibble". I will prepare the
> update in v9 while waiting for more reviews on v8, if there are any.
Thread overview: 7+ messages
2022-12-01 0:59 [PATCH v8 0/2] Memory poison recovery in khugepaged collapsing Jiaqi Yan
2022-12-01 0:59 ` [PATCH v8 1/2] mm/khugepaged: recover from poisoned anonymous memory Jiaqi Yan
2022-12-01 23:09 ` Andrew Morton
2022-12-02 2:25 ` Kefeng Wang
2022-12-02 17:29 ` Jiaqi Yan
2022-12-04 2:57 ` Kefeng Wang [this message]
2022-12-01 0:59 ` [PATCH v8 2/2] mm/khugepaged: recover from poisoned file-backed memory Jiaqi Yan