From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefeng Wang
Date: Sun, 4 Dec 2022 10:57:58 +0800
Subject: Re: [PATCH v8 1/2] mm/khugepaged: recover from poisoned anonymous memory
To: Jiaqi Yan, Andrew Morton
References: <20221201005931.3877608-1-jiaqiyan@google.com>
 <20221201005931.3877608-2-jiaqiyan@google.com>
 <20221201150919.bc41d6f9269e63fc86b1d17d@linux-foundation.org>
 <83c0ee14-75d3-c2ef-c6d0-040bf5e2fc7e@huawei.com>
X-Delivered-To: linux-mm@kvack.org
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 2022/12/3 1:29, Jiaqi Yan wrote:
> On Thu, Dec 1, 2022 at 6:25 PM Kefeng Wang wrote:
>>
>> On 2022/12/2 7:09, Andrew Morton wrote:
>>> On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan wrote:
>>>
>>>> Make __collapse_huge_page_copy return whether copying anonymous pages
>>>> succeeded, and make collapse_huge_page handle the return status.
>>>>
>>>> Break existing PTE scan loop into two for-loops. The first loop copies
>>>> source pages into target huge page, and can fail gracefully when running
>>>> into memory errors in source pages. If copying all pages succeeds, the
>>>> second loop releases and clears up these normal pages. Otherwise, the
>>>> second loop rolls back the page table and page states by:
>>>> - re-establishing the original PTEs-to-PMD connection.
>>>> - releasing source pages back to their LRU list.
>>>>
>>>> Tested manually:
>>>> 0. Enable khugepaged on system under test.
>>>> 1. Start a two-thread application. Each thread allocates a chunk of
>>>>    non-huge anonymous memory buffer.
>>>> 2. Pick 4 random buffer locations (2 in each thread) and inject
>>>>    uncorrectable memory errors at corresponding physical addresses.
>>>> 3. Signal both threads to make their memory buffer collapsible, i.e.
>>>>    calling madvise(MADV_HUGEPAGE).
>>>> 4. Wait and check kernel log: khugepaged is able to recover from poisoned
>>>>    pages and skips collapsing them.
>>>> 5. Signal both threads to inspect their buffer contents and make sure no
>>>>    data corruption.
>>> Looks like a nice patchset. I'd like to give it a run in linux-next
>>> but we're at -rc7 and we have no review/ack tags. So it should be a
>>> post-6.2-rc1 thing.
>>>
>>> I have a quibble.
>>>
>>>> --- a/include/linux/highmem.h
>>>> +++ b/include/linux/highmem.h
>>>> @@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
>>>>
>>>> #endif
>>>>
>>>> +/*
>>>> + * Machine check exception handled version of copy_highpage. Return number
>>>> + * of bytes not copied if there was an exception; otherwise 0 for success.
>>>> + * Note handling #MC requires arch opt-in.
>>>> + */
>>>> +static inline int copy_mc_highpage(struct page *to, struct page *from)
>>>> +{
>>>> +	char *vfrom, *vto;
>>>> +	unsigned long ret;
>>>> +
>>>> +	vfrom = kmap_local_page(from);
>>>> +	vto = kmap_local_page(to);
>>>> +	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
>>>> +	if (ret == 0)
>>>> +		kmsan_copy_page_meta(to, from);
>>>> +	kunmap_local(vto);
>>>> +	kunmap_local(vfrom);
>>>> +
>>>> +	return ret;
>>>> +}
>>> Why inlined? It's large, it's slow, it's called only from
>>> khugepaged.c. A regular out-of-line function which is static to
>>> khugepaged.c seems more appropriate?
>> There is a similar function, copy_mc_user_highpage(); could we reuse
>> it? See commit a873dfe1032a ("mm, hwpoison: try to recover from
>> copy-on write faults").
>>
>>
> To Kefeng: As I explained in v7, besides the `to` and `from` pages,
> copy_mc_user_highpage takes `struct vm_area_struct *vma` and `vaddr`.
> While it fits __collapse_huge_page_copy, it doesn't really fit well
> for collapse_file (needed for the 2nd commit). When Shi Yang reviewed
> my patches, we agreed that we should take this opportunity to unify
> the copying routines in khugepaged.c (for both file-backed and anon
> memory), and copy_highpage fits both (alternatively we can use
> copy_user_highpage and pass vaddr=null and vma=null, but I don't
> like that). So I chose to make copy_highpage MC recoverable.
> Does this make sense to you?

Yes, thanks for your explanation.

>
> To Andrew: I think it is a reasonable "quibble". I will prepare the
> update in v9 while waiting for more reviews on v8, if there are any.
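
For readers following the thread, the two-loop scheme from the quoted commit message (copy everything first, then either release the source pages or roll them back) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: every `sketch_`/`SKETCH_` name, the toy page size, and the `poisoned` flag standing in for an uncorrectable memory error are hypothetical. The only thing borrowed from the patch is the return convention of the copy helper (number of bytes not copied, 0 on success).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy size; everything prefixed sketch_/SKETCH_ is hypothetical. */
#define SKETCH_PAGE_SIZE 64

enum sketch_state {
	SKETCH_ISOLATED,	/* taken off its list, waiting to be collapsed */
	SKETCH_RELEASED,	/* copy succeeded, small page given up */
	SKETCH_RESTORED,	/* copy failed, small page put back untouched */
};

struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool poisoned;		/* stands in for an uncorrectable memory error */
	enum sketch_state state;
};

/*
 * Stand-in for a machine-check-safe page copy: returns the number of bytes
 * NOT copied, so 0 means success (same convention as the quoted patch).
 */
size_t sketch_copy_mc_page(unsigned char *to, const struct sketch_page *from)
{
	if (from->poisoned)
		return SKETCH_PAGE_SIZE;
	memcpy(to, from->data, SKETCH_PAGE_SIZE);
	return 0;
}

/* Returns true if all nr pages were copied into huge and then released. */
bool sketch_collapse_copy(struct sketch_page *src, size_t nr,
			  unsigned char *huge)
{
	bool ok = true;
	size_t i;

	/* Loop 1: copy only; stop at the first page that cannot be read. */
	for (i = 0; i < nr; i++) {
		if (sketch_copy_mc_page(huge + i * SKETCH_PAGE_SIZE,
					&src[i]) != 0) {
			ok = false;
			break;
		}
	}

	/* Loop 2: commit on success, otherwise roll every source page back. */
	for (i = 0; i < nr; i++)
		src[i].state = ok ? SKETCH_RELEASED : SKETCH_RESTORED;

	return ok;
}
```

Because the first loop never modifies a source page, a failure partway through leaves all source data intact, and the second loop only has to reset bookkeeping. In the real patch that rollback also involves re-establishing the PTEs-to-PMD connection, which this sketch does not model.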