From: Jiaqi Yan
Date: Fri, 2 Dec 2022 09:29:23 -0800
Subject: Re: [PATCH v8 1/2] mm/khugepaged: recover from poisoned anonymous memory
To: Kefeng Wang, Andrew Morton, shy828301@gmail.com
Cc: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 tongtiangen@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com,
 linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de
References: <20221201005931.3877608-1-jiaqiyan@google.com>
 <20221201005931.3877608-2-jiaqiyan@google.com>
 <20221201150919.bc41d6f9269e63fc86b1d17d@linux-foundation.org>
 <83c0ee14-75d3-c2ef-c6d0-040bf5e2fc7e@huawei.com>
In-Reply-To: <83c0ee14-75d3-c2ef-c6d0-040bf5e2fc7e@huawei.com>

On Thu, Dec 1, 2022 at 6:25 PM Kefeng Wang wrote:
>
>
> On 2022/12/2 7:09, Andrew Morton wrote:
> > On Wed, 30 Nov 2022 16:59:30 -0800 Jiaqi Yan wrote:
> >
> >> Make __collapse_huge_page_copy return whether copying anonymous pages
> >> succeeded, and make collapse_huge_page handle the return status.
> >>
> >> Break existing PTE scan loop into two for-loops. The first loop copies
> >> source pages into target huge page, and can fail gracefully when running
> >> into memory errors in source pages. If copying all pages succeeds, the
> >> second loop releases and clears up these normal pages. Otherwise, the
> >> second loop rolls back the page table and page states by:
> >> - re-establishing the original PTEs-to-PMD connection.
> >> - releasing source pages back to their LRU list.
> >>
> >> Tested manually:
> >> 0. Enable khugepaged on system under test.
> >> 1. Start a two-thread application. Each thread allocates a chunk of
> >>    non-huge anonymous memory buffer.
> >> 2. Pick 4 random buffer locations (2 in each thread) and inject
> >>    uncorrectable memory errors at corresponding physical addresses.
> >> 3. Signal both threads to make their memory buffer collapsible, i.e.
> >>    calling madvise(MADV_HUGEPAGE).
> >> 4. Wait and check kernel log: khugepaged is able to recover from poisoned
> >>    pages and skips collapsing them.
> >> 5. Signal both threads to inspect their buffer contents and make sure no
> >>    data corruption.
> > Looks like a nice patchset. I'd like to give it a run in linux-next
> > but we're at -rc7 and we have no review/ack tags. So it should be a
> > post-6.2-rc1 thing.
> >
> > I have a quibble.
> >
> >> --- a/include/linux/highmem.h
> >> +++ b/include/linux/highmem.h
> >> @@ -361,6 +361,27 @@ static inline void copy_highpage(struct page *to, struct page *from)
> >>
> >>  #endif
> >>
> >> +/*
> >> + * Machine check exception handled version of copy_highpage. Return number
> >> + * of bytes not copied if there was an exception; otherwise 0 for success.
> >> + * Note handling #MC requires arch opt-in.
> >> + */
> >> +static inline int copy_mc_highpage(struct page *to, struct page *from)
> >> +{
> >> +	char *vfrom, *vto;
> >> +	unsigned long ret;
> >> +
> >> +	vfrom = kmap_local_page(from);
> >> +	vto = kmap_local_page(to);
> >> +	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
> >> +	if (ret == 0)
> >> +		kmsan_copy_page_meta(to, from);
> >> +	kunmap_local(vto);
> >> +	kunmap_local(vfrom);
> >> +
> >> +	return ret;
> >> +}
> > Why inlined? It's large, it's slow, it's called only from
> > khugepaged.c. A regular out-of-line function which is static to
> > khugepaged.c seems more appropriate?
>
> There is a similar function copy_mc_user_highpage(), could we reuse
> it , see commit a873dfe1032a mm, hwpoison: try to recover from copy-on
> write faults
>

To Kefeng: As I explained in v7, besides the `to` and `from` pages,
copy_mc_user_highpage also takes `struct vm_area_struct *vma` and
`vaddr`. That fits __collapse_huge_page_copy, but it does not fit
collapse_file well (which is needed for the 2nd commit). When Shi Yang
reviewed my patches, we agreed that we should take this opportunity to
unify the copying routines in khugepaged.c (for both file-backed and
anon memory), and copy_highpage fits both. Alternatively we could use
copy_user_highpage and pass vaddr=null and vma=null, but I don't like
that. So I chose to add a machine-check-recoverable version of
copy_highpage. Does this make sense to you?

To Andrew: I think it is a reasonable "quibble". I will prepare the
update for v9 while waiting for more reviews on v8, if there are any.
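
For anyone skimming the thread, the two-loop structure from the quoted
commit message can be modelled in userspace roughly as below. This is
only a sketch under stated assumptions, not the kernel code: struct
model_page, model_copy_mc_page and model_collapse_copy are hypothetical
names made up for illustration, a `poisoned` flag stands in for an
uncorrectable memory error, and "rollback" is reduced to leaving the
source pages untouched. The one thing it borrows from the patch is the
return convention of copy_mc_highpage (bytes not copied, 0 on success).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* tiny stand-in for PAGE_SIZE */
#define MODEL_NR_PAGES  4	/* stand-in for the pages under one PMD */

/* Hypothetical model of a source page; not a kernel structure. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool poisoned;	/* models an uncorrectable memory error */
	bool released;	/* models "released and cleared" after success */
};

/*
 * Same return convention as the quoted copy_mc_highpage():
 * number of bytes NOT copied, so 0 means success.
 */
static size_t model_copy_mc_page(unsigned char *to,
				 const struct model_page *from)
{
	if (from->poisoned)
		return MODEL_PAGE_SIZE;
	memcpy(to, from->data, MODEL_PAGE_SIZE);
	return 0;
}

/* Returns true only if every source page was copied into @huge. */
static bool model_collapse_copy(unsigned char *huge,
				struct model_page *src, size_t nr)
{
	bool copied_all = true;
	size_t i;

	/* Loop 1: copy only; stop at the first source page that fails. */
	for (i = 0; i < nr; i++) {
		if (model_copy_mc_page(huge + i * MODEL_PAGE_SIZE, &src[i])) {
			copied_all = false;
			break;
		}
	}

	/*
	 * Loop 2: on success release every source page; on failure leave
	 * every source page in place, which is this model's "rollback".
	 */
	for (i = 0; i < nr; i++)
		src[i].released = copied_all;

	return copied_all;
}
```

A caller would zero an array of MODEL_NR_PAGES source pages, optionally
set `poisoned` on one of them, and check that model_collapse_copy()
returns false with no page marked released in that case. Keeping the
fallible copy in its own loop is what makes the failure path simple:
nothing irreversible has happened by the time an error is seen.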