From: Yang Shi
Date: Tue, 4 Apr 2023 20:57:30 -0700
Subject: Re: [PATCH v12 0/3] Memory poison recovery in khugepaged collapsing
To: Jiaqi Yan
Cc: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 tongtiangen@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com,
 linmiaohe@huawei.com, linux-mm@kvack.org, akpm@linux-foundation.org,
 osalvador@suse.de, wangkefeng.wang@huawei.com, stevensd@chromium.org,
 hughd@google.com
References: <20230329151121.949896-1-jiaqiyan@google.com>

On Tue, Apr 4, 2023 at 11:44 AM Jiaqi Yan wrote:
>
> Friendly ping for review :)

Both Hugh and I already gave our reviewed/acked tags for the previous
version. Since there were only minor changes, you can keep the
reviewed/acked tags from the previous version.

>
> On Wed, Mar 29, 2023 at 8:11 AM Jiaqi Yan wrote:
>>
>> Problem
>> =======
>> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
>> As memory size and density increase, the chances of and number of
>> memory errors increase. The increasing size and density of server
>> RAM in data centers and the cloud have come with more uncorrectable
>> memory errors. There are already mechanisms in the kernel to recover
>> from uncorrectable memory errors. This series of patches provides
>> the recovery mechanism for the particular kernel agent khugepaged
>> when it collapses memory pages.
>>
>> Impact
>> ======
>> The main reason we chose to make khugepaged collapsing tolerant of
>> memory failures was its high possibility of accessing poisoned memory
>> while performing functionally optional compaction actions.
>> Standard applications typically don't have strict requirements on
>> the size of their pages, so they are given 4K pages by the kernel.
>> The kernel is able to improve application performance by either
>>
>> 1) giving applications 2M pages to begin with, or
>> 2) collapsing 4K pages into 2M pages when possible.
>>
>> This collapsing operation is done by khugepaged, a kernel agent that
>> is constantly scanning memory. When collapsing 4K pages into a 2M page,
>> it must copy the data from the 4K pages into a physically contiguous
>> 2M page. Therefore, as long as there exists one poisoned cache line in
>> collapsible 4K pages, khugepaged will eventually access it. The current
>> impact to users is a kernel panic triggered by a machine check exception.
>> However, khugepaged’s compaction operations are not functionally required
>> kernel actions. Therefore making khugepaged tolerant of poisoned memory
>> will greatly improve user experience.
>>
>> This patch series is for cases where khugepaged is the first
>> to detect the memory errors on the poisoned pages.
>> IOW, the pages
>> are not known to have memory errors when khugepaged collapsing gets to
>> them. In our observation, this happens frequently when the huge page
>> ratio of the system is relatively low, which is fairly common in
>> virtual machines running in the cloud.
>>
>> Solution
>> ========
>> As stated before, it is less desirable to crash the system only because
>> khugepaged accesses poisoned pages while it is collapsing 4K pages.
>> The high level idea of this patch series is to skip the group of pages
>> (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
>> as these pages have become ineligible to be collapsed.
>>
>> We are also careful to unwind operations khugepaged has performed before
>> it detects memory failures. For example, before copying and collapsing
>> a group of anonymous pages into a huge page, the source pages will be
>> isolated and their page table is unlinked from their PMD. These operations
>> need to be undone in order to ensure these pages are not changed/lost from
>> the perspective of other threads (both user and kernel space). As for
>> file backed memory pages, there already exists a rollback case. This
>> patch just extends it so that khugepaged also correctly rolls back when
>> it fails to copy poisoned 4K pages.
>>
>> Changelog
>> =========
>> v12 changes
>> - Incorporate feedback from Yang Shi.
>> - Drop unused pmd from __collapse_huge_page_copy_succeeded.
>> - Drop unused address from __collapse_huge_page_copy_failed.
>> - smp_mb() should be after filemap_nr_thps_dec.
>> - This revision is rebased to mm-unstable at commit 9b175ce664d33
>>   ("mm: move free_area_empty() to mm/internal.h")
>>
>> v11 changes
>> - Incorporate feedback from Yang Shi and Hugh Dickins.
>> - Replace releasing pages for-loop with release_pte_pages in
>>   __collapse_huge_page_copy_failed.
>> - Rename pte_ptl to ptl in __collapse_huge_page_copy_succeeded.
>> - Fix a bug in __collapse_huge_page_copy_succeeded: ptep_clear should be
>>   used instead of pte_clear.
>> - Drop _address in __collapse_huge_page_copy_succeeded.
>> - Add smp_mb() before updating filemap_nr_thps_dec.
>> - Move `nr = thp_nr_pages()` closer to its references.
>> - Remove an unnecessary goto statement.
>> - This revision is rebased to mm-unstable at commit b4e1277ee31db
>>   ("xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text")
>>
>> v10 changes
>> - Incorporate feedback from Kirill A. Shutemov
>> - Refactor the 2nd loop (after the loop for copying memory) into 2 helper
>>   functions, one for actions to take when copying succeeded, one for when
>>   copying failed due to #MC.
>> - Use copy_mc_user_highpage for anonymous memory.
>> - Introduce copy_mc_highpage and use it for file-backed memory.
>> - Rename the original PMD from `rollback` to `orig_pmd`.
>> - Some minor changes in comments, e.g. `normal page` to `raw page`.
>> - This revision is rebased to mm-unstable at commit df3ae4347aff9
>>   ("dma-buf: system_heap: avoid reclaim for order 4")
>>
>> v9 changes
>> - Incorporate feedback from Andrew Morton
>> - Move copy_mc_highpage into khugepaged.c as a static out-of-line
>>   function copy_mc_page.
>>
>> v8 changes
>> - Incorporate feedback from Tony Luck
>> - Rename copy_highpage_mc to copy_mc_highpage.
>> - Update copy_mc_highpage with kmsan changes.
>> - Code style changes:
>>   1) copy_mc_highpage returns int as "copy" is an action and is consistent
>>      with copy_mc_user_highpage.
>>   2) __collapse_huge_page_copy returns scan_result(int) and is consistent
>>      with __collapse_huge_page_isolate/swapin.
>>   3) variables are declared in separate lines in collapse_file.
>>
>> v7 changes
>> - Fix a bug "KASAN: stack-out-of-bounds Read in collapse_file". After
>>   copying all pages into the huge page, clear_highpage should use index
>>   instead of page->index.
>>
>> v6 changes
>> - Address comments from Kirill Shutemov
>> - Rewrite __collapse_huge_page_copy to make rollback operations
>>   clearer to its reader.
>> - Add detailed test steps in each commit message.
>>
>> v5 changes
>> - Rebase patches to mm-unstable at
>>   commit ffb39098bf87 ("Merge tag 'linux-kselftest-kunit-6.1-rc1' of
>>   git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest").
>> - Resolve conflicts with:
>>   commit 2f55f070e5b8 ("mm/khugepaged: minor cleanup for collapse_file")
>>   commit 1baec203b77c ("mm/khugepaged: try to free transhuge swapcache
>>   when possible")
>>
>> v4 changes
>> - Incorporate feedback from Yang Shi
>> - Remove tracepoint for __collapse_huge_page_copy, just keep SCAN_COPY_MC
>>   and let trace_mm_collapse_huge_page it
>> - Remove unnecessary comments
>>
>> v3 changes
>> - Incorporate feedback from Yang Shi
>> - Add tracepoint for __collapse_huge_page_copy
>> - Restore PMD in collapse_huge_page
>> - Correct comment about mmap_read_lock
>>
>> v2 changes
>> - Incorporate feedback from Yang Shi
>> - Only keep copy_highpage_mc
>> - Add new scan_result SCAN_COPY_MC
>> - Defer NR_FILE_THPS update until copying succeeded
>>
>> Jiaqi Yan (3):
>>   mm/khugepaged: recover from poisoned anonymous memory
>>   mm/hwpoison: introduce copy_mc_highpage
>>   mm/khugepaged: recover from poisoned file-backed memory
>>
>>  include/linux/highmem.h            |  54 ++++++--
>>  include/trace/events/huge_memory.h |   3 +-
>>  mm/khugepaged.c                    | 200 ++++++++++++++++++++++-------
>>  3 files changed, 198 insertions(+), 59 deletions(-)
>>
>> --
>> 2.40.0.348.gf938b09366-goog
>>
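
For readers following the thread, the skip-and-roll-back idea described in
the quoted Solution section can be illustrated with a small userspace
sketch in C. This is not the code from the series: struct sim_page,
sim_copy_mc_page() and sim_collapse() are hypothetical names, poison is
simulated with a flag rather than a real machine check, and isolation is
reduced to a boolean. Only the SCAN_COPY_MC name and the 512 x 4K = 2M
geometry are taken from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096u                               /* 4K base page */
#define HUGE_PAGE_SIZE  (2u * 1024u * 1024u)                /* 2M huge page */
#define NR_SMALL_PAGES  (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE)  /* 512 pages */

/* Simplified result codes; only the SCAN_COPY_MC name is from the series. */
enum scan_result { SCAN_SUCCEED, SCAN_COPY_MC };

/* Hypothetical stand-in for one 4K source page. */
struct sim_page {
	unsigned char data[SMALL_PAGE_SIZE];
	bool poisoned;  /* simulates an uncorrectable memory error */
	bool isolated;  /* simulates the page being isolated for collapse */
};

/*
 * Hypothetical machine-check-safe copy: returns 0 on success and -1 when
 * the source is poisoned, instead of crashing the caller.
 */
static int sim_copy_mc_page(unsigned char *dst, const struct sim_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return 0;
}

/*
 * Copy NR_SMALL_PAGES source pages into one contiguous huge buffer.
 * If any page is poisoned, skip the whole group and undo the isolation
 * so the source pages look unchanged to everyone else.
 */
static enum scan_result sim_collapse(struct sim_page *pages,
				     unsigned char *huge)
{
	size_t i;

	assert(pages != NULL && huge != NULL);

	/* Step 1: isolate every source page before copying. */
	for (i = 0; i < NR_SMALL_PAGES; i++)
		pages[i].isolated = true;

	/* Step 2: copy page by page, stopping at the first poisoned one. */
	for (i = 0; i < NR_SMALL_PAGES; i++) {
		if (sim_copy_mc_page(huge + i * SMALL_PAGE_SIZE, &pages[i]))
			goto rollback;
	}

	/* Success: in this sketch the source pages simply stay isolated. */
	return SCAN_SUCCEED;

rollback:
	/* Step 3: undo step 1; the partially filled huge buffer is unused. */
	for (i = 0; i < NR_SMALL_PAGES; i++)
		pages[i].isolated = false;
	return SCAN_COPY_MC;
}
```

Calling sim_collapse() on 512 clean pages returns SCAN_SUCCEED and fills
the 2M buffer. Marking any single page as poisoned makes it return
SCAN_COPY_MC, with every source page de-isolated and its data untouched.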