References: <20230329151121.949896-1-jiaqiyan@google.com>
From: Jiaqi Yan
Date: Thu, 6 Apr 2023 11:12:14 -0700
Subject: Re: [PATCH v12 0/3] Memory poison recovery in khugepaged collapsing
To: Yang Shi, akpm@linux-foundation.org
Cc: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 tongtiangen@huawei.com, tony.luck@intel.com, naoya.horiguchi@nec.com,
 linmiaohe@huawei.com, linux-mm@kvack.org, osalvador@suse.de,
 wangkefeng.wang@huawei.com, stevensd@chromium.org, hughd@google.com

On Tue, Apr 4, 2023 at 8:57 PM Yang Shi wrote:
>
> On Tue, Apr 4, 2023 at 11:44 AM Jiaqi Yan wrote:
> >
> > Friendly ping for review :)
>
> Both I and Hugh already gave reviewed/acked for the previous version.
> Since there were just some minor changes so you could keep the
> reviewed/acked from the previous version.

Thanks Yang!

Andrew, is there anything else I need to do at this point (e.g. resend
v12 with the reviewed/acked tags in the commits)? Or are you fine with
merging this v12 as is?

>
> >
> > On Wed, Mar 29, 2023 at 8:11 AM Jiaqi Yan wrote:
> >>
> >> Problem
> >> =======
> >> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> >> As memory size and density increase, the chances of and number of
> >> memory errors increase. The increasing size and density of server
> >> RAM in the data center and cloud have shown increased uncorrectable
> >> memory errors. There are already mechanisms in the kernel to recover
> >> from uncorrectable memory errors. This series of patches provides
> >> the recovery mechanism for the particular kernel agent khugepaged
> >> when it collapses memory pages.
> >>
> >> Impact
> >> ======
> >> The main reason we chose to make khugepaged collapsing tolerant of
> >> memory failures was its high possibility of accessing poisoned memory
> >> while performing functionally optional compaction actions.
> >> Standard applications typically don't have strict requirements on
> >> the size of their pages, so they are given 4K pages by the kernel.
> >> The kernel is able to improve application performance by either
> >>
> >> 1) giving applications 2M pages to begin with, or
> >> 2) collapsing 4K pages into 2M pages when possible.
> >>
> >> This collapsing operation is done by khugepaged, a kernel agent that
> >> is constantly scanning memory. When collapsing 4K pages into a 2M page,
> >> it must copy the data from the 4K pages into a physically contiguous
> >> 2M page. Therefore, as long as there exists one poisoned cache line in
> >> collapsible 4K pages, khugepaged will eventually access it. The current
> >> impact to users is a kernel panic triggered by a machine check exception.
> >> However, khugepaged's compaction operations are not functionally required
> >> kernel actions.
> >> Therefore, making khugepaged tolerant of poisoned memory
> >> will greatly improve user experience.
> >>
> >> This patch series is for cases where khugepaged is the first
> >> to detect the memory errors on the poisoned pages. IOW, the pages
> >> are not known to have memory errors when khugepaged collapsing gets to
> >> them. In our observation, this happens frequently when the huge page
> >> ratio of the system is relatively low, which is fairly common in
> >> virtual machines running on cloud.
> >>
> >> Solution
> >> ========
> >> As stated before, it is less desirable to crash the system only because
> >> khugepaged accesses poisoned pages while it is collapsing 4K pages.
> >> The high level idea of this patch series is to skip the group of pages
> >> (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> >> as these pages have become ineligible to be collapsed.
> >>
> >> We are also careful to unwind operations khugepaged has performed before
> >> it detects memory failures. For example, before copying and collapsing
> >> a group of anonymous pages into a huge page, the source pages will be
> >> isolated and their page table is unlinked from their PMD. These operations
> >> need to be undone in order to ensure these pages are not changed/lost from
> >> the perspective of other threads (both user and kernel space). As for
> >> file backed memory pages, there already exists a rollback case. This
> >> patch just extends it so that khugepaged also correctly rolls back when
> >> it fails to copy poisoned 4K pages.
> >>
> >> Changelog
> >> =========
> >> v12 changes
> >> - Incorporate feedback from Shi Yang.
> >> - Drop unused pmd from __collapse_huge_page_copy_succeeded.
> >> - Drop unused address from __collapse_huge_page_copy_failed.
> >> - smp_mb() should be after filemap_nr_thps_dec.
> >> - This revision is rebased to mm-unstable at commit 9b175ce664d33
> >>   ("mm: move free_area_empty() to mm/internal.h")
> >>
> >> v11 changes
> >> - Incorporate feedback from Shi Yang and Hugh Dickins
> >> - Replace releasing pages for-loop with release_pte_pages in
> >>   __collapse_huge_page_copy_failed.
> >> - Rename pte_ptl to ptl in __collapse_huge_page_copy_succeeded.
> >> - Fix a bug in __collapse_huge_page_copy_succeeded: ptep_clear should be
> >>   used instead of pte_clear.
> >> - Drop _address in __collapse_huge_page_copy_succeeded.
> >> - Add smp_mb() before updating filemap_nr_thps_dec.
> >> - Move `nr = thp_nr_pages()` closer to its references.
> >> - Remove an unnecessary goto statement.
> >> - This revision is rebased to mm-unstable at commit b4e1277ee31db
> >>   ("xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text")
> >>
> >> v10 changes
> >> - Incorporate feedback from Kirill A. Shutemov
> >> - Refactor the 2nd loop (after the loop for copying memory) into 2 helper
> >>   functions, one for actions to take when copying succeeded, one for when
> >>   copying failed due to #MC.
> >> - Use copy_mc_user_highpage for anonymous memory.
> >> - Introduce copy_mc_highpage and use it for file-backed memory.
> >> - Rename the original PMD from `rollback` to `orig_pmd`.
> >> - Some minor changes in comments, e.g. `normal page` to `raw page`.
> >> - This revision is rebased to mm-unstable at commit df3ae4347aff9
> >>   ("dma-buf: system_heap: avoid reclaim for order 4")
> >>
> >> v9 changes
> >> - Incorporate feedback from Andrew Morton
> >> - Move copy_mc_highpage into khugepaged.c as a static out-of-line
> >>   function copy_mc_page.
> >>
> >> v8 changes
> >> - Incorporate feedback from Tony Luck
> >> - Rename copy_highpage_mc to copy_mc_highpage.
> >> - Update copy_mc_highpage with kmsan changes.
> >> - Code style changes:
> >>   1) copy_mc_highpage returns int as "copy" is an action and is consistent
> >>      with copy_mc_user_highpage.
> >>   2) __collapse_huge_page_copy returns scan_result (int) and is consistent
> >>      with __collapse_huge_page_isolate/swapin.
> >>   3) variables are declared on separate lines in collapse_file.
> >>
> >> v7 changes
> >> - Fix a bug "KASAN: stack-out-of-bounds Read in collapse_file". After
> >>   copying all pages into the huge page, clear_highpage should use index
> >>   instead of page->index.
> >>
> >> v6 changes
> >> - Address comments from Kirill Shutemov
> >> - Rewrite __collapse_huge_page_copy to make rollback operations
> >>   clearer to its reader.
> >> - Add detailed test steps in each commit message.
> >>
> >> v5 changes
> >> - Rebase patches to mm-unstable at
> >>   commit ffb39098bf87 ("Merge tag 'linux-kselftest-kunit-6.1-rc1' of
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest").
> >> - Resolve conflicts with:
> >>   commit 2f55f070e5b8 ("mm/khugepaged: minor cleanup for collapse_file")
> >>   commit 1baec203b77c ("mm/khugepaged: try to free transhuge swapcache
> >>   when possible")
> >>
> >> v4 changes
> >> - Incorporate feedback from Yang Shi
> >> - Remove tracepoint for __collapse_huge_page_copy, just keep SCAN_COPY_MC
> >>   and let trace_mm_collapse_huge_page report it
> >> - Remove unnecessary comments
> >>
> >> v3 changes
> >> - Incorporate feedback from Yang Shi
> >> - Add tracepoint for __collapse_huge_page_copy
> >> - Restore PMD in collapse_huge_page
> >> - Correct comment about mmap_read_lock
> >>
> >> v2 changes
> >> - Incorporate feedback from Yang Shi
> >> - Only keep copy_highpage_mc
> >> - Add new scan_result SCAN_COPY_MC
> >> - Defer NR_FILE_THPS update until copying succeeded
> >>
> >> Jiaqi Yan (3):
> >>   mm/khugepaged: recover from poisoned anonymous memory
> >>   mm/hwpoison: introduce copy_mc_highpage
> >>   mm/khugepaged: recover from poisoned file-backed memory
> >>
> >>  include/linux/highmem.h            |  54 ++++++--
> >>  include/trace/events/huge_memory.h |   3 +-
> >>  mm/khugepaged.c                    | 200 ++++++++++++++++++++++-------
> >>  3 files changed, 198 insertions(+), 59 deletions(-)
> >>
> >> --
> >> 2.40.0.348.gf938b09366-goog
> >>
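
For readers skimming the thread, the "copy, and roll back if a source
page turns out to be poisoned" idea from the Solution section of the
quoted cover letter can be sketched in userspace C. This is only an
illustration under stated assumptions: every name below (small_page,
copy_page_checked, collapse, the enums) is hypothetical, a boolean flag
stands in for a real uncorrectable memory error, and none of it is the
series' code or a kernel API.

```c
/*
 * Userspace illustration only. Every name here is hypothetical; none of
 * this is khugepaged code or a kernel API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096u
#define PAGES_PER_HUGE  512u	/* 512 * 4 KiB = 2 MiB */

enum page_state { PAGE_MAPPED, PAGE_ISOLATED, PAGE_RELEASED };
enum collapse_result { COLLAPSE_OK, COLLAPSE_COPY_FAILED };

struct small_page {
	unsigned char data[SMALL_PAGE_SIZE];
	bool poisoned;		/* stands in for an uncorrectable memory error */
	enum page_state state;
};

/*
 * Stand-in for a machine-check-safe copy: it reports the error to the
 * caller instead of taking the whole system down.
 */
static int copy_page_checked(unsigned char *dst, const struct small_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return 0;
}

static enum collapse_result collapse(struct small_page *pages,
				     unsigned char *huge)
{
	size_t i;
	bool failed = false;

	assert(pages != NULL && huge != NULL);

	/* Step 1: take the source pages out of normal use. */
	for (i = 0; i < PAGES_PER_HUGE; i++)
		pages[i].state = PAGE_ISOLATED;

	/* Step 2: copy, stopping at the first poisoned page. */
	for (i = 0; i < PAGES_PER_HUGE && !failed; i++)
		failed = copy_page_checked(huge + i * SMALL_PAGE_SIZE,
					   &pages[i]) != 0;

	/*
	 * Step 3: on failure, undo step 1 so every source page is visible
	 * again with its contents unchanged; on success, release them.
	 */
	for (i = 0; i < PAGES_PER_HUGE; i++)
		pages[i].state = failed ? PAGE_MAPPED : PAGE_RELEASED;

	return failed ? COLLAPSE_COPY_FAILED : COLLAPSE_OK;
}
```

A caller would pass an array of PAGES_PER_HUGE source pages and a 2 MiB
destination buffer. If any source page is flagged as poisoned, collapse()
returns COLLAPSE_COPY_FAILED and all source pages are back in the
PAGE_MAPPED state, which mirrors the "skip the whole group and undo what
was done" behaviour the cover letter describes.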