From: Yang Shi
Date: Thu, 24 Mar 2022 19:50:59 -0700
Subject: Re: [RFC v1 0/2] Memory poison recovery in khugepaged
To: Jiaqi Yan
Cc: Tony Luck, HORIGUCHI NAOYA(堀口 直也), "Kirill A. Shutemov", Miaohe Lin, Jue Wang, Linux MM
In-Reply-To: <20220323232929.3035443-1-jiaqiyan@google.com>

On Wed, Mar 23, 2022 at 4:29 PM Jiaqi Yan wrote:
>
> Problem
> =======
> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> As memory size and density increase, the chances of and number of
> memory errors increase. The increasing size and density of server
> RAM in the data center and cloud have shown increased uncorrectable
> memory errors. There are already mechanisms in the kernel to recover
> from uncorrectable memory errors. This series of patches provides
> the recovery mechanism for the particular kernel agent khugepaged.
>
> Impact
> ======
> The main reason we chose to make khugepaged tolerant of memory failures
> was its high possibility of accessing poisoned memory while performing
> functionally optional compaction actions. Standard applications
> typically don't have strict requirements on the size of its pages.
> So they are given 4K pages by the kernel. The kernel is able to improve
> application performance by either 1) giving application 2M pages
> to begin with, or 2) collapsing 4K pages into 2M pages when possible.
> This collapsing operation is done by khugepaged, a kernel agent that
> is constantly scanning memory.
> When collapsing 4K pages into a 2M page,
> it must copy the data from the 4K pages into a physically contiguous
> 2M page. Therefore, as long as there exists one poisoned cache line in
> collapsible 4K pages, khugepaged will eventually access it. The current
> impact to users is a machine check exception triggered kernel panic.
> However, khugepaged’s compaction operations are not functionally required
> kernel actions. Therefore making khugepaged tolerant to poisoned memory
> will greatly improve user experience.
>
> Solution
> ========
> As stated before, it is less desirable to crash the system only because
> khugepaged accesses poisoned pages while it is collapsing 4K pages.
> The high level idea of this patch series is to skip the group of pages
> (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> as these pages have become ineligible to be collapsed.
>
> We are also careful to unwind operations khugepaged has performed before
> it detects memory failures. For example, before copying and collapsing
> a group of anonymous pages into a huge page, the source pages will be
> isolated and their page table is unlinked from their PMD. These operations
> need to be undone in order to ensure these pages are not changed/lost from
> the perspective of other threads (both user and kernel space). As for
> file backed memory pages, there already exists a rollback case. This
> patch just extends it so that khugepaged also correctly rolls back when
> it fails to copy poisoned 4K pages.

Actually, I should have asked this question first, before diving into
the implementation details: if an uncorrectable memory error happens,
the kernel will pin the poisoned page and set the hwpoison flag. The
bumped page refcount would then prevent the page from being collapsed,
IIUC. So I'm wondering why we need this?
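To make the question above concrete, here is a toy userspace model in C.
It is only a sketch under stated assumptions: every name prefixed with
toy_ is hypothetical and is not kernel API, and the rule "a page is a
collapse candidate only when all of its references come from mappings"
is a simplification of the refcount argument made in the reply. The
model covers only a page that has already been flagged as poisoned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page descriptor; NOT the kernel's struct page. */
struct toy_page {
    int refcount;   /* total references held on the page */
    int mapcount;   /* references that come from page-table mappings */
    bool hwpoison;  /* set once a memory error has been reported */
};

/* Model of the handling described in the reply: set the flag, take a pin. */
static void toy_mark_poisoned(struct toy_page *p)
{
    assert(p != NULL);
    p->hwpoison = true;
    p->refcount++;  /* the extra pin held on the poisoned page */
}

/* Assumed rule: collapsible only if unflagged and every reference is a mapping. */
static bool toy_page_collapsible(const struct toy_page *p)
{
    return !p->hwpoison && p->refcount == p->mapcount;
}

/* Return the index of the first page that blocks collapse, or -1 if none does. */
static int toy_first_blocker(const struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_page_collapsible(&pages[i]))
            return (int)i;
    }
    return -1;
}
```

In this model, calling toy_mark_poisoned() on any page in a group makes
toy_first_blocker() return that page's index, so the group would be
rejected before any data is copied.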
>
> Jiaqi Yan (2):
>   mm: khugepaged: recover from poisoned anonymous memory
>   mm: khugepaged: recover from poisoned file-backed memory
>
>  include/linux/highmem.h |  37 +++++++
>  mm/khugepaged.c         | 211 +++++++++++++++++++++++++++++-----------
>  2 files changed, 189 insertions(+), 59 deletions(-)
>
> --
> 2.35.1.894.gb6a874cedc-goog
>
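For readers following the thread, the skip-and-unwind idea from the
quoted cover letter can be sketched in userspace C. This is an
illustration under assumptions, not the code from the patches: the
toy_ names, the tiny page size, and the group size of 8 (standing in
for 512) are all hypothetical, and a poisoned page is modeled as a
flag that makes the copy fail instead of raising a machine check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64  /* small stand-in for a 4K page */
#define TOY_GROUP     8   /* small stand-in for the usual 512 pages */

/* Toy source page; NOT a kernel structure. */
struct toy_small_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool poisoned;  /* modeled memory error in this page */
    bool isolated;  /* taken away from its normal owner for the collapse */
};

/* Copy one page; report failure instead of faulting on a poisoned source. */
static bool toy_copy_page(unsigned char *dst, const struct toy_small_page *src)
{
    if (src->poisoned)
        return false;
    memcpy(dst, src->data, TOY_PAGE_SIZE);
    return true;
}

/*
 * Try to collapse n small pages into one contiguous buffer.
 * On success the sources stay isolated (they now belong to the huge page).
 * On failure every isolation is undone, so callers see the sources unchanged.
 */
static bool toy_collapse(struct toy_small_page *src, size_t n, unsigned char *huge)
{
    assert(n <= TOY_GROUP);

    for (size_t i = 0; i < n; i++)
        src[i].isolated = true;

    for (size_t i = 0; i < n; i++) {
        if (!toy_copy_page(huge + i * TOY_PAGE_SIZE, &src[i])) {
            /* Unwind: give every source page back and skip this group. */
            for (size_t j = 0; j < n; j++)
                src[j].isolated = false;
            return false;
        }
    }
    return true;
}
```

The design point the sketch tries to capture is that a failed copy is
treated as an ordinary error path: the group is skipped and all earlier
steps are reversed, rather than the failure taking down the system.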