From: Yang Shi
Date: Fri, 25 Mar 2022 14:42:19 -0700
Subject: Re: [RFC v1 0/2] Memory poison recovery in khugepaged
To: Jiaqi Yan
Cc: Tony Luck, HORIGUCHI NAOYA(堀口 直也), "Kirill A. Shutemov", Miaohe Lin, Jue Wang, Linux MM
References: <20220323232929.3035443-1-jiaqiyan@google.com>

On Fri, Mar 25, 2022 at 2:11 PM Jiaqi Yan wrote:
>
> On Thu, Mar 24, 2022 at 7:51 PM Yang Shi wrote:
> >
> > On Wed, Mar 23, 2022 at 4:29 PM Jiaqi Yan wrote:
> > >
> > > Problem
> > > =======
> > > Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> > > As memory size and density increase, the chances of and number of
> > > memory errors increase. The increasing size and density of server
> > > RAM in the data center and cloud have shown increased uncorrectable
> > > memory errors. There are already mechanisms in the kernel to recover
> > > from uncorrectable memory errors. This series of patches provides
> > > the recovery mechanism for the particular kernel agent khugepaged.
> > >
> > > Impact
> > > ======
> > > The main reason we chose to make khugepaged tolerant of memory failures
> > > was its high possibility of accessing poisoned memory while performing
> > > functionally optional compaction actions. Standard applications
> > > typically don't have strict requirements on the size of their pages.
> > > So they are given 4K pages by the kernel.
> > > The kernel is able to improve
> > > application performance by either 1) giving applications 2M pages
> > > to begin with, or 2) collapsing 4K pages into 2M pages when possible.
> > > This collapsing operation is done by khugepaged, a kernel agent that
> > > is constantly scanning memory. When collapsing 4K pages into a 2M page,
> > > it must copy the data from the 4K pages into a physically contiguous
> > > 2M page. Therefore, as long as there exists one poisoned cache line in
> > > collapsible 4K pages, khugepaged will eventually access it. The current
> > > impact to users is a machine check exception triggered kernel panic.
> > > However, khugepaged's compaction operations are not functionally required
> > > kernel actions. Therefore making khugepaged tolerant of poisoned memory
> > > will greatly improve user experience.
> > >
> > > Solution
> > > ========
> > > As stated before, it is less desirable to crash the system only because
> > > khugepaged accesses poisoned pages while it is collapsing 4K pages.
> > > The high level idea of this patch series is to skip the group of pages
> > > (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> > > as these pages have become ineligible to be collapsed.
> > >
> > > We are also careful to unwind operations khugepaged has performed before
> > > it detects memory failures. For example, before copying and collapsing
> > > a group of anonymous pages into a huge page, the source pages will be
> > > isolated and their page table is unlinked from their PMD. These operations
> > > need to be undone in order to ensure these pages are not changed/lost from
> > > the perspective of other threads (both user and kernel space). As for
> > > file backed memory pages, there already exists a rollback case. This
> > > patch just extends it so that khugepaged also correctly rolls back when
> > > it fails to copy poisoned 4K pages.
> >
> > Actually I should have asked this question in the first place, before
> > diving into the implementation details: if an uncorrectable memory
> > error happens, the kernel will pin the poisoned page and set the
> > hwpoison flag, and the bumped page refcount would prevent the page
> > from being collapsed, IIUC.
>
> This patch series is for cases where khugepaged is the first guy that detects
> the memory errors on these poisoned pages. IOW, the pages are not known to
> have memory errors when khugepaged collapsing gets to them.
> In our observation, this happens frequently when the huge page ratio of
> the system is relatively low, which is fairly common in cloud VMs.

Thanks, this is very important information that should be captured in
the 1st patch's commit log.

>
> >
> > So I'm wondering why we need this?
> >
> > >
> > > Jiaqi Yan (2):
> > >   mm: khugepaged: recover from poisoned anonymous memory
> > >   mm: khugepaged: recover from poisoned file-backed memory
> > >
> > >  include/linux/highmem.h |  37 +++++++
> > >  mm/khugepaged.c         | 211 +++++++++++++++++++++++++++++-----------
> > >  2 files changed, 189 insertions(+), 59 deletions(-)
> > >
> > > --
> > > 2.35.1.894.gb6a874cedc-goog
> > >
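
For readers following the thread, the skip-and-unwind idea from the quoted "Solution" section can be illustrated with a small user-space sketch. This is not the code from the patches: `struct small_page`, `copy_page_checked()` and `try_collapse()` are hypothetical names made up for illustration, and the `poisoned` flag merely simulates an uncorrectable error that is only discovered when the page is read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096
#define PAGES_PER_HUGE  512	/* 2M / 4K */

/* Hypothetical stand-in for one 4K source page. */
struct small_page {
	unsigned char data[SMALL_PAGE_SIZE];
	bool poisoned;	/* simulated error, found only on read */
	bool isolated;	/* simulated "taken away from its owner" state */
};

/*
 * Hypothetical stand-in for a machine-check-safe copy: it reports
 * failure to the caller instead of bringing the system down.
 */
static bool copy_page_checked(unsigned char *dst, const struct small_page *src)
{
	if (src->poisoned)
		return false;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return true;
}

/*
 * Try to collapse PAGES_PER_HUGE small pages into one huge buffer.
 * Returns true on success. On failure, every source page is put back
 * (un-isolated) so nothing looks changed or lost to other users.
 */
static bool try_collapse(struct small_page *src, unsigned char *huge)
{
	size_t i;
	bool ok = true;

	/* Step 1: isolate all source pages before copying. */
	for (i = 0; i < PAGES_PER_HUGE; i++)
		src[i].isolated = true;

	/* Step 2: copy page by page, stopping at the first poisoned one. */
	for (i = 0; i < PAGES_PER_HUGE && ok; i++)
		ok = copy_page_checked(huge + i * SMALL_PAGE_SIZE, &src[i]);

	/* Step 3: on failure, unwind step 1 and skip this group. */
	if (!ok) {
		for (i = 0; i < PAGES_PER_HUGE; i++)
			src[i].isolated = false;
	}
	return ok;
}
```

A caller would treat a `false` return as "skip this group of pages and move on", which mirrors the intent described in the cover letter. The real kernel paths also have to restore page table state, which this sketch does not attempt to model.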