From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 502E7C433EF for ; Thu, 9 Sep 2021 23:07:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CCC12610FF for ; Thu, 9 Sep 2021 23:07:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org CCC12610FF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 11FEE6B0071; Thu, 9 Sep 2021 19:07:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0CF4F6B0072; Thu, 9 Sep 2021 19:07:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EB2F1900002; Thu, 9 Sep 2021 19:07:50 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id DAE676B0071 for ; Thu, 9 Sep 2021 19:07:50 -0400 (EDT) Received: from smtpin38.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 95D5430178 for ; Thu, 9 Sep 2021 23:07:50 +0000 (UTC) X-FDA: 78569574300.38.BAF7A00 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf16.hostedemail.com (Postfix) with ESMTP id 58B61F00008C for ; Thu, 9 Sep 2021 23:07:50 +0000 (UTC) Received: by mail-ej1-f42.google.com with SMTP id me10so260763ejb.11 for ; Thu, 09 Sep 2021 16:07:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=cfCV5eFbiqZRH3LUtu99b7bbD7QaaPcg2/PD3s5mWgk=; b=dHkJB+W08MV/ijfbJPo2wmU1tsachOY9Ua/COaltsFgNCiAIqil+TZgsDKoPmEk7Pj kkX5wb6zhlduoom/+erwZUOejqRs5NWJcW8/3Jtlcm5pQ15a+3FoFJWXk2jW31R3Wocn jIo1fh0JHBlrDx9iY6LT1JvH45UEp/Wp7Z75o8MuiOko8JOrV1W6wwoxcPLPzk9jkyHT fTuKXMHXDliTpZnuuOXxqPx0LgVSjDf2+X4sGOJT92xNW397slxZuTruznC9ifCDpTL4 8ZKuMPDvbxhFTaQ+p6jkIZAd57Jfk3hygcTTKbPc846TIsQ2XF1ks9cYpyR4GpuNhbNS UQLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=cfCV5eFbiqZRH3LUtu99b7bbD7QaaPcg2/PD3s5mWgk=; b=WevAMnZOym3acVcgyKBvQS5hT5b+3JusLkbUbWjDpr1BaUQGs7iDjHTVj+62BsPByb r4fLIJilCuzIqtnIdjb2xWMQYmXlG+HY+uyVnNWsK7kZo2YMYet7Y7lAfDH1yYEocV3T eodK+HZMS0AsE4PBNLHMNuOuCr8QEuZnEngm79aNXCOn2kJ0CYvYPRhfG+D9LWzrcdR8 6/vEvzcvUiM2isye/Zj8z3DY13jvH5VgecXUMcYJOT2iOr0aV909WR8rWSXmjBDtfQ8N Rho4s8KnP79OlvnP7mbYm6VwRWGc5nsUmoBxg9OjTSb3ftn5mmDuWsS8A8Xu5aHBbHpR cfKw== X-Gm-Message-State: AOAM532kFsZ4Ou85J3jaAaoCrpJZoH1NJlLZ7u26Xq7kTsolpWeFb5Vd A+yNW+c4pM4RKKLzPJa97FKjBZk234Fze2gAZkY= X-Google-Smtp-Source: ABdhPJxZ0vzxJVvPI+BCUBZlKwZw3IpYsDTcEYPRP9odJG7gteluzPJT0PP1htsGTQSsZpUH74iiqC7z6hYyIrtPP1M= X-Received: by 2002:a17:906:2556:: with SMTP id j22mr6289681ejb.233.1631228869186; Thu, 09 Sep 2021 16:07:49 -0700 (PDT) MIME-Version: 1.0 References: <20210902030728.GA1860112@hori.linux.bs1.fc.nec.co.jp> <20210903115311.GA2477773@hori.linux.bs1.fc.nec.co.jp> <20210908025024.GA4117799@hori.linux.bs1.fc.nec.co.jp> <20210908042547.GA11650@hori.linux.bs1.fc.nec.co.jp> In-Reply-To: <20210908042547.GA11650@hori.linux.bs1.fc.nec.co.jp> From: Yang Shi Date: Thu, 9 Sep 2021 16:07:37 -0700 Message-ID: Subject: Re: [PATCH] mm: hwpoison: deal with page cache THP To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= Cc: "osalvador@suse.de" , "hughd@google.com" , "kirill.shutemov@linux.intel.com" , "akpm@linux-foundation.org" , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 58B61F00008C Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=dHkJB+W0; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf16.hostedemail.com: domain of shy828301@gmail.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=shy828301@gmail.com X-Stat-Signature: h1dyff8gxtjtbnhtzuayspy77tdknqwg X-HE-Tag: 1631228870-252488 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Sep 7, 2021 at 9:25 PM HORIGUCHI NAOYA(=E5=A0=80=E5=8F=A3=E3=80=80= =E7=9B=B4=E4=B9=9F) wrote: > > On Tue, Sep 07, 2021 at 08:14:26PM -0700, Yang Shi wrote: > ... > > > > > > > > 5. We also could define a new FGP flag to return poisoned p= age, NULL > > > > > > > > or error pointer. This also should need significant code ch= ange since > > > > > > > > a lt callsites need to be contemplated. > > > > > > > > > > > > > > Could you explain a little more about which callers should us= e the flag? > > > > > > > > > > > > Just to solve the above invalidate/truncate problem and page fa= ult > > > > > > doesn't expect an error pointer. But it seems the above > > > > > > invalidate/truncate paths don't matter. Page fault should be th= e only > > > > > > user since page fault may need unlock the page if poisoned page= is > > > > > > returned. > > > > > > Orignally PG_hwpoison is detected in page fault handler for file-back= ed pages > > > like below, > > > > > > static vm_fault_t __do_fault(struct vm_fault *vmf) > > > ... > > > ret =3D vma->vm_ops->fault(vmf); > > > if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_F= AULT_RETRY | > > > VM_FAULT_DONE_COW))) > > > return ret; > > > > > > if (unlikely(PageHWPoison(vmf->page))) { > > > if (ret & VM_FAULT_LOCKED) > > > unlock_page(vmf->page); > > > put_page(vmf->page); > > > vmf->page =3D NULL; > > > return VM_FAULT_HWPOISON; > > > } > > > > > > so it seems to me that if a filesystem switches to the new scheme of = keeping > > > error pages in page cache, ->fault() callback for the filesystem need= s to be > > > aware of hwpoisoned pages inside it and returns VM_FAULT_HWPOISON whe= n it > > > detects an error page (maybe by detecting pagecache_get_page() to ret= urn > > > PTR_ERR(-EIO or -EHWPOISON) or some other ways). In the other filesy= stems > > > with the old scheme, fault() callback does not return VM_FAULT_HWPOIS= ON so > > > that page fault handler falls back to the generic PageHWPoison check = above. > > > > Yes, exactly. If we return ERR_PTR we need modify the above code > > accordingly. But returning the page, no change is required. > > > > > > > > > > > > > > > It seems page fault check IS_ERR(page) then just return > > > > > VM_FAULT_HWPOISON. But I found a couple of places in shmem which = want > > > > > to return head page then handle subpage or just return the page b= ut > > > > > don't care the content of the page. They should ignore hwpoison. = So I > > > > > guess we'd better to have a FGP flag for such cases. > > > > > > At least currently thp are supposed to be split before error handling= . > > > > Not for file THP. > > > > > We could loosen this assumption to support hwpoison on a subpage of t= hp, > > > but I'm not sure whether that has some benefit. > > > > I don't quite get your point. Currently the hwpoison flag is set on > > specific subpage instead of head page. > > Sorry, I miss the case when thp split fails, then the thp has hwpoison > flag set on any subpage. I only thought of the successful case, where th= e > resulting error page is no longer a subpage. > > > > > > And also this discussion makes me aware that we need to have a way to > > > prevent hwpoisoned pages from being used to thp collapse operation. > > > > Actually not. The refcount from hwpoison could prevent it from > > collapsing. But if we return ERR_PTR (#3) we need extra code in > > khugepaged to handle it. > > OK, so we already prevent it. I just realized I'm wrong for the refcount. The poisoned page has been isolated from LRU before khugepaged finds it. So the refcount from isolation is not incremented at all. The refcount seen by khugepaged is 3: hwpoison, page cache and khugepaged. It actually meets the expectation. The khugepaged does check if the page is a LRU page or not but without holding page lock, so it may race with hwpoison. After holding page lock, it doesn't check LRU flag anymore. So I think we need add the check. > > Thanks, > Naoya Horiguchi > > > > > So I realized #1 would require fewer changes. And leaving the page > > state check to callers seem reasonable since the callers usually check > > other states, e.g. uptodate.