From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8AE83C433EF for ; Wed, 15 Sep 2021 23:10:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1E0E8610E8 for ; Wed, 15 Sep 2021 23:10:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1E0E8610E8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 6C6B96B0071; Wed, 15 Sep 2021 19:10:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6763E6B0072; Wed, 15 Sep 2021 19:10:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EF2E6B0073; Wed, 15 Sep 2021 19:10:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id 415AF6B0071 for ; Wed, 15 Sep 2021 19:10:32 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id ED2F4181E4378 for ; Wed, 15 Sep 2021 23:10:31 +0000 (UTC) X-FDA: 78591353862.20.D61DD42 Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) by imf23.hostedemail.com (Postfix) with ESMTP id B960590000BF for ; Wed, 15 Sep 2021 23:10:31 +0000 (UTC) Received: by mail-ed1-f44.google.com with SMTP id g21so9568433edw.4 for ; Wed, 15 Sep 2021 16:10:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=9BtEwTezY+qlQ0z6udcY1NI+sWRfBUg4i/PQVpsepp0=; b=mKDdUXh3ZD6Q0nk7BTUP3zZAAgNh3CAlmZL8bCeApJLsy2E+8udpR+XxK5SomOti9c wE0fKNqqEg3jngk8STojSXv7igrZNhj0Kx9W749QBu/7Id6QwxsqtSBglOSKKJOC5GDM t6Gj5chRhn5m+m//diJNa25h157qVqj5ZUyYAIwx6A5oBB2Mgl1h8D7WXReI+ojaoJAk QIL3zznbP8S53Iafn6ampKpLeDScxlB9nvKm4PJCkyktfNHU7TsbHoNCJdiD49EK5D17 eqKsot1RJMRXHlx8Z4CnhcIMMIg6mCokUdNnAsa8M2TVVSm/mSVDh2Joouf0QXlyh6j0 MveQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=9BtEwTezY+qlQ0z6udcY1NI+sWRfBUg4i/PQVpsepp0=; b=cCz9NVlFcCC3eheAS+fITyblVewFR/KQ5srMFVlyCl7QvwNwe2iMLsXZ/EEDbhCgoJ 9YVE/wG6RXNp9xOpVyGEy04Wd4mMwubC5KpY/VQYpYz5pTfFao+m38IMHgy59M4mGbyB llGuYVpQRBI6o3Y0dBI2aiC+1MGmZBzCc+Kfv8AfFXDQnDO19E4AkhddENCCiNgH2MTC 4bnar9lqBwlYJeZPbOXAop7m8iU4vejjuJpTfIMLmPjNNszDnbdY+VWo86dafz2ERU8t m7z9vHySx5hXpPcvtmjnB4iDvoN8IfOS2MtM4VMm+lhNWekG1Q/wCEiAh3+Fm/DFs0oA Fu0Q== X-Gm-Message-State: AOAM530Qzeo4bXJvkq9RTxz9MQlWBoKlE6m5hQTxdCQiPaRwFK828N2G s50af/SlSQsXc3X5yGWSG4MOq3yGfHbpMy/gLsk= X-Google-Smtp-Source: ABdhPJwUidQxI6KsoarEmVDuFeGRhn/6QSW1h4jhAdXCpwtLZDplDG7rOxUVfxwKzPWRszACj09qneVbrJeE8mHtByg= X-Received: by 2002:a05:6402:14c3:: with SMTP id f3mr2841214edx.312.1631747430578; Wed, 15 Sep 2021 16:10:30 -0700 (PDT) MIME-Version: 1.0 References: <20210914183718.4236-1-shy828301@gmail.com> <20210914183718.4236-3-shy828301@gmail.com> <20210915114947.2zh7inouztenth6o@box.shutemov.name> In-Reply-To: From: Yang Shi Date: Wed, 15 Sep 2021 16:10:18 -0700 Message-ID: Subject: Re: [PATCH 2/4] mm: khugepaged: check if file page is on LRU after locking page To: "Kirill A. Shutemov" Cc: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= , Hugh Dickins , "Kirill A. Shutemov" , Matthew Wilcox , Oscar Salvador , Andrew Morton , Linux MM , Linux FS-devel Mailing List , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=mKDdUXh3; spf=pass (imf23.hostedemail.com: domain of shy828301@gmail.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=shy828301@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Stat-Signature: pgorziwb3q3s9myyubi17s97tahctpkz X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B960590000BF X-HE-Tag: 1631747431-49704 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 15, 2021 at 4:00 PM Yang Shi wrote: > > On Wed, Sep 15, 2021 at 10:48 AM Yang Shi wrote: > > > > On Wed, Sep 15, 2021 at 4:49 AM Kirill A. Shutemov wrote: > > > > > > On Tue, Sep 14, 2021 at 11:37:16AM -0700, Yang Shi wrote: > > > > The khugepaged does check if the page is on LRU or not but it doesn't > > > > hold page lock. And it doesn't check this again after holding page > > > > lock. So it may race with some others, e.g. reclaimer, migration, etc. > > > > All of them isolates page from LRU then lock the page then do something. > > > > > > > > But it could pass the refcount check done by khugepaged to proceed > > > > collapse. Typically such race is not fatal. But if the page has been > > > > isolated from LRU before khugepaged it likely means the page may be not > > > > suitable for collapse for now. > > > > > > > > The other more fatal case is the following patch will keep the poisoned > > > > page in page cache for shmem, so khugepaged may collapse a poisoned page > > > > since the refcount check could pass. 3 refcounts come from: > > > > - hwpoison > > > > - page cache > > > > - khugepaged > > > > > > > > Since it is not on LRU so no refcount is incremented from LRU isolation. > > > > > > > > This is definitely not expected. Checking if it is on LRU or not after > > > > holding page lock could help serialize against hwpoison handler. > > > > > > > > But there is still a small race window between setting hwpoison flag and > > > > bump refcount in hwpoison handler. It could be closed by checking > > > > hwpoison flag in khugepaged, however this race seems unlikely to happen > > > > in real life workload. So just check LRU flag for now to avoid > > > > over-engineering. > > > > > > > > Signed-off-by: Yang Shi > > > > --- > > > > mm/khugepaged.c | 6 ++++++ > > > > 1 file changed, 6 insertions(+) > > > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > index 045cc579f724..bdc161dc27dc 100644 > > > > --- a/mm/khugepaged.c > > > > +++ b/mm/khugepaged.c > > > > @@ -1808,6 +1808,12 @@ static void collapse_file(struct mm_struct *mm, > > > > goto out_unlock; > > > > } > > > > > > > > + /* The hwpoisoned page is off LRU but in page cache */ > > > > + if (!PageLRU(page)) { > > > > + result = SCAN_PAGE_LRU; > > > > + goto out_unlock; > > > > + } > > > > + > > > > if (isolate_lru_page(page)) { > > > > > > isolate_lru_page() should catch the case, no? TestClearPageLRU would fail > > > and we get here. > > > > Hmm... you are definitely right. How could I miss this point. > > > > It might be because of I messed up the page state by some tests which > > may do hole punch then reread the same index. That could drop the > > poisoned page then collapse succeed. But I'm not sure. Anyway I didn't > > figure out how the poisoned page could be collapsed. It seems > > impossible. I will drop this patch. > > I think I figured out the problem. This problem happened after the > page cache split patch and if the hwpoisoned page is not head page. It > is because THP split will unfreeze the refcount of tail pages to 2 > (restore refcount from page cache) then dec refcount to 1. The > refcount pin from hwpoison is gone and it is still on LRU. Then > khugepged locked the page before hwpoison, the refcount is expected to > khugepaged. > > The worse thing is it seems this problem is applicable to anonymous > page too. Once the anonymous THP is split by hwpoison the pin from > hwpoison is gone too the refcount is 1 (comes from PTE map). Then > khugepaged could collapse it to huge page again. It may incur data > corruption. > > And the poisoned page may be freed back to buddy since the lost refcount pin. > > If the poisoned page is head page, the code is fine since hwpoison > doesn't put the refcount for head page after split. > > The fix is simple, just keep the refcount pin for hwpoisoned subpage. Err... wait... I just realized I missed the below code block: if (subpage == page) continue; It skips the subpage passed to split_huge_page() so the refcount pin from the caller for this subpage is kept. And hwpoison doesn't put it. So it seems fine. > > > > > > > > > > result = SCAN_DEL_PAGE_LRU; > > > > goto out_unlock; > > > > -- > > > > 2.26.2 > > > > > > > > > > > > > > -- > > > Kirill A. Shutemov