From: Yang Shi
To: naoya.horiguchi@nec.com, hughd@google.com,
    kirill.shutemov@linux.intel.com, willy@infradead.org,
    peterx@redhat.com, osalvador@suse.de, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC v4 PATCH 0/6] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
Date: Thu, 14 Oct 2021 12:16:09 -0700
Message-Id: <20211014191615.6674-1-shy828301@gmail.com>

When discussing the patch that splits page cache THP in order to offline
the poisoned page, Naoya mentioned there is a bigger problem [1] that
prevents this from working, since the page cache page will be truncated
if uncorrectable errors happen. Looking into this more deeply, it turns
out this approach (truncating the poisoned page) may incur silent data
loss for all non-readonly filesystems if the page is dirty. It may be
worse for in-memory filesystems, e.g. shmem/tmpfs, since the data blocks
are actually gone.

To solve this problem we could keep the poisoned dirty page in the page
cache and then notify the users on any later access, e.g. page fault,
read/write, etc. Clean pages could be truncated as is, since they can be
reread from disk later on.

The consequence is that filesystems may find a poisoned page and
manipulate it as a healthy page, since filesystems actually don't check
whether the page is poisoned in any of the relevant paths except page
fault. In general, we need to make the filesystems aware of poisoned
pages before we can keep the poisoned page in the page cache in order to
solve the data loss problem.

To make filesystems aware of poisoned pages we should consider:
- The page should not be written back: clearing the dirty flag could
  prevent writeback.
- The page should not be dropped (it shows as a clean page) by drop
  caches or other callers: the refcount pin from hwpoison could prevent
  invalidation (called by cache drop, inode cache shrinking, etc), but
  it doesn't avoid invalidation in the DIO path.
- The page should be able to get truncated/hole punched/unlinked: it
  works as it is.
- Notify users when the page is accessed, e.g. read/write, page fault
  and other paths (compression, encryption, etc).

The scope of the last one is huge since almost all filesystems need to
do it once a page is returned from page cache lookup. There are a couple
of options to do it:

1. Check the hwpoison flag in every path, the most straightforward way.
2. Return NULL for a poisoned page from page cache lookup. Most
   callsites already check whether NULL is returned, so this should be
   the least work I think. But the error handling in filesystems just
   returns -ENOMEM, and that error code will obviously confuse users.
3. To improve #2, we could return an error pointer, e.g. ERR_PTR(-EIO),
   but this will involve a significant amount of code change as well,
   since all the paths need to check whether the pointer is an ERR or
   not, just like option #1.

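To make the difference between the three options concrete, here is a
small userspace toy model. It is only a sketch and not kernel code:
struct toy_page, the toy_lookup_*() and toy_read_*() helpers and the
toy_err_ptr()/toy_is_err()/toy_ptr_err() functions are all hypothetical
names made up for illustration. The last three only imitate the idea
behind the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and the "hwpoison"
field stands in for the real page flag test.

```c
/* Userspace toy model, not kernel code: every name here is hypothetical. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

struct toy_page {
	bool hwpoison;		/* stands in for the hwpoison page flag */
	char data[16];
};

/* Minimal imitations of the idea behind ERR_PTR()/IS_ERR()/PTR_ERR(). */
static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Option #1: lookup is unchanged, every caller checks the flag itself. */
static int toy_read_opt1(struct toy_page *page, char *out)
{
	if (!page)
		return -ENOMEM;
	if (page->hwpoison)
		return -EIO;
	*out = page->data[0];
	return 0;
}

/* Option #2: lookup hides a poisoned page behind NULL. */
static struct toy_page *toy_lookup_opt2(struct toy_page *page)
{
	return (page && page->hwpoison) ? NULL : page;
}

static int toy_read_opt2(struct toy_page *page, char *out)
{
	struct toy_page *p = toy_lookup_opt2(page);

	if (!p)
		return -ENOMEM;	/* caller cannot tell poison from OOM */
	*out = p->data[0];
	return 0;
}

/* Option #3: lookup returns an error pointer for a poisoned page. */
static struct toy_page *toy_lookup_opt3(struct toy_page *page)
{
	if (page && page->hwpoison)
		return toy_err_ptr(-EIO);
	return page;
}

static int toy_read_opt3(struct toy_page *page, char *out)
{
	struct toy_page *p = toy_lookup_opt3(page);

	if (!p)
		return -ENOMEM;
	if (toy_is_err(p))	/* every caller needs this extra check */
		return (int)toy_ptr_err(p);
	*out = p->data[0];
	return 0;
}
```

In this model a poisoned page yields -EIO with options #1 and #3 but
-ENOMEM with option #2, which is the confusing error code mentioned
above. Options #1 and #3 both need one extra check per caller; the
difference is only whether that check is on the flag or on the pointer.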
I prototyped both #1 and #3, but it seems #3 may require more changes
than #1. For #3, ERR_PTR will be returned, so all the callers need to
check the return value, otherwise an invalid pointer may be
dereferenced. But not all callers really care about the content of the
page, for example partial truncate, which just sets the truncated range
in one page to 0. So such paths need additional modification if ERR_PTR
is returned. And if the callers have their own way to handle the
problematic pages, we need to add a new FGP flag to tell the FGP
functions to return the pointer to the page.

Such poisoning may happen very rarely, but once it happens the
consequence (data corruption) could be very bad and it is very hard to
debug. It seems this problem had been briefly discussed before, but no
action was taken at that time. [2]

As the aforementioned investigation shows, it needs a huge amount of
work to solve the potential data loss for all filesystems. But it is
much easier for in-memory filesystems, and such filesystems actually
suffer more than others since even the data blocks are gone due to
truncating. So this patchset starts from shmem/tmpfs by taking option
#1.

TODO:
* Unpoison has been broken since commit 0ed950d1f281 ("mm,hwpoison: make
  get_hwpoison_page() call get_any_page()"), and this patch series makes
  the refcount check for unpoisoning a shmem page fail.
* Expand to other filesystems. But I haven't heard feedback from
  filesystem developers yet.

Patch breakdown:
Patch #1: cleanup, depended on by patch #2
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug
Patch #3: coding style cleanup
Patch #4: refactor and preparation.
Patch #5: keep the poisoned page in page cache and handle such case for
          all the paths.
Patch #6: the previous patches unblock page cache THP split, so this
          patch adds page cache THP split support.

Changelog
v3 --> v4:
  * Separated coding style cleanup from patch 2/5 by adding a new patch
    (patch 3/6) per Kirill.
  * Moved setting PageHasHWPoisoned flag to proper place (patch 2/6) per
    Peter Xu.
  * Elaborated why soft offline doesn't need to set this flag in the
    commit message (patch 2/6) per Peter Xu.
  * Renamed "dec" parameter to "extra_pins" for has_extra_refcount()
    (patch 4/6) per Peter Xu.
  * Adopted the suggestions for comment and coding style (patch 5/6) per
    Naoya.
  * Checked if page is hwpoison or not for shmem_get_link() (patch 5/6)
    per Peter Xu.
  * Collected acks.
v2 --> v3:
  * Incorporated the comments from Kirill.
  * Reordered the series to reflect the right dependency (patch #3 from
    v2 is patch #1 in this revision, patch #1 from v2 is patch #2 in
    this revision).
  * After the reorder, patch #2 depends on patch #1 and both need to be
    backported to -stable.
v1 --> v2:
  * Incorporated the suggestion from Kirill to use a new page flag to
    indicate there is hwpoisoned subpage(s) in a THP. (patch #1)
  * Dropped patch #2 of v1.
  * Refactored the page refcount check logic of hwpoison per Naoya.
    (patch #2)
  * Removed unnecessary THP check per Naoya. (patch #3)
  * Incorporated the other comments for shmem from Naoya.
    (patch #4)

Yang Shi (6):
      mm: hwpoison: remove the unnecessary THP check
      mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
      mm: filemap: coding style cleanup for filemap_map_pmd()
      mm: hwpoison: refactor refcount check handling
      mm: shmem: don't truncate page if memory failure happens
      mm: hwpoison: handle non-anonymous THP correctly

 include/linux/page-flags.h |  23 ++++++++++++
 mm/filemap.c               |  12 +++---
 mm/huge_memory.c           |   2 ++
 mm/memory-failure.c        | 136 ++++++++++++++++++++++++++++++++++++++++++++------------------------
 mm/memory.c                |   9 +++++
 mm/page_alloc.c            |   4 ++-
 mm/shmem.c                 |  37 +++++++++++++++++--
 mm/userfaultfd.c           |   5 +++
 8 files changed, 170 insertions(+), 58 deletions(-)

[1] https://lore.kernel.org/linux-mm/CAHbLzkqNPBh_sK09qfr4yu4WTFOzRy+MKj+PA7iG-adzi9zGsg@mail.gmail.com/T/#m0e959283380156f1d064456af01ae51fdff91265
[2] https://lore.kernel.org/lkml/20210318183350.GT3420@casper.infradead.org/