From: Yang Shi
To: naoya.horiguchi@nec.com, hughd@google.com, kirill.shutemov@linux.intel.com, willy@infradead.org, peterx@redhat.com, osalvador@suse.de, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC v3 PATCH 0/5] Solve silent data loss caused by poisoned page cache (shmem/tmpfs)
Date: Thu, 30 Sep 2021 14:53:06 -0700
Message-Id: <20210930215311.240774-1-shy828301@gmail.com>

When discussing the patch that splits page cache THP in order to offline
the poisoned page, Naoya mentioned that there is a bigger problem [1] that
prevents this from working: the page cache page is truncated when an
uncorrectable error happens. Looking deeper, it turns out that this
approach (truncating the poisoned page) may incur silent data loss for all
non-readonly filesystems if the page is dirty. It may be worse for
in-memory filesystems, e.g. shmem/tmpfs, since the data blocks are
actually gone.

To solve this problem we could keep the poisoned dirty page in the page
cache and then notify the users on any later access, e.g.
page fault, read/write, etc. Clean pages could be truncated as is, since
they can be reread from disk later on.

The consequence is that filesystems may find a poisoned page and
manipulate it as a healthy page, since no filesystem actually checks
whether the page is poisoned in any of the relevant paths except page
fault. In general, we need to make the filesystems aware of poisoned pages
before we can keep the poisoned page in the page cache to solve the data
loss problem.

To make filesystems aware of poisoned pages we should consider:
- The page should not be written back: clearing the dirty flag prevents
  writeback.
- The page should not be dropped (it shows as a clean page) by drop caches
  or other callers: the refcount pin from hwpoison prevents invalidation
  (called by cache drop, inode cache shrinking, etc.), but it doesn't
  avoid invalidation in the DIO path.
- The page should still be able to get truncated/hole punched/unlinked:
  this works as it is.
- Users should be notified when the page is accessed, e.g. read/write,
  page fault and other paths (compression, encryption, etc.).

The scope of the last one is huge since almost all filesystems need to do
it once a page is returned from page cache lookup. There are a couple of
options to do it:

1. Check the hwpoison flag in every path, the most straightforward way.
2. Return NULL for a poisoned page from page cache lookup. Most callsites
   already check whether NULL is returned, so this should be the least
   work, I think. But the error handling in filesystems just returns
   -ENOMEM, and that error code will obviously confuse users.
3. To improve #2, we could return an error pointer, e.g. ERR_PTR(-EIO),
   but this will involve a significant amount of code change as well,
   since all the paths need to check whether the pointer is an error,
   just like option #1.

I did prototypes for both #1 and #3, but it seems #3 may require more
changes than #1.
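
The difference between the three lookup options is easiest to see from the caller's side. Below is a minimal userspace sketch, not kernel code: `struct mock_page`, the `lookup_opt*` functions and the `read_opt*` callers are hypothetical names invented for illustration, and the `ERR_PTR`/`PTR_ERR`/`IS_ERR` helpers are simplified stand-ins modeled on the kernel's include/linux/err.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified page: only the fields this sketch needs. */
struct mock_page {
    bool hwpoison;
    char data[16];
};

/* Option #1: lookup returns the page as is; every caller checks the flag. */
static struct mock_page *lookup_opt1(struct mock_page *slot)
{
    return slot;
}

/* Option #2: lookup hides a poisoned page behind NULL. */
static struct mock_page *lookup_opt2(struct mock_page *slot)
{
    return (slot && slot->hwpoison) ? NULL : slot;
}

/* Option #3: lookup reports a poisoned page as ERR_PTR(-EIO). */
static struct mock_page *lookup_opt3(struct mock_page *slot)
{
    return (slot && slot->hwpoison) ? ERR_PTR(-EIO) : slot;
}

/* Caller-side read paths; each returns 0 or a negative errno. */
static int read_opt1(struct mock_page *slot, char *out)
{
    struct mock_page *page = lookup_opt1(slot);

    if (!page)
        return -ENOMEM;
    if (page->hwpoison)         /* explicit check needed in every path */
        return -EIO;
    *out = page->data[0];
    return 0;
}

static int read_opt2(struct mock_page *slot, char *out)
{
    struct mock_page *page = lookup_opt2(slot);

    if (!page)                  /* existing NULL handling: poison looks like -ENOMEM */
        return -ENOMEM;
    *out = page->data[0];
    return 0;
}

static int read_opt3(struct mock_page *slot, char *out)
{
    struct mock_page *page = lookup_opt3(slot);

    if (IS_ERR(page))           /* must come before any dereference */
        return (int)PTR_ERR(page);
    if (!page)
        return -ENOMEM;
    *out = page->data[0];
    return 0;
}
```

With a poisoned page, `read_opt1()` and `read_opt3()` report -EIO while `read_opt2()` reports -ENOMEM, which is the confusing error code mentioned for option #2. A caller written for option #3 that forgets the `IS_ERR()` check would dereference an invalid pointer.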
For #3, ERR_PTR will be returned, so all the callers need to check the
return value, otherwise an invalid pointer may be dereferenced. But not
all callers really care about the content of the page, for example
partial truncate, which just sets the truncated range in one page to 0.
Such paths need additional modification if ERR_PTR is returned. And if
the callers have their own way to handle the problematic pages, we need
to add a new FGP flag to tell the FGP functions to return the pointer to
the page.

It may happen very rarely, but once it happens the consequence (data
corruption) could be very bad and it is very hard to debug. It seems this
problem has been briefly discussed before, but no action was taken at
that time. [2]

As the investigation above shows, it needs a huge amount of work to solve
the potential data loss for all filesystems. But it is much easier for
in-memory filesystems, and such filesystems actually suffer more than
others since even the data blocks are gone due to truncating. So this
patchset starts from shmem/tmpfs by taking option #1.

Patch #1: cleanup, depended on by patch #2.
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug.
Patch #3: refactor and preparation.
Patch #4: keep the poisoned page in page cache and handle such case for
          all the paths.
Patch #5: the previous patches unblock page cache THP split, so this
          patch adds page cache THP split support.

I didn't receive many comments for patches #3 ~ #5, so I may consider
separating the bug fixes (patches #1 and #2) from the others to get them
merged sooner. This version still includes all 5 patches.

Changelog
v2 --> v3:
  * Incorporated the comments from Kirill.
  * Reordered the series to reflect the right dependency (patch #3 from
    v2 is patch #1 in this revision, patch #1 from v2 is patch #2 in this
    revision).
  * After the reorder, patch #2 depends on patch #1 and both need to be
    backported to -stable.
v1 --> v2:
  * Incorporated the suggestion from Kirill to use a new page flag to
    indicate that a THP has hwpoisoned subpage(s). (patch #1)
  * Dropped patch #2 of v1.
  * Refactored the page refcount check logic of hwpoison per Naoya.
    (patch #2)
  * Removed unnecessary THP check per Naoya. (patch #3)
  * Incorporated the other comments for shmem from Naoya. (patch #4)

Yang Shi (5):
      mm: hwpoison: remove the unnecessary THP check
      mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
      mm: hwpoison: refactor refcount check handling
      mm: shmem: don't truncate page if memory failure happens
      mm: hwpoison: handle non-anonymous THP correctly

 include/linux/page-flags.h |  19 +++++++++++
 mm/filemap.c               |  12 +++----
 mm/huge_memory.c           |   2 ++
 mm/memory-failure.c        | 129 +++++++++++++++++++++++++++++++++++++++++++--------------------------
 mm/memory.c                |   9 +++++
 mm/page_alloc.c            |   4 ++-
 mm/shmem.c                 |  31 +++++++++++++++--
 mm/userfaultfd.c           |   5 +++
 8 files changed, 153 insertions(+), 58 deletions(-)

[1] https://lore.kernel.org/linux-mm/CAHbLzkqNPBh_sK09qfr4yu4WTFOzRy+MKj+PA7iG-adzi9zGsg@mail.gmail.com/T/#m0e959283380156f1d064456af01ae51fdff91265
[2] https://lore.kernel.org/lkml/20210318183350.GT3420@casper.infradead.org/
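
For reference, the four requirements listed in the cover letter for keeping a poisoned page in the page cache can be modeled with a toy state machine. This is only a userspace sketch under stated assumptions: `struct mock_page` and all `mock_*` functions are hypothetical, the page cache is assumed to hold one baseline reference, and the DIO invalidation path (which the refcount pin does not cover) is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical in-memory file page; field names are illustrative only. */
struct mock_page {
    bool present;    /* still in the mock page cache */
    bool dirty;
    bool hwpoison;
    int refcount;    /* assumption: the cache itself holds one reference */
};

/* Memory failure: keep the page, clear dirty, pin it with an extra reference. */
static void mock_memory_failure(struct mock_page *p)
{
    p->hwpoison = true;
    p->dirty = false;
    p->refcount++;
}

/* Writeback only touches dirty pages, so the poisoned page is skipped. */
static bool mock_writeback(struct mock_page *p)
{
    if (!p->present || !p->dirty)
        return false;
    p->dirty = false;
    return true;
}

/* Invalidation (drop caches, inode shrinking) only drops clean, unpinned pages. */
static bool mock_invalidate(struct mock_page *p)
{
    if (!p->present || p->dirty || p->refcount > 1)
        return false;
    p->present = false;
    return true;
}

/* Truncate, hole punch and unlink remove the page regardless of poison. */
static void mock_truncate(struct mock_page *p)
{
    p->present = false;
}

/* Access path: a present poisoned page is reported as -EIO. */
static int mock_read(const struct mock_page *p)
{
    if (p->present && p->hwpoison)
        return -EIO;
    return 0;    /* healthy page, or a hole that would be filled afresh */
}
```

In this model a dirty page that hits `mock_memory_failure()` is neither written back nor invalidated, every `mock_read()` returns -EIO until `mock_truncate()` removes the page, and a clean healthy page is still dropped by `mock_invalidate()` as before.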