From: Yang Shi
Date: Wed, 19 Oct 2022 11:31:08 -0700
Subject: Re: [PATCH] hugetlbfs: don't delete error page from pagecache
To: James Houghton
Cc: Mike Kravetz, Muchun Song, Naoya Horiguchi, Miaohe Lin, Andrew Morton, Axel Rasmussen, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20221018200125.848471-1-jthoughton@google.com>

On Tue, Oct 18, 2022 at 1:01 PM James Houghton wrote:
>
> This change is very similar to the change that was made for shmem [1],
> and it solves the same problem but for HugeTLBFS instead.
>
> Currently, when poison is found in a HugeTLB page, the page is removed
> from the page cache. That means that attempting to map or read that
> hugepage in the future will result in a new hugepage being allocated
> instead of notifying the user that the page was poisoned. As [1] states,
> this is effectively memory corruption.
>
> The fix is to leave the page in the page cache. If the user attempts to
> use a poisoned HugeTLB page with a syscall, the syscall will fail with
> EIO, the same error code that shmem uses. For attempts to map the page,
> the thread will get a BUS_MCEERR_AR SIGBUS.
>
> [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
>
> Signed-off-by: James Houghton

Thanks for the patch. Yes, we should do the same thing for hugetlbfs.

When I was working on shmem I looked into hugetlbfs too. The problem is
that we make the whole hugetlb page unavailable even though just one 4K
subpage is hwpoisoned. That may be tolerable for a 2M hugetlb page, but
it can waste a lot of memory for a 1G hugetlb page (one poisoned 4K
subpage takes the other 262,143 subpages with it), particularly for the
page fault path.

I discussed this with Mike offline last year, and I was told Google was
working on PTE-mapped hugetlb pages. That should be able to solve the
problem, and we'd like to have the high-granularity hugetlb mapping
support land first. There were some other details, but I can't remember
all of them; I'll have to refresh my memory by rereading the email
discussions...
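
For reference, the user-visible behavior the commit message describes for
the syscall path (read(2) on a poisoned hugepage failing with EIO instead
of silently succeeding) could be exercised with a small test along the
lines below. This is only a rough, untested sketch, not something taken
from the patch: the /dev/hugepages mount point, the file name, the 2MB
huge page size and the use of MADV_HWPOISON for error injection are all
assumptions, it needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE, and error
handling is omitted.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE      (2UL << 20)     /* assumes 2MB huge pages */

/* MADV_HWPOISON may deliver an "action optional" SIGBUS to processes
 * mapping the page; swallow it so the read(2) below still runs. */
static void sigbus_ignore(int sig, siginfo_t *info, void *ctx)
{
        (void)sig; (void)info; (void)ctx;
}

int main(void)
{
        struct sigaction sa = { 0 };
        char buf[4096];
        char *p;
        int fd;

        sa.sa_sigaction = sigbus_ignore;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        fd = open("/dev/hugepages/hwpoison-test", O_CREAT | O_RDWR, 0600);
        p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        p[0] = 1;                                /* fault the hugepage in */
        madvise(p, getpagesize(), MADV_HWPOISON); /* inject poison into one base page */

        /* Without the patch the poisoned page is dropped from the page
         * cache and the poison goes unreported here; with the patch the
         * page stays in the page cache and read(2) fails. */
        if (read(fd, buf, sizeof(buf)) < 0 && errno == EIO)
                printf("read(2) failed with EIO, as described above\n");
        else
                printf("read(2) did not report the poison\n");

        return 0;
}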
> ---
>  fs/hugetlbfs/inode.c | 13 ++++++-------
>  mm/hugetlb.c         |  4 ++++
>  mm/memory-failure.c  |  5 ++++-
>  3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index fef5165b73a5..7f836f8f9db1 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                  } else {
>                          unlock_page(page);
>
> +                        if (PageHWPoison(page)) {
> +                                put_page(page);
> +                                retval = -EIO;
> +                                break;
> +                        }
> +
>                          /*
>                           * We have the page, copy it to user space buffer.
>                           */
> @@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
>  static int hugetlbfs_error_remove_page(struct address_space *mapping,
>                                         struct page *page)
>  {
> -        struct inode *inode = mapping->host;
> -        pgoff_t index = page->index;
> -
> -        hugetlb_delete_from_page_cache(page_folio(page));
> -        if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
> -                hugetlb_fix_reserve_counts(inode);
> -
>          return 0;
>  }
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 97896165fd3f..5120a9ccbf5b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6101,6 +6101,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>
>          ptl = huge_pte_lock(h, dst_mm, dst_pte);
>
> +        ret = -EIO;
> +        if (PageHWPoison(page))
> +                goto out_release_unlock;
> +
>          /*
>           * We allow to overwrite a pte marker: consider when both MISSING|WP
>           * registered, we firstly wr-protect a none pte which has no page cache
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 145bb561ddb3..bead6bccc7f2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>          int res;
>          struct page *hpage = compound_head(p);
>          struct address_space *mapping;
> +        bool extra_pins = false;
>
>          if (!PageHuge(hpage))
>                  return MF_DELAYED;
> @@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>          mapping = page_mapping(hpage);
>          if (mapping) {
>                  res = truncate_error_page(hpage, page_to_pfn(p), mapping);
> +                /* The page is kept in page cache. */
> +                extra_pins = true;
>                  unlock_page(hpage);
>          } else {
>                  unlock_page(hpage);
> @@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
>                  }
>          }
>
> -        if (has_extra_refcount(ps, p, false))
> +        if (has_extra_refcount(ps, p, extra_pins))
>                  res = MF_FAILED;
>
>          return res;
> --
> 2.38.0.413.g74048e4d9e-goog
>
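
The mapping side of the same behavior (a later access through a new
mapping of the poisoned offset getting SIGBUS with si_code BUS_MCEERR_AR
rather than silently mapping a fresh hugepage) could be checked with a
second rough sketch under the same assumptions, run after the page has
been poisoned, for example by the snippet above:

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE      (2UL << 20)     /* assumes 2MB huge pages */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
        /* Report whether the fault carries the memory-failure si_code. */
        const char *msg = (info->si_code == BUS_MCEERR_AR) ?
                "got SIGBUS with BUS_MCEERR_AR, as described above\n" :
                "got SIGBUS, but not BUS_MCEERR_AR\n";

        (void)sig; (void)ctx;
        write(STDOUT_FILENO, msg, strlen(msg));
        _exit(0);
}

int main(void)
{
        struct sigaction sa = { 0 };
        char *p;
        int fd;

        sa.sa_sigaction = sigbus_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        /* Reuse the file whose first hugepage was poisoned earlier. */
        fd = open("/dev/hugepages/hwpoison-test", O_RDWR);
        p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        /* With the poisoned page left in the page cache, this fault hits
         * the hwpoisoned hugepage and the thread gets BUS_MCEERR_AR. */
        p[0] = 1;

        printf("no SIGBUS: a fresh hugepage was mapped instead\n");
        return 0;
}

If the si_code check fires, the error page was indeed left in the page
cache; exiting from the handler avoids re-faulting on the same poisoned
address.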