Subject: Re: [PATCH v1] mm,hwpoison: set PG_hwpoison for busy hugetlb pages
To: Mike Kravetz , Naoya Horiguchi
CC: Andrew Morton , zhenwei pi , Naoya Horiguchi , , Linux-MM
References: <20220511151955.3951352-1-naoya.horiguchi@linux.dev>
From: Miaohe Lin
Date: Thu, 12 May 2022 10:54:05 +0800

On 2022/5/12 2:35, Mike Kravetz wrote:
> On 5/11/22 08:19, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi
>>
>> If memory_failure() fails to grab page refcount on a hugetlb page
>> because it's busy, it returns without setting PG_hwpoison on it.
>> This not only loses a chance of error containment, but breaks the rule
>> that action_result() should be called only when memory_failure() do
>> any of handling work (even if that's just setting PG_hwpoison).
>> This inconsistency could harm code maintainability.
>>
>> So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

I'm sorry, but where is hugetlb_set_page_hwpoison() defined and used? I can't find it.

>>
>> Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
>> Signed-off-by: Naoya Horiguchi
>> ---
>>  include/linux/mm.h  | 1 +
>>  mm/memory-failure.c | 8 ++++----
>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index d446e834a3e5..04de0c3e4f9f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3187,6 +3187,7 @@ enum mf_flags {
>>  	MF_MUST_KILL = 1 << 2,
>>  	MF_SOFT_OFFLINE = 1 << 3,
>>  	MF_UNPOISON = 1 << 4,
>> +	MF_NO_RETRY = 1 << 5,
>>  };
>>  extern int memory_failure(unsigned long pfn, int flags);
>>  extern void memory_failure_queue(unsigned long pfn, int flags);
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 6a28d020a4da..e3269b991016 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1526,7 +1526,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>  			count_increased = true;
>>  	} else {
>>  		ret = -EBUSY;
>> -		goto out;
>> +		if (!(flags & MF_NO_RETRY))
>> +			goto out;
>>  	}
>
> Hi Naoya,
>
> We are in the else block because !HPageFreed() and !HPageMigratable().
> IIUC, this likely means the page is isolated. One common reason for isolation
> is migration. So, the page could be isolated and on a list for migration.
>
> I took a quick look at the hugetlb migration code and did not see any checks
> for PageHWPoison after a hugetlb page is isolated. I could have missed
> something? If there are no checks, we will read the PageHWPoison page
> in kernel mode while copying to the migration target.
>
> Is this an issue? Is is something we need to be concerned with? Memory
> errors can happen at any time, and gracefully handling them is best effort.

It seems the HWPoison hugetlb page would be accessed even without this patch.
Can we do a get_page_unless_zero() first here, so that hugetlb page migration
fails due to this extra page reference and thus does not access the page content?
If the hugetlb page is already frozen, the corrupted memory will still be consumed, though. :(

Thanks!
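For illustration only, here is a minimal userspace model of the get_page_unless_zero() suggestion, assuming C11 <stdatomic.h>. All names (struct model_page, model_get_unless_zero(), model_ref_freeze(), model_migrate(), model_handle_error()) are hypothetical and are not kernel APIs; they only mimic the semantics of the kernel's get_page_unless_zero() (take a reference unless the count is zero) and page_ref_freeze() (atomically swap an expected count for zero). model_migrate() folds freeze, copy and unfreeze into one function, which is a simplification of the real hugetlb migration path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only what the model needs. */
struct model_page {
	atomic_int refcount;
	bool hwpoison;	/* models PG_hwpoison */
	bool copied;	/* set when "migration" reads the page content */
};

static void model_page_init(struct model_page *p, int refs)
{
	atomic_init(&p->refcount, refs);
	p->hwpoison = false;
	p->copied = false;
}

/* Mimics get_page_unless_zero(): take a reference only if the count is
 * non-zero, i.e. the page is neither free nor frozen. */
static bool model_get_unless_zero(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);	/* dropping a reference we never held */
}

/* Mimics page_ref_freeze(): succeeds only if the count equals
 * 'expected', replacing it with zero in one atomic step. */
static bool model_ref_freeze(struct model_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

static void model_ref_unfreeze(struct model_page *p, int count)
{
	atomic_store(&p->refcount, count);
}

/* Hypothetical migration step: the content is read only if the freeze
 * at the expected count succeeds. Returns -1 on an unexpected extra
 * reference (the kernel's migration code reports -EAGAIN there). */
static int model_migrate(struct model_page *p, int expected)
{
	if (!model_ref_freeze(p, expected))
		return -1;
	/* No hwpoison check here, matching the concern quoted above. */
	p->copied = true;
	model_ref_unfreeze(p, expected);
	return 0;
}

/* Hypothetical error handler following the suggestion: pin the page
 * first, then mark it. Fails if migration already froze the count. */
static bool model_handle_error(struct model_page *p)
{
	if (!model_get_unless_zero(p))
		return false;
	p->hwpoison = true;
	return true;
}
```

If the handler pins the page first, the freeze in model_migrate() fails and the content is never read; if migration has already frozen the count, model_get_unless_zero() fails, which is the remaining window mentioned above.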