Subject: Re: [PATCH v4] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
To: Mike Kravetz , Naoya Horiguchi
CC: Andrew Morton , Yang Shi , Naoya Horiguchi , , Linux-MM
References: <20220316120701.394061-1-naoya.horiguchi@linux.dev>
 <7362f9ee-81fa-702a-7a03-1a91ecf0b58e@oracle.com>
From: Miaohe Lin
Message-ID: <3fe5a7e9-785d-db79-543a-c7723fc6f505@huawei.com>
Date: Thu, 17 Mar 2022 17:28:13 +0800
In-Reply-To: <7362f9ee-81fa-702a-7a03-1a91ecf0b58e@oracle.com>
Content-Type: text/plain; charset="utf-8"

On 2022/3/17 6:51, Mike Kravetz wrote:
> On 3/16/22 05:07, Naoya Horiguchi wrote:
>> From: Miaohe Lin
>>
>> There is a race condition between memory_failure_hugetlb() and hugetlb
>> free/demotion, which causes setting PageHWPoison flag on the wrong page.
>> The one simple result is that wrong processes can be killed, but another
>> (more serious) one is that the actual error is left unhandled, so no one
>> prevents later access to it, and that might lead to more serious results
>> like consuming corrupted data.
>>
>> Think about the below race window:
>>
>> CPU 1                                   CPU 2
>>  memory_failure_hugetlb
>>  struct page *head = compound_head(p);
>>                                          hugetlb page might be freed to
>>                                          buddy, or even changed to another
>>                                          compound page.
>>
>>  get_hwpoison_page -- page is not what we want now...
>>
>> The compound_head is called outside hugetlb_lock, so the head is not
>> reliable.
>>
>> So set PageHWPoison flag after passing prechecks. And to detect
>> potential violation, this patch also introduces a new action type
>> MF_MSG_DIFFERENT_PAGE_SIZE.
>
> Thanks for squashing these patches.
>
> In my testing, there is a change in behavior that may not be intended.
>
> My test strategy is:
> - allocate two hugetlb pages
> - create a mapping which reserves those two pages, but does not fault them in
> - as a result, the pages are on the free list but can not be freed
> - inject error on a subpage of one of the huge pages
> - echo 0xYYY > /sys/kernel/debug/hwpoison/corrupt-pfn
> - memory error code will call dissolve_free_huge_page
> - dissolve_free_huge_page returns -EBUSY because
>   h->free_huge_pages - h->resv_huge_pages == 0
> - We never end up setting Poison on the page with error or head page
> - Huge page sitting on free list with error in subpage and not marked
> - huge page with error could be given to an application or returned to buddy
>
> Prior to this change, Poison would be set on the head page
>

Many thanks for pointing this out. IIUC, this change in behavior is not
really intended. We are trying to avoid setting the PageHWPoison flag on
the wrong page, so we have to set the flag only after passing the
prechecks, as the commit log says. But there is room for improvement:
e.g. when the page has changed to a single page or to a compound page of
another size after we grab the page refcnt, we could also set
PageHWPoison before bailing out? There might be something more we can do?

> I do not think this was an intended change in behavior. But, perhaps it is
> all we can do in this case? Sorry for not being able to look more closely
> at the code right now.
>

Thanks.
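
For reference, the -EBUSY condition in the test above can be modelled in a few lines of userspace C. This is only a sketch of the arithmetic quoted in the test description, not the kernel code: struct hstate_model and dissolve_check_model() are hypothetical names, and only the two counters mentioned above are represented.

```c
#include <errno.h>

/*
 * Hypothetical userspace stand-in for the two hstate counters quoted in
 * the test description. This is not the kernel's struct hstate.
 */
struct hstate_model {
	unsigned long free_huge_pages;	/* huge pages on the free list */
	unsigned long resv_huge_pages;	/* free pages already reserved */
};

/*
 * Models only the reservation check from the test description: when every
 * free huge page is backed by a reservation, nothing may be dissolved and
 * the caller gets -EBUSY. Returns 0 when at least one page is spare.
 */
static int dissolve_check_model(const struct hstate_model *h)
{
	if (h->free_huge_pages - h->resv_huge_pages == 0)
		return -EBUSY;
	return 0;
}
```

With free_huge_pages = 2 and resv_huge_pages = 2, as in the test setup, the model returns -EBUSY; with one reservation fewer it returns 0.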