From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0DC7C10F1A for ; Thu, 9 May 2024 08:30:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 72F256B0092; Thu, 9 May 2024 04:30:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6DFA76B0093; Thu, 9 May 2024 04:30:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5A84B6B0095; Thu, 9 May 2024 04:30:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 3D1276B0092 for ; Thu, 9 May 2024 04:30:24 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id F1583A1D15 for ; Thu, 9 May 2024 08:30:23 +0000 (UTC) X-FDA: 82098185526.17.F4E5368 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by imf17.hostedemail.com (Postfix) with ESMTP id 8E5044000C for ; Thu, 9 May 2024 08:30:20 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf17.hostedemail.com: domain of linmiaohe@huawei.com designates 45.249.212.255 as permitted sender) smtp.mailfrom=linmiaohe@huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1715243422; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f/s8dhME/aBb0R6J9ejxs2ShcI5wQ/w4sRm8HFH0eBE=; b=Evms07rHM8WLis5Do+QSPkoXZq/LtAhvZKPRBl+g0msYwPDMX2oqhHkckocBya6dPjCQdt AUIC6m39Ov2Ko/T96He4SF5wNXQTE2pH0NtmmueiGOcmzGeZAPS484OekZFrTxQOxX9ZgZ RGz5tnCEymBHBlZDBdGDeUwPwUr8VTw= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=none; dmarc=pass (policy=quarantine) header.from=huawei.com; spf=pass (imf17.hostedemail.com: domain of linmiaohe@huawei.com designates 45.249.212.255 as permitted sender) smtp.mailfrom=linmiaohe@huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1715243422; a=rsa-sha256; cv=none; b=jx3e11a3tqF0nMp7HA+uCB7VvsnXdZi4mKHJvP2wwcoqxIAADMQW8sOdB7n9VwsY2zmAjZ nLcDhjsG9Sm5FSNnwigwILVPVcP6OipPwABr0ZvP4rx3o/5iNejXVCTwEsqkAQ0ljgOCLY T0BYsLlwvkZkRLVw8ij46blcMf/tJN8= Received: from mail.maildlp.com (unknown [172.19.163.48]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4VZlVR4S05z1RDD8; Thu, 9 May 2024 16:26:55 +0800 (CST) Received: from canpemm500002.china.huawei.com (unknown [7.192.104.244]) by mail.maildlp.com (Postfix) with ESMTPS id 9962A180032; Thu, 9 May 2024 16:30:16 +0800 (CST) Received: from [10.173.135.154] (10.173.135.154) by canpemm500002.china.huawei.com (7.192.104.244) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.35; Thu, 9 May 2024 16:30:16 +0800 Subject: Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail To: Jane Chu , , , , References: <20240501232458.3919593-1-jane.chu@oracle.com> <20240501232458.3919593-4-jane.chu@oracle.com> <038cffc0-e027-b518-460f-40099819c588@huawei.com> <1b4c50b6-2371-4e1b-aef3-d70c32888054@oracle.com> From: Miaohe Lin Message-ID: <30d4d249-e3b1-79d5-3501-0ccb9c529110@huawei.com> Date: Thu, 9 May 2024 16:30:15 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0 MIME-Version: 1.0 In-Reply-To: <1b4c50b6-2371-4e1b-aef3-d70c32888054@oracle.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.173.135.154] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To canpemm500002.china.huawei.com (7.192.104.244) X-Stat-Signature: 41rerse1gaddhu8n8xhof59wh7ikszn7 X-Rspamd-Queue-Id: 8E5044000C X-Rspamd-Server: rspam10 X-Rspam-User: X-HE-Tag: 1715243420-826774 X-HE-Meta: U2FsdGVkX1/lQsdcKbvqEpeAn6K1OCpluXvrpOspPRLJHYQo8LYMnxJEoiRLOYvAYGuAwJ+PEWhpnSdhAzseXnGOD/OKHvT97FuGPhD3guJ7K62auHeQwKgwRftwM/izfFevAnilZ4c2GCmNbslxnwTy1/uMlpv6wfS7H33kemHyR/ccNuBFB9PPGEJqWIsLgJqkeshR2liDp20TUpp0/23Hb0My9MPZJ43yUL0bJlVIXG8tIlI47H4bl1/E464rPXD9VHV6GEh0LdfvOEMTdYNHB14rnf2k00xH1OIVYmcOKrQ3tqXF6zfyiTh/LOKrkkcELSRCKA/mHZk4Qj1NPuGXHKr0irDX4/TwW0leQdQIYHbERRfv+QsTlbGfqlw3dko269n0OtlYmfkdl8PX9IKpKk4eya8n1Y7Y/wXk3cP7UrMGxWZYCon2Z4hNzdX7EQFWLMmfWkMYRf6ohFv6sLiEgN9dV3DRjDVGs1n+3S3rpPfDB7H1G++6+huY6R0Vv0DhdOMM4Qvi+S1cPQTp2hAYYl/h9hAJE5mlQHYbUF/2xBjuyWeFjoGm5ST9JagM1T9BL4VzQxM29s8q3TQw9Mtmwp7a4OHeKGqS9+56mHybt3URz9tbTte+iUZUn4EnnwK6XeNGbmNTZGkX8eSLf7qv11arER0AILlXCbaawauNfqYMsnZf+rUHXlhzOtwfRyqyUAbeY6zgrApPqJJThaffKvx5RmCzAH0KN0BPRFT76wMYO/GHeYu/wFQOPI6T+1wE5UI4Ny1yX2iKXZstRrZ/hfTz64QFFjqCgaZCnze5YE0ZYM+0pR64tev8htg7ZeVyZspiKKSXDfW7XuEKlrty/qrG1xspOFHKNY2n3GOArSUr7YsAAjQ01yYoIirEUnOD2KcUPkkd+uzD5gKaOBipRRPBUAL7gZa4+Q9cci8/Vhl0231RiQ32HrvNTvCOWBSQw7Jjo2sHBfy/3lm c5VRAI6R OSQUd97Q2qpHPjfHMgyiWS4ueN1WOBr0FiIHTr/j6gPN7Pw7itDEKixmqi+39y/dqpv5siHZNSOI+uAwDXOwjX6dQ8gUVIox0KfHIDK4K8wg1It6cnHCYSGFkEFPMdNb98Flvzhu8llW2jJwDhE19rQjqRt+dtplejY64dl1blFvWCvUd8flvdUtOPVAisO7RJnDjAHdA03aujG9TAfbMI61e2iEks8xqL7FarBxS7Zb9W8dFpbSzDxgpEFBupqWcxDNRMYItzqznXFU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/5/9 1:45, Jane Chu wrote: > On 5/8/2024 1:08 AM, Miaohe Lin wrote: > >> On 2024/5/7 4:26, Jane Chu wrote: >>> On 5/5/2024 12:00 AM, Miaohe Lin wrote: >>> >>>> On 2024/5/2 7:24, Jane Chu wrote: >>>>> When handle hwpoison in a GUP longterm pin'ed thp page, >>>>> try_to_split_thp_page() will fail. And at this point, there is little else >>>>> the kernel could do except sending a SIGBUS to the user process, thus >>>>> give it a chance to recover. >>>>> >>>>> Signed-off-by: Jane Chu >>>> Thanks for your patch. Some comments below. >>>> >>>>> --- >>>>>    mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++ >>>>>    1 file changed, 36 insertions(+) >>>>> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>> index 7fcf182abb96..67f4d24a98e7 100644 >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, >>>>>        return rc; >>>>>    } >>>>>    +/* >>>>> + * The calling condition is as such: thp split failed, page might have >>>>> + * been GUP longterm pinned, not much can be done for recovery. >>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user >>>>> + * application has a chance to recover. Also, application processes' >>>>> + * election for MCE early killed will be honored. >>>>> + */ >>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags, >>>>> +            struct page *hpage) >>>>> +{ >>>>> +    struct folio *folio = page_folio(hpage); >>>>> +    LIST_HEAD(tokill); >>>>> +    int res = -EHWPOISON; >>>>> + >>>>> +    /* deal with user pages only */ >>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p)) >>>>> +        res = -EBUSY; >>>>> +    if (!(PageLRU(hpage) || PageHuge(p))) >>>>> +        res = -EBUSY; >>>> Above checks seems unneeded. We already know it's thp? >>> Agreed. >>> >>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic, >>> >>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page. >>> >>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios. >>> >>>>> + >>>>> +    if (res == -EHWPOISON) { >>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED); >>>>> +        kill_procs(&tokill, true, pfn, flags); >>>>> +    } >>>>> + >>>>> +    if (flags & MF_COUNT_INCREASED) >>>>> +        put_page(p); >>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails? >>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should. >> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below: >> >> static int try_to_split_thp_page(struct page *page) >> { >>     int ret; >> >>     lock_page(page); >>     ret = split_huge_page(page); >>     unlock_page(page); >> >>     if (unlikely(ret)) >>         put_page(page); >>     ^^^^^^^^^^^^^^^^^^^^^^^ >>     return ret; >> } >> >> Or am I miss something? > > I think you caught a bug in my code, thanks! > > How about moving put_page() outside try_to_split_thp_page() ? If you want to send SIGBUS in the event of thp split fail, it might be required to do so. I think kill_procs_now() needs extra thp refcnt to do its work. > >> >>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not. >>> >>>>> + >>>> action_result is missing? >>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios. >>> >>> In this case, I think the reason  is that, we just killed the process and there is nothing >>> >>> else to do or to report. >>> >>>>> +    return res; >>>>> +} >>>>> + >>>>>    /** >>>>>     * memory_failure - Handle memory failure of a page. >>>>>     * @pfn: Page Number of the corrupted page >>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags) >>>>>             */ >>>>>            SetPageHasHWPoisoned(hpage); >>>>>            if (try_to_split_thp_page(p) < 0) { >>>> Should hwpoison_filter() be called in this case? >>> Yes, it should. I will add the hwpoison_filter check. >>>>> +            if (flags & MF_ACTION_REQUIRED) { >> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it? > > I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set. > > The usual code path for delivery signal is > > if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then > > - send SIGKILL if vaddr is -EFAULT > > - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set > > - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill > > So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that > > the MCE-early-kill request is not honored which deviates from the existing behavior. > > Perhaps I should remove the > > + if (flags & MF_ACTION_REQUIRED) { I tend to agree MCE-early-kill request should be honored when try to kill process. Thanks. .