From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/1] mm: memory-failure: Re-split hw-poisoned huge page on -EAGAIN
To: Qiuxu Zhuo
CC: , , , , , HORIGUCHI NAOYA
References: <20231215081204.8802-1-qiuxu.zhuo@intel.com>
From: Miaohe Lin
Message-ID: <81eebf23-fce3-3bb3-857d-8aab5a75d788@huawei.com>
Date: Tue, 19 Dec 2023 19:50:46 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <20231215081204.8802-1-qiuxu.zhuo@intel.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 2023/12/15 16:12, Qiuxu Zhuo wrote:
> During the process of splitting a hw-poisoned huge page, it is possible
> for the reference count of the huge page to be increased by the threads
> within the affected process, leading to a failure to split the
> hw-poisoned huge page with an error code of -EAGAIN.
>
> This issue can be reproduced by injecting a memory error into a
> multi-threaded process such that the error lands within a huge page.
> The call path that returned -EAGAIN during testing is shown below:
>
> memory_failure()
>   try_to_split_thp_page()
>     split_huge_page()
>       split_huge_page_to_list() {
>         ...
>         Step A: can_split_folio()  - Checked that the thp can be split.
>         Step B: unmap_folio()
>         Step C: folio_ref_freeze() - Failed and returned -EAGAIN.
>         ...
>       }
>
> The testing logs indicated that some huge pages were split successfully
> via the call path above (Step C succeeded for these huge pages).
> However, other huge pages failed to split due to a failure at Step C,
> and the reference count of the huge page was observed to increase
> between Step A and Step C.
>
> Testing has shown that after receiving -EAGAIN, simply re-splitting the
> hw-poisoned huge page within memory_failure() always results in the
> same -EAGAIN. This is likely because memory_failure() is executed in
> the context of the affected process: before this process exits
> memory_failure() and is terminated, its threads can keep increasing the
> reference count of the hw-poisoned page.
>
> To address this issue, employ a kernel worker to re-split the
> hw-poisoned huge page.
> By the time this worker begins re-splitting the
> hw-poisoned huge page, the affected process has already been
> terminated, preventing its threads from increasing the reference count.
> Experimental results have consistently shown that this worker
> successfully re-splits these hw-poisoned huge pages on its first
> attempt.
>
> The kernel log (before):
> [ 1116.862895] Memory failure: 0x4097fa7: recovery action for unsplit thp: Ignored
>
> The kernel log (after):
> [  793.573536] Memory failure: 0x2100dda: recovery action for unsplit thp: Delayed
> [  793.574666] Memory failure: 0x2100dda: split unsplit thp successfully.
>
> Signed-off-by: Qiuxu Zhuo

Thanks for your patch. Besides the comment from Naoya, I have some
questions about the code itself (see also the sketch at the end).

> ---
>  mm/memory-failure.c | 73 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 660c21859118..0db4cf712a78 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -72,6 +72,60 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
>  static bool hw_memory_failure __read_mostly = false;
>
> +#define SPLIT_THP_MAX_RETRY_CNT		10
> +#define SPLIT_THP_INIT_DELAYED_MS	1
> +
> +static bool split_thp_pending;
> +
> +struct split_thp_req {
> +	struct delayed_work work;
> +	struct page *thp;
> +	int retries;
> +};
> +
> +static void split_thp_work_fn(struct work_struct *work)
> +{
> +	struct split_thp_req *req = container_of(work, typeof(*req), work.work);
> +	int ret;
> +
> +	/* Split the thp. */
> +	get_page(req->thp);

Can req->thp have already been freed by the time split_thp_work_fn() is
scheduled? If so, this get_page() comes too late to keep it alive.

> +	lock_page(req->thp);
> +	ret = split_huge_page(req->thp);
> +	unlock_page(req->thp);
> +	put_page(req->thp);
> +
> +	/* Retry with an exponential backoff. */
> +	if (ret && ++req->retries < SPLIT_THP_MAX_RETRY_CNT) {
> +		schedule_delayed_work(to_delayed_work(work),
> +				      msecs_to_jiffies(SPLIT_THP_INIT_DELAYED_MS << req->retries));
> +		return;
> +	}
> +
> +	pr_err("%#lx: split unsplit thp %ssuccessfully.\n", page_to_pfn(req->thp), ret ? "un" : "");
> +	kfree(req);
> +	split_thp_pending = false;

split_thp_pending is not protected against a concurrent
split_thp_delayed()? Though this race should be benign.

Thanks.
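P.S. One possible way to address both points at once is to take the page
reference on the submitting side, before the work is queued, and to turn
split_thp_pending into an atomic bit. Below is a rough, untested sketch of
what I mean. It reuses struct split_thp_req, split_thp_work_fn() and
SPLIT_THP_INIT_DELAYED_MS from your patch; queue_split_thp() and
split_thp_flags are made-up names, not something your patch defines:

#include <linux/bitops.h>
#include <linux/jiffies.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static unsigned long split_thp_flags;
#define SPLIT_THP_PENDING_BIT	0

static bool queue_split_thp(struct page *thp)
{
	struct split_thp_req *req;

	/*
	 * test_and_set_bit() makes the "already pending?" check and the
	 * flag update a single atomic step, so two concurrent callers
	 * cannot both queue a request.
	 */
	if (test_and_set_bit(SPLIT_THP_PENDING_BIT, &split_thp_flags))
		return false;

	req = kmalloc(sizeof(*req), GFP_KERNEL);
	if (!req) {
		clear_bit(SPLIT_THP_PENDING_BIT, &split_thp_flags);
		return false;
	}

	/*
	 * Pin the page here, in the submitting context, so it cannot be
	 * freed before (or while) the worker runs.
	 */
	get_page(thp);
	req->thp = thp;
	req->retries = 0;
	INIT_DELAYED_WORK(&req->work, split_thp_work_fn);
	schedule_delayed_work(&req->work,
			      msecs_to_jiffies(SPLIT_THP_INIT_DELAYED_MS));
	return true;
}

With that, split_thp_work_fn() could drop its own get_page()/put_page()
pair, call put_page(req->thp) only on the final (non-retry) exit path, and
replace "split_thp_pending = false;" with
clear_bit(SPLIT_THP_PENDING_BIT, &split_thp_flags).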