Date: Mon, 18 Jan 2021 17:09:00 +0800
From: Aili Yao
To: HORIGUCHI NAOYA(堀口　直也)
CC: Oscar Salvador, "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Message-ID: <20210118170900.6fe9595a.yaoaili@kingsoft.com>
In-Reply-To: <20210118085747.GA904@hori.linux.bs1.fc.nec.co.jp>
Organization: Kingsoft

On Mon, 18 Jan 2021 08:57:47 +0000
HORIGUCHI NAOYA(堀口　直也) wrote:

> > >
> > > For action optional cases, one error event kills *only one* process.  If an
> > > error page are shared by multiple processes, these processes will be killed
> > > by separate error events, each of which is triggered when each process tries
> > > to access the error memory.  So these processes would be killed immediately
> > > when accessing the error, but you don't have to kill all at the same time
> > > (or actually you might not even have to kill it at all if the process exits
> > > finally without accessing the error later).
> > >
> > > Maybe the function variable "force_early" is named confusingly (it sounds
> > > that it's related to PF_MCE_KILL_EARLY flag, but that's incorrect).
> > > I'll submit a fix later. (I'll add your "Reported-by" because you made me
> > > find it, thank you.)
> > >
> > I think we should do more for non current process error case, we should mark it AO for processes to be signaled
> > or we may take wrong action.
>
> I'm not sure what you mean by "non current process error case" and "we
> should mark it AO", so could you explain more specifically about your error
> scenario?

I will share my test code and submit another patch for this scenario.
Please give me some time, thanks!

And I think you are right: AR applies only to the current process.

> Especially I'd like to know about who triggers hard offline on
> what hardware events and what "wrong action" could happen.  Maybe just
> "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> it's not enough for us to see that your scenario is possible.  Current
> implementation implicitly assumes some hardware behavior, and does not work
> for the case which never happens under the assumption.
>

This action comes from the mcelog daemon. Soft page offline is the default,
but we can configure hard page offline for CE storms so that the related
processes get signaled.

> Do you have some test cases to reproduce any specific issue (like data lost)
> on your system? (If yes, please share it.) Or your concern is from code review?
>

I will clean up the test case and share it.

Thanks

-- 
Best Regards!

Aili Yao
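
For context on the AO/AR distinction discussed in this thread (this is not the
test case promised above), here is a minimal userspace sketch in C. It assumes
Linux with glibc; the function names mce_kind, on_sigbus, request_early_kill
and install_sigbus_handler are made up for illustration. It shows how a process
can ask for early kill through prctl(PR_MCE_KILL) and how a SIGBUS handler can
tell an action optional signal (BUS_MCEERR_AO) from an action required one
(BUS_MCEERR_AR).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Map a SIGBUS si_code to the two memory-failure cases from the thread. */
const char *mce_kind(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required";
    case BUS_MCEERR_AO:
        return "action optional";
    default:
        return "not a memory-failure SIGBUS";
    }
}

/* SIGBUS handler: report which case was delivered, using only write(). */
void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    const char *kind = mce_kind(info->si_code);
    ssize_t n;

    (void)sig;
    (void)ctx;

    n = write(STDERR_FILENO, kind, strlen(kind));
    n = write(STDERR_FILENO, "\n", 1);
    (void)n;

    /*
     * Only an AO signal leaves the task free to continue, as long as it
     * stops using the page at info->si_addr. For anything else, returning
     * would just re-run the faulting access, so exit here.
     */
    if (info->si_code != BUS_MCEERR_AO)
        _exit(1);
}

/* Ask the kernel for the early-kill policy for the calling thread. */
int request_early_kill(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Install on_sigbus with SA_SIGINFO so si_code and si_addr are available. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call request_early_kill() and install_sigbus_handler() once at
startup. Whether a given hard offline request actually results in an AO signal
to such a task is exactly the question under discussion here, so the sketch
only covers the receiving side.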