Date: Mon, 18 Jan 2021 13:57:44 +0800
From: Aili Yao
To: "HORIGUCHI NAOYA(堀口　直也)"
CC: Oscar Salvador, "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Message-ID: <20210118135744.7413cd06.yaoaili@kingsoft.com>
In-Reply-To: <20210118051555.GA3585@hori.linux.bs1.fc.nec.co.jp>
References: <20210115155506.2d59fe83.yaoaili@kingsoft.com> <20210115084920.GA4092@linux> <20210115172622.699d68e5.yaoaili@kingsoft.com> <20210118051555.GA3585@hori.linux.bs1.fc.nec.co.jp>
Organization: Kingsoft

On Mon, 18 Jan 2021 05:15:55 +0000
HORIGUCHI NAOYA(堀口　直也) wrote:

> Hi Aili,
>
> On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote:
> > On Fri, 15 Jan 2021 09:49:24 +0100
> > Oscar Salvador wrote:
> >
> > > I am having a hard time trying to grasp what are you trying to achieve here.
> > > Could you elaborate some more? Ideally stating what is the problem you are
> > > fixing here.
> > >
> > Sorry for the confusion. Example: there are four processes A, B, C, D, which map the same file into
> > their process space and set their PF_MCE_KILL_EARLY flag to TRUE. If process A triggers one
> > UE with MF_ACTION_REQUIRED set, in the current code only process A will be killed and B, C, D remain
> > alive, but since we set PF_MCE_KILL_EARLY, we want B, C, D to be killed as well.
>
> This behavior seems not to me what PF_MCE_KILL_EARLY intends. This flag
> controls whether memory error handler kills processes immediately or not,
> and it only affects action optional cases (i.e. called without
> MF_ACTION_REQUIRED). In MF_ACTION_REQUIRED case, we have no such choice
> and affected processes should be always killed immediately.
>
> We may also need to consider the difference in context of these two cases.
> Action optional case is called asynchronously by background process like
> memory scrubbing, so all processes mapping the error memory are the affected
> ones. Action required event is more synchronous, and is called when a
> process experiences memory access errors on data load and instruction fetch
> instructions. So the affected process in this case is only the process.
> So I still think this background justifies the current behavior.
>
> But my knowledge might be old, if you have newer hardwares which define
> other type of memory error and that doesn't fit with current implementation,
> I'd like to extend code to support the new cases, so please let me know.
>

Sorry, I don't fully get your concern.

Action optional errors may come from a CE storm or from patrol scrubbing.
A process that wants to handle this condition itself sets PF_MCE_KILL_EARLY,
and it will then be signaled in that case.

Action required errors are more urgent and serious, and we must act on them.
In the current code, the process that triggered the error is signaled, but
other processes with PF_MCE_KILL_EARLY set are not. Is that only because
PF_MCE_KILL_EARLY is meant for the action optional case?

Action required describes an error that the current process must handle.
Isn't the same error action optional for the non-current processes?

I don't think action required applies to all processes. For the current
process the error may be AR, but for other processes it is effectively AO,
so they should also be signaled. I think this behavior is reasonable.

Moreover, we can't predict which kind of error will be triggered. The
PF_MCE_KILL_EARLY flag is meant to let a process handle memory errors
gracefully, and it shouldn't be restricted to explicitly declared AO errors.

Thanks!

-- 
Best Regards!
Aili Yao
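For illustration, the two positions in this thread can be compared in a small, simplified user-space model in C. This is a sketch under stated assumptions, not the kernel's memory_failure() code: it ignores the vm.memory_failure_early_kill sysctl and late-kill handling, and every name in it (struct task_model, current_policy(), proposed_policy(), count_signaled(), the MCE_* constants) is hypothetical. What this thread calls PF_MCE_KILL_EARLY corresponds to a task calling prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) (the kernel-side task flag is PF_MCE_EARLY); the model reduces it to a boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the policy debated in this thread.
 * NOT the kernel's memory_failure() code: it ignores the
 * vm.memory_failure_early_kill sysctl and late-kill handling. */

enum mce_kind { MCE_AO, MCE_AR };  /* action optional / action required */

/* What a task receives: nothing, or SIGBUS with a si_code equivalent
 * to BUS_MCEERR_AR or BUS_MCEERR_AO. */
enum mce_signal { MCE_SIG_NONE, MCE_SIG_AR, MCE_SIG_AO };

struct task_model {
    bool early_kill;  /* opted in via prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) */
    bool is_current;  /* the task whose memory access triggered the error */
};

/* Behavior as described by Naoya: early kill only affects AO events;
 * on an AR event only the current task is signaled. */
static enum mce_signal current_policy(const struct task_model *t,
                                      enum mce_kind kind)
{
    if (kind == MCE_AR)
        return t->is_current ? MCE_SIG_AR : MCE_SIG_NONE;
    return t->early_kill ? MCE_SIG_AO : MCE_SIG_NONE;
}

/* Behavior proposed in the patch: the current task still gets an AR
 * signal, while other tasks that opted into early kill are treated as
 * if the same error were action optional for them. */
static enum mce_signal proposed_policy(const struct task_model *t,
                                       enum mce_kind kind)
{
    if (kind == MCE_AR && t->is_current)
        return MCE_SIG_AR;
    return t->early_kill ? MCE_SIG_AO : MCE_SIG_NONE;
}

/* Count the tasks in tasks[0..n) that receive any signal under `policy`. */
static size_t count_signaled(const struct task_model *tasks, size_t n,
                             enum mce_kind kind,
                             enum mce_signal (*policy)(const struct task_model *,
                                                       enum mce_kind))
{
    size_t signaled = 0;

    for (size_t i = 0; i < n; i++)
        if (policy(&tasks[i], kind) != MCE_SIG_NONE)
            signaled++;
    return signaled;
}
```

With the four-process example from earlier in the thread (A, B, C and D all set early kill, and A triggers an action required error), current_policy() signals only A, while proposed_policy() signals A with an AR code and B, C and D with an AO code.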