Date: Mon, 18 Jan 2021 15:16:19 +0800
From: Aili Yao
To: HORIGUCHI NAOYA(堀口　直也)
CC: Oscar Salvador, linux-mm@kvack.org, yangfeng1@kingsoft.com
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Message-ID: <20210118151619.592067a5.yaoaili@kingsoft.com>
In-Reply-To: <20210118065054.GA7447@hori.linux.bs1.fc.nec.co.jp>
References: <20210115155506.2d59fe83.yaoaili@kingsoft.com>
 <20210115084920.GA4092@linux>
 <20210115172622.699d68e5.yaoaili@kingsoft.com>
 <20210118051555.GA3585@hori.linux.bs1.fc.nec.co.jp>
 <20210118135744.7413cd06.yaoaili@kingsoft.com>
 <20210118065054.GA7447@hori.linux.bs1.fc.nec.co.jp>
Organization: Kingsoft

On Mon, 18 Jan 2021 06:50:54 +0000
HORIGUCHI NAOYA(堀口　直也) wrote:

> On Mon, Jan 18, 2021 at 01:57:44PM +0800, Aili Yao wrote:
> > On Mon, 18 Jan 2021 05:15:55 +0000
> > HORIGUCHI NAOYA(堀口　直也) wrote:
> >
> > > Hi Aili,
> > >
> > > On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote:
> > > > On Fri, 15 Jan 2021 09:49:24 +0100
> > > > Oscar Salvador wrote:
> > > >
> > > > > I am having a hard time trying to grasp what you are trying to achieve here.
> > > > > Could you elaborate some more? Ideally stating what is the problem you are
> > > > > fixing here.
> > > > >
> > > > Sorry for the confusion. Example: there are four processes A, B, C, D, which map the same file into
> > > > their address space and have set their PF_MCE_KILL_EARLY flag to TRUE. If process A triggers a
> > > > UE with MF_ACTION_REQUIRED set, in the current code only process A will be killed and B, C, D remain
> > > > alive, but because of the PF_MCE_KILL_EARLY we set, we want B, C, D to be killed as well.
> > >
> > > This behavior does not seem to me to be what PF_MCE_KILL_EARLY intends. This flag
> > > controls whether the memory error handler kills processes immediately or not,
> > > and it only affects action optional cases (i.e. called without
> > > MF_ACTION_REQUIRED). In the MF_ACTION_REQUIRED case, we have no such choice
> > > and affected processes should always be killed immediately.
> > >
> > > We may also need to consider the difference in context of these two cases.
> > > The action optional case is called asynchronously by a background process like
> > > memory scrubbing, so all processes mapping the error memory are the affected
> > > ones. An action required event is more synchronous, and is called when a
> > > process experiences memory access errors on data load and instruction fetch
> > > instructions. So the affected process in this case is only that process.
> > > So I still think this background justifies the current behavior.
> > >
> > > But my knowledge might be old. If you have newer hardware which defines
> > > other types of memory error that don't fit the current implementation,
> > > I'd like to extend the code to support the new cases, so please let me know.
> > >
> > Sorry, I don't fully get your concern.
> >
> > For action optional cases, it may come from a CE storm or patrol scrub, ...
>
> hwpoison is not about corrected errors, but about uncorrected errors. A CE storm
> should be handled by CMCI and a userspace tool like mcelog, although that is not
> the current main topic, sorry for the nitpick.
>
When hard page offline is configured, a CE will also call memory-failure.

> > When a process wants to handle this condition,
> > it will set PF_MCE_KILL_EARLY, and it will be signaled in such cases.
> > For Action Required cases, we must do something; I think they are more urgent and serious. In the current code, the process that triggered the error
> > should be signaled, but a process with PF_MCE_KILL_EARLY set won't get signaled, just because PF_MCE_KILL_EARLY is for the action optional case?
>
> I don't use PF_MCE_KILL_EARLY to justify the current code. Let me explain more.
>
> For action required cases, one error event kills *only one* process. If an
> error page is shared by multiple processes, these processes will be killed
> by separate error events, each of which is triggered when each process tries
> to access the error memory. So these processes would be killed immediately
> when accessing the error, but you don't have to kill them all at the same time
> (or actually you might not even have to kill a process at all if it finally
> exits without accessing the error).
>
That is not how PF_MCE_KILL_EARLY is meant to work. Normally, for an action
optional error, a process without PF_MCE_KILL_EARLY will be signaled only when
it really accesses the page. When PF_MCE_KILL_EARLY is set, we may not just want
to be killed; we may catch the signal and do something more.

> Maybe the function variable "force_early" is named confusingly (it sounds
> like it's related to the PF_MCE_KILL_EARLY flag, but that's incorrect).
> I'll submit a fix later. (I'll add your "Reported-by" because you made me
> find it, thank you.)
>
This is not about force_early itself; it is about the action we take for the
memory error. But if you have a better name, that would be good.

> >
> > Action Required is for the current process, which we must handle; the same Action Required issue is Action optional for non-current processes, right?
>
> Right.
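
A minimal userspace sketch may make the "catch the signal and do something more" case concrete. It assumes Linux with glibc. prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) is the real opt-in interface; it sets the task flag the kernel calls PF_MCE_EARLY (referred to as PF_MCE_KILL_EARLY in this thread). BUS_MCEERR_AR and BUS_MCEERR_AO are the real SIGBUS si_code values for action required and action optional deliveries. The helper names mce_kind, sigbus_handler, enable_early_kill and last_mce_code are hypothetical, and the sketch makes no claim about which tasks the kernel chooses to signal.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* si_code of the last SIGBUS seen; written only by the handler below. */
static volatile sig_atomic_t last_mce_code = 0;

/* Hypothetical helper: name the memory-failure flavour of a SIGBUS si_code. */
static const char *mce_kind(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required";
    case BUS_MCEERR_AO:
        return "action optional";
    default:
        return "not a memory error";
    }
}

/* Hypothetical SIGBUS handler distinguishing AR from AO deliveries. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_mce_code = info->si_code;
    if (info->si_code == BUS_MCEERR_AR) {
        /* Returning would re-execute the faulting access, so leave now. */
        static const char msg[] = "SIGBUS: action required memory error\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;
        _exit(1);
    }
    /* AO: the page has not been consumed yet. A real program could drop or
     * reload the data around info->si_addr here and keep running. */
}

/* Hypothetical setup: install the handler, then opt in to early kill.
 * Returns 0 on success, -1 on failure. */
int enable_early_kill(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    /* Sets PF_MCE_PROCESS | PF_MCE_EARLY on the calling task. */
    if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) != 0)
        return -1;

    /* Read the policy back to confirm the kernel accepted it. */
    return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0) == PR_MCE_KILL_EARLY ? 0 : -1;
}
```

A program would call enable_early_kill() once at startup. On BUS_MCEERR_AO the handler returns and the program can recover the affected data; on BUS_MCEERR_AR it exits, since returning would re-run the faulting access.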
>
> > I don't think Action Required applies to all processes. For the current process it may be AR; for other processes it may be AO, and they should also
> > be signaled. I think this behavior is reasonable.
> >
> > And we can't determine which error will be triggered; the PF_MCE_KILL_EARLY flag is meant to handle memory errors gracefully and should not be restricted
> > to explicitly declared AO errors.
> >

Thanks
-- 
Best Regards!
Aili Yao