From: HORIGUCHI NAOYA(堀口　直也)
To: Aili Yao
CC: Oscar Salvador, linux-mm@kvack.org, yangfeng1@kingsoft.com
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Date: Mon, 18 Jan 2021 06:50:54 +0000
Message-ID: <20210118065054.GA7447@hori.linux.bs1.fc.nec.co.jp>
References: <20210115155506.2d59fe83.yaoaili@kingsoft.com> <20210115084920.GA4092@linux> <20210115172622.699d68e5.yaoaili@kingsoft.com> <20210118051555.GA3585@hori.linux.bs1.fc.nec.co.jp> <20210118135744.7413cd06.yaoaili@kingsoft.com>
In-Reply-To: <20210118135744.7413cd06.yaoaili@kingsoft.com>

On Mon, Jan 18, 2021 at 01:57:44PM +0800, Aili Yao wrote:
> On Mon, 18 Jan 2021 05:15:55 +0000
> HORIGUCHI NAOYA(堀口　直也) wrote:
>
> > Hi Aili,
> >
> > On Fri, Jan 15, 2021 at 05:26:22PM +0800, Aili Yao wrote:
> > > On Fri, 15 Jan 2021 09:49:24 +0100
> > > Oscar Salvador wrote:
> > >
> > > > I am having a hard time trying to grasp what you are trying to achieve here.
> > > > Could you elaborate some more? Ideally, state the problem you are
> > > > fixing here.
> > > >
> > > Sorry for the confusion. An example: there are four processes, A, B, C and D, which map the same file into
> > > their address spaces and have set their PF_MCE_KILL_EARLY flag. If process A triggers a
> > > UE with MF_ACTION_REQUIRED set, the current code kills only process A, and B, C and D remain
> > > alive. But since we set PF_MCE_KILL_EARLY, we want B, C and D to be killed as well.
> >
> > This behavior does not seem to me to be what PF_MCE_KILL_EARLY intends. This flag
> > controls whether the memory error handler kills processes immediately or not,
> > and it only affects action optional cases (i.e. when called without
> > MF_ACTION_REQUIRED). In the MF_ACTION_REQUIRED case we have no such choice,
> > and the affected processes should always be killed immediately.
> >
> > We may also need to consider the difference in context between these two cases.
> > The action optional case is called asynchronously by a background process such as
> > memory scrubbing, so all processes mapping the error memory are affected.
> > An action required event is more synchronous: it is raised when a
> > process hits a memory error on a data load or an instruction fetch.
> > So the only affected process in this case is that process.
> > So I still think this background justifies the current behavior.
> >
> > But my knowledge might be outdated. If you have newer hardware that defines
> > other types of memory error that don't fit the current implementation,
> > I'd like to extend the code to support the new cases, so please let me know.
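To make the distinction above concrete, here is a minimal user-space model of target selection for the four-process example. It is a sketch, not kernel code: the real logic lives in mm/memory-failure.c, and the names used here (enum event_kind, struct proc_model, select_targets) are hypothetical. It assumes only the behavior described in this thread: an action optional event signals every mapping process that opted into early kill, while an action required event signals only the process whose access hit the error. It ignores the vm.memory_failure_early_kill sysctl and other details.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not kernel structures. */
enum event_kind { EVENT_AO, EVENT_AR };

struct proc_model {
    char name;        /* label such as 'A' */
    bool early_kill;  /* the process opted into early kill */
    bool maps_page;   /* the process maps the poisoned page */
};

/*
 * Decide which processes get SIGBUS immediately for one error event.
 * For EVENT_AR, 'accessor' is the index of the process whose data load or
 * instruction fetch hit the error; it is ignored for EVENT_AO.
 * Fills signaled[0..n-1] and returns the number of selected processes.
 */
static int select_targets(const struct proc_model *procs, int n,
                          enum event_kind kind, int accessor, bool *signaled)
{
    int count = 0;

    assert(kind != EVENT_AR || (accessor >= 0 && accessor < n));
    for (int i = 0; i < n; i++) {
        if (kind == EVENT_AR)
            /* Synchronous error: only the accessing process is affected,
             * whatever its early-kill setting. */
            signaled[i] = (i == accessor);
        else
            /* Asynchronous error: every mapper that asked for early kill. */
            signaled[i] = procs[i].maps_page && procs[i].early_kill;
        if (signaled[i])
            count++;
    }
    return count;
}
```

With all four processes opted in, an action required event raised by A selects only A, while an action optional event on the same page selects all four. In this model B, C and D are each selected only by their own action required event, if and when they touch the page.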
> >
> Sorry, I don't fully get your concern.
>
> For action optional cases, it may come from a CE storm or patrol scrub, ...

hwpoison is not about corrected errors but about uncorrected errors. A CE
storm should be handled by CMCI and a userspace tool like mcelog, although
that is not the main topic here; sorry for the nitpick.

> When the process wants to handle this condition,
> it will set PF_MCE_KILL_EARLY, and it will be signaled in such a case.
> For action required cases we must do something; I think it's more urgent and serious. In the current code, the process that triggered the error
> is signaled, but a process with PF_MCE_KILL_EARLY won't get signaled, just because PF_MCE_KILL_EARLY is for the action optional case?

I don't use PF_MCE_KILL_EARLY to justify the current code. Let me explain more.
For action required cases, one error event kills *only one* process. If an
error page is shared by multiple processes, these processes will be killed
by separate error events, each of which is triggered when that process
tries to access the error memory. So each process is killed immediately
when it accesses the error, but you don't have to kill them all at the same
time (or you might not even have to kill a process at all if it exits
without ever accessing the error).

Maybe the variable "force_early" is named confusingly (it sounds like it's
related to the PF_MCE_KILL_EARLY flag, but that's incorrect). I'll submit a
fix later. (I'll add your "Reported-by" because you made me find it, thank you.)

>
> Action Required is for the current process, which we must handle; the same Action Required issue is Action Optional for non-current processes, right?

Right.
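The userspace side of this discussion can be sketched as follows. A process opts into early kill, the per-process policy this thread calls PF_MCE_KILL_EARLY, with prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0), and the kernel reports the class of error in the si_code of the SIGBUS it sends: BUS_MCEERR_AO for action optional, BUS_MCEERR_AR for action required. The helper names (enable_mce_early_kill, mce_kind, sigbus_handler, install_sigbus_handler) are hypothetical, and the sketch assumes Linux with glibc headers. It is an illustration, not part of the patch under discussion.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Opt this process into early kill for action optional memory errors.
 * Returns 0 if the kernel reports the early-kill policy afterwards, -1 otherwise.
 */
static int enable_mce_early_kill(void)
{
    if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0UL, 0UL) != 0)
        return -1;
    /* PR_MCE_KILL_GET returns the current policy as the call's result. */
    return prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL) == PR_MCE_KILL_EARLY ? 0 : -1;
}

/* Map a SIGBUS si_code to the error class discussed in this thread. */
static const char *mce_kind(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required";
    case BUS_MCEERR_AO:
        return "action optional";
    default:
        return "other";
    }
}

/*
 * SIGBUS handler. Apart from the debug assert, it only uses
 * async-signal-safe functions (strlen, write, _exit).
 */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    const char *kind = mce_kind(info->si_code);

    assert(sig == SIGBUS);
    (void)ctx;
    write(STDERR_FILENO, "SIGBUS: ", 8);
    write(STDERR_FILENO, kind, strlen(kind));
    write(STDERR_FILENO, "\n", 1);
    /*
     * Action optional: the error was found in the background and this
     * process did not fault on it, so returning is safe. For any other
     * SIGBUS, returning would re-run the faulting access, so terminate.
     */
    if (info->si_code != BUS_MCEERR_AO)
        _exit(1);
}

/* Install sigbus_handler with SA_SIGINFO so that si_code is available. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_sigbus_handler() and enable_mce_early_kill() once at startup. Under the behavior described in this thread, such a process receives BUS_MCEERR_AO for errors found in the background on pages it maps, and BUS_MCEERR_AR only when its own access hits the poisoned memory.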
> I don't think Action Required applies to all processes. For the current process it may be AR, but for other processes it may be AO, and they should also
> be signaled. I think this behavior is reasonable.
>
> And we can't determine which error will be triggered; the PF_MCE_KILL_EARLY flag is meant to handle memory errors gracefully and should not be restricted
> to explicitly declared AO errors.
>
> Thanks!

Thank you, too.

- Naoya