From: HORIGUCHI NAOYA(堀口　直也)
To: Aili Yao
CC: Oscar Salvador, "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Date: Tue, 19 Jan 2021 05:25:38 +0000
Message-ID: <20210119052537.GA1642@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210118170900.6fe9595a.yaoaili@kingsoft.com>

On Mon, Jan 18, 2021 at 05:09:00PM +0800, Aili Yao wrote:
> On Mon, 18 Jan 2021 08:57:47 +0000
> HORIGUCHI NAOYA(堀口　直也) wrote:
>
> > > >
> > > > For action optional cases, one error event kills *only one* process. If an
> > > > error page is shared by multiple processes, these processes will be killed
> > > > by separate error events, each of which is triggered when each process tries
> > > > to access the error memory. So these processes would be killed immediately
> > > > when accessing the error, but you don't have to kill all at the same time
> > > > (or actually you might not even have to kill it at all if the process exits
> > > > finally without accessing the error later).
> > > >
> > > > Maybe the function variable "force_early" is named confusingly (it sounds
> > > > like it's related to the PF_MCE_KILL_EARLY flag, but that's incorrect).
> > > > I'll submit a fix later. (I'll add your "Reported-by" because you made me
> > > > find it, thank you.)
> > > >
> > > I think we should do more for non current process error case; we should mark it AO for processes to be signaled,
> > > or we may take the wrong action.
> >
> > I'm not sure what you mean by "non current process error case" and "we
> > should mark it AO", so could you explain more specifically about your error
> > scenario?
> I will share my test code and I will submit another patch for this scenario.
> Please give me some time, thanks!
> And I think you are right, AR applies only to the current process.
>
> > Especially I'd like to know about who triggers hard offline on
> > what hardware events and what "wrong action" could happen. Maybe just
> > "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> > it's not enough for us to see that your scenario is possible. The current
> > implementation implicitly assumes some hardware behavior, and does not work
> > for the case which never happens under the assumption.
> >
> This action is from the mcelog daemon. Normally soft page offline is the default, but we can configure
> hard page offline for CE storms, to get related processes signaled.

Thanks, so which interface did you use for error injection?
My first guess is that you used /sys/devices/system/memory/hard_offline_page,
but if that's true, then the error event should be action optional (no
MF_ACTION_REQUIRED set). So now I'm wondering why you are observing action
required events. My other guess is that you might have used the mce-inject
tool; if that's true, please use hard_offline_page instead, and then the
current kernel code should properly send SIGBUS to the dedicated process.

- Naoya
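Userspace sees the action optional / action required distinction in the si_code of the SIGBUS it receives. The following is a minimal sketch that assumes Linux with glibc and _GNU_SOURCE. The helper names mce_sigbus_kind, mce_request_early_kill and mce_install_sigbus_handler are hypothetical. It shows three things:

- How a thread can opt into early kill with prctl(PR_MCE_KILL, ...).
- How it would typically be notified with BUS_MCEERR_AO when a page it maps is poisoned.
- How that notification differs from BUS_MCEERR_AR, which is raised when the thread actually touches the bad page.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: map a SIGBUS si_code to the hwpoison event class. */
const char *mce_sigbus_kind(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return "action required"; /* this task accessed the poisoned page */
    case BUS_MCEERR_AO:
        return "action optional"; /* page is poisoned but was not accessed */
    default:
        return "other";
    }
}

/* Hypothetical helper: opt the calling thread into early kill, overriding
 * the vm.memory_failure_early_kill sysctl for it. */
int mce_request_early_kill(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/* Async-signal-safe output: write(2) instead of stdio. */
static void put_str(const char *s)
{
    ssize_t r = write(STDERR_FILENO, s, strlen(s));
    (void)r;
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    put_str("SIGBUS: ");
    put_str(mce_sigbus_kind(info->si_code));
    put_str("\n");
    /* Only an AO notice is safe to return from; for AR (and other SIGBUS
     * causes) returning would just repeat the faulting access. */
    if (info->si_code != BUS_MCEERR_AO)
        _exit(1);
}

/* Hypothetical helper: install sigbus_handler with SA_SIGINFO. */
int mce_install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A test program would call mce_request_early_kill() and mce_install_sigbus_handler() early in main(). Because PR_MCE_KILL is per-thread, threads created afterwards inherit the setting, but existing ones do not.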
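Error injection through the hard_offline_page interface can also be driven from C. The following is a minimal sketch, assuming root (CAP_SYS_ADMIN) and a kernel built with CONFIG_MEMORY_FAILURE. The helper names format_offline_request and hard_offline_phys_addr are hypothetical. The sketch also assumes that the kernel parses the written value with base auto-detection, so it writes the physical address with a 0x prefix. Writing to this file really poisons the page, so this is only for a disposable test machine.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define HARD_OFFLINE_PATH "/sys/devices/system/memory/hard_offline_page"

/* Hypothetical helper: render a physical address as "0x...\n".
 * Returns the string length, or -1 if buf is too small. */
int format_offline_request(char *buf, size_t len, unsigned long long paddr)
{
    int n = snprintf(buf, len, "0x%llx\n", paddr);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Hypothetical helper: hard-offline the page containing paddr.
 * DESTRUCTIVE: the page is poisoned for real. Needs CAP_SYS_ADMIN.
 * Returns 0 on success or a negative errno value. */
int hard_offline_phys_addr(unsigned long long paddr)
{
    char buf[32];
    int n = format_offline_request(buf, sizeof(buf), paddr);
    if (n < 0)
        return -EINVAL;

    int fd = open(HARD_OFFLINE_PATH, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t w = write(fd, buf, (size_t)n);
    int err = (w < 0) ? -errno : 0; /* capture errno before close() */
    close(fd);
    return err;
}
```

The sketch does not cover finding the physical address that backs a given virtual page (for example via /proc/self/pagemap).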