Date: Mon, 18 Jan 2021 10:24:23 +0100
From: Oscar Salvador
To: HORIGUCHI NAOYA(堀口　直也)
Cc: Aili Yao, "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: Re: [PATCH] mm,hwpoison: non-current task should be checked early_kill for force_early
Message-ID: <20210118092419.GA4234@linux>
In-Reply-To: <20210118085747.GA904@hori.linux.bs1.fc.nec.co.jp>

On Mon, Jan 18, 2021 at 08:57:47AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> I'm not sure what you mean by "non current process error case" and "we
> should mark it AO", so could you explain more specifically about your error
> scenario? Especially I'd like to know about who triggers hard offline on
> what hardware events and what "wrong action" could happen. Maybe just
> "calling memory_failure() with MF_ACTION_REQUIRED" is not enough, because
> it's not enough for us to see that your scenario is possible. Current
> implementation implicitly assumes some hardware behavior, and does not work
> for the case which never happens under the assumption.

So, the scenario is a multithreaded application with the same page mapped,
and with the PF_MCE_EARLY flag set.

IIUC, Aili Yao's concern is that when the MCE machinery calls
memory_failure() with MF_ACTION_REQUIRED, only the process that triggered
the MCE exception will receive a SIGBUS, and not the other threads that
might have PF_MCE_EARLY set.
Aili Yao would like memory_failure() to also signal those threads that
might have the flag set, in case they want to do something with that
information.

But reading the code, I do not think that is what the code expects.
Looking at the comment above find_early_kill_thread():

"/*
  * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
  * on behalf of the thread group. Return task_struct of the (first found)
  * dedicated thread if found, and return NULL otherwise.
  *
  * We already hold read_lock(&tasklist_lock) in the caller, so we don't
  * have to call rcu_read_lock/unlock() in this function.
  */"

What I understand from that is:

"
If memory_failure() was not triggered by any concrete process (aka: no one
was trying to manipulate the corrupted area), we need to find the main
thread who might have set the MCE policy via prctl() and see if they want
to be signaled __before__ they access the corrupted area.
"

Note that if the PF_MCE policy was not set, we check the global knob
sysctl_memory_failure_early_kill.
And if that is not set either, we defer the signaling until later, when a
process actually tries to operate on the corrupted area.

Does that make sense?

Actually, unless I am mistaken, if a multithreaded process receives a
signal, all threads belonging to the process will receive the signal as
well:

"The signal disposition is a per-process attribute: in a multithreaded
application, the disposition of a particular signal is the same for all
threads."
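
To make the ordering above concrete, here is a small userspace sketch of
the policy as I read it. It is only a model, not the kernel code: the
MODEL_* flag values, the enum and model_signal_time() are made-up names
for illustration, and the bit values are arbitrary.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins for the per-task policy flags discussed above.
 * The names and bit values are assumptions made for this model only.
 */
#define MODEL_PF_MCE_PROCESS 0x1u /* task set an explicit MCE policy via prctl() */
#define MODEL_PF_MCE_EARLY   0x2u /* that explicit policy is "early kill" */

enum model_signal_time {
	MODEL_SIGNAL_EARLY, /* signal before the corrupted page is accessed */
	MODEL_SIGNAL_LATE   /* defer until a task actually touches the page */
};

/*
 * Decide when a task would be signaled under the reading given above:
 * an explicit per-task policy wins; otherwise the global knob decides;
 * if neither asks for early kill, signaling is deferred.
 */
enum model_signal_time model_signal_time(unsigned int task_flags,
					 bool sysctl_early_kill)
{
	if (task_flags & MODEL_PF_MCE_PROCESS)
		return (task_flags & MODEL_PF_MCE_EARLY) ?
			MODEL_SIGNAL_EARLY : MODEL_SIGNAL_LATE;

	return sysctl_early_kill ? MODEL_SIGNAL_EARLY : MODEL_SIGNAL_LATE;
}
```

For example, a task with no explicit policy and the global knob off ends
up with MODEL_SIGNAL_LATE, which is the "defer the signaling" case.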
-- 
Oscar Salvador
SUSE L3