Date: Mon, 10 May 2021 16:00:21 +0800
From: Aili Yao
To: HORIGUCHI NAOYA(堀口　直也)
CC: Naoya Horiguchi, "linux-mm@kvack.org", Tony Luck, Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov, Andy Lutomirski, Jue Wang, "linux-kernel@vger.kernel.org", "yaoaili126@gmail.com"
Subject: Re: [PATCH v4 2/2] mm,hwpoison: send SIGBUS when the page has already been poisoned
Message-ID: <20210510160021.648b41db@alex-virtual-machine>
In-Reply-To: <20210510072128.GA3504859@hori.linux.bs1.fc.nec.co.jp>
References: <20210427062953.2080293-1-nao.horiguchi@gmail.com> <20210427062953.2080293-3-nao.horiguchi@gmail.com> <20210507173852.0adc5cc4@alex-virtual-machine> <20210510072128.GA3504859@hori.linux.bs1.fc.nec.co.jp>

On Mon, 10 May 2021 07:21:28 +0000
HORIGUCHI NAOYA(堀口　直也) wrote:

> On Fri, May 07, 2021 at 05:38:52PM +0800, Aili Yao wrote:
> > On Tue, 27 Apr 2021 15:29:53 +0900
> > Naoya Horiguchi wrote:
> >
> > > From: Naoya Horiguchi
> > >
> > > When memory_failure() is called with MF_ACTION_REQUIRED on a
> > > page that has already been hwpoisoned, memory_failure() could fail
> > > to send SIGBUS to the affected process, which results in an infinite
> > > loop of MCEs.
> > >
> > > Currently memory_failure() returns 0 if it's called for an already
> > > hwpoisoned page, so the caller, kill_me_maybe(), could return
> > > without sending SIGBUS to the current process. An action-required MCE
> > > is raised when the current process accesses the broken memory,
> > > so no SIGBUS means that the current process continues to run and
> > > accesses the error page again soon, running into an MCE loop.
> > >
> > > This issue can arise, for example, in the following scenarios:
> > >
> > > - Two or more threads access the poisoned page concurrently.
> > >   If local MCE is enabled, the MCE handler handles the MCE events
> > >   independently. So there's a race among MCE events, and the
> > >   second or later threads fall into the situation in question.
> > >
> > > - If there was a preceding memory error event and memory_failure()
> > >   for the event failed to unmap the error page for some reason,
> > >   the subsequent memory access to the error page triggers the
> > >   MCE loop situation.
> > >
> > > To fix the issue, make memory_failure() return some error code when the
> > > error page has already been hwpoisoned. This allows the memory error
> > > handler to control how it sends signals to userspace. And make sure
> > > that any process touching a hwpoisoned page gets a SIGBUS (if
> > > possible) with the error virtual address, even in the "already hwpoisoned"
> > > path of memory_failure(), as is done in the page fault path.
> > >
> > > kill_accessing_process() does a pagetable walk to find the error virtual
> > > address. If multiple virtual addresses are found in the pagetable walk,
> > > no one knows which address is the correct one, so we fall back to sending
> > > SIGBUS in kill_me_maybe() without error address info as we do now.
> > > This corner case is left to be solved in the future.
> > >
> > > Signed-off-by: Naoya Horiguchi
> >
> > Sorry for my late response; I only just found time to rethink the pagewalk patch.
> > Please let me share my thoughts.
> > If anything is wrong, just point it out, thanks!
>
> Thank you for the feedback.
>
> >
> > This whole pagewalk patch is meant to fix the invalid virtual address sent along with SIGBUS.
> > This invalid virtual address issue seems to be an existing issue that predates the posting of this race issue, and it has gone unfixed for a long time.
> >
> > So I wonder why this issue was not fixed; maybe no process cares about this virtual address, as the process will be killed.
> > Maybe a virtual guest will need this address to forward it to the vCPU, but until now the memory recovery function in the VM doesn't
> > work at all, and the lack of this address does not seem to have a big impact.
> >
> > Maybe there are some other cases that care about the virtual address; if anyone knows of one, just point it out.
> >
> > But an invalid virtual address is still no good.
> >
> > Before this, I posted one RFC patch trying to fix this issue, with one known issue: it failed for multiple pte entries.
> > Then this patch was posted to address that.
> >
> > When I first read this patch, I thought this method was good and right, and I tested it. But thinking about it again, I wonder: even if the process
> > has multiple pte entries and gets the wrong virtual address, that address still points to the same page, right?
>
> Yes, it does.
>
> > If the process doesn't exit and gets the wrong virtual address, what wrong action will it take?
>
> I have no clear idea.  Typical action for the SIGBUS is to kill the process with
> some logging, so an obviously wrong action like killing the wrong process never happens.
> A possible wrong result is an invalid address in the log, which might not be critical.
>
> > I can only think of the virtual machine example, but will qemu translate the wrong virtual address to the right guest physical address?
> > I am not sure whether a VM will have multiple pte entries.
>
> As far as I know, qemu maintains a one-to-one mapping between host virtual address
> and guest physical address, so no multi-entry issue should happen around qemu.
>
> >
> > And I think the virtual address sent along with SIGBUS is not meant for backtracing the code; it just tells where the error memory is. For multiple pte
> > entries, isn't one virtual address for the same physical page enough?
> >
> > Comparing this patch with my RFC patch, the differences are:
> > 1. This patch only fixes the invalid virtual address for the race issue, while my RFC patch covers all the error cases for recovery.
> > 2. For multiple entries, this patch does one force_sig with no other information, but the RFC patch takes one possibly right address. I don't know which one is better.
> >
> > And if this multiple pte entry case is a real issue, it seems the normal recovery work will also trigger it; would it be better to fix that first?
>
> Assuming that your RFC is https://lore.kernel.org/lkml/20210317162304.58ff188c@alex-virtual-machine/,
> it simply uses the first-found virtual address.  I am starting to think that this
> approach could be fine.  And it's easy to change the patch to use this approach.
> I have no preference, so if you like, I will switch to the "first-found" approach.

Hi Naoya:

Thanks for your reply!

Yes, you can change to that RFC approach, but there may be some unidentified issues, and it needs more consideration.

There may also be other methods to address this; you can dig into those, implement one and post it.

I am OK with any option. But from the beginning, I have thought that the invalid address issue and the race issue are two different issues; they may be related, but they are still two issues in my mind.

Would you please separate this patch series into three patches again?

Many thanks!

Aili Yao
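
P.S. For anyone following the thread, the two policies discussed above for the multiple-pte case can be sketched as a toy userspace model. This is not kernel code and not the actual patch: the dict-based "page table", the function names, and the ascending-address walk order are all assumptions made purely for illustration.

```python
from typing import Dict, List, Optional


def find_error_vaddrs(page_table: Dict[int, int], poisoned_pfn: int) -> List[int]:
    """Toy pagetable walk: page_table maps virtual address -> page frame number.

    Returns every virtual address that maps the poisoned frame, in ascending
    address order (assumed here to stand in for the walk order).
    """
    return sorted(va for va, pfn in page_table.items() if pfn == poisoned_pfn)


def sigbus_addr_strict(page_table: Dict[int, int], poisoned_pfn: int) -> Optional[int]:
    """Policy described in the v4 patch text: report an address only when it is
    unambiguous; with zero or several hits, return None, meaning SIGBUS would
    be sent without error address info."""
    hits = find_error_vaddrs(page_table, poisoned_pfn)
    return hits[0] if len(hits) == 1 else None


def sigbus_addr_first_found(page_table: Dict[int, int], poisoned_pfn: int) -> Optional[int]:
    """Policy described for the RFC: report the first address found, even if
    other mappings of the same page exist; None only when nothing maps it."""
    hits = find_error_vaddrs(page_table, poisoned_pfn)
    return hits[0] if hits else None


if __name__ == "__main__":
    # Hypothetical process with two mappings of frame 42 and one of frame 7.
    table = {0x7F0000001000: 42, 0x7F0000005000: 42, 0x7F0000009000: 7}
    print(sigbus_addr_strict(table, 42))       # ambiguous, so no address
    print(sigbus_addr_first_found(table, 42))  # first mapping of frame 42
```

In this model both policies agree whenever the page is mapped once; they differ only in the multiple-mapping case, where "first-found" still reports an address that points at the same physical page.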