From: Shiyang Ruan
Date: Wed, 26 Jun 2024 14:03:03 +0800
Subject: Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device
To: "Luck, Tony", Jonathan Cameron, "Williams, Dan J"
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org, dave@stgolabs.net, "Weiny, Ira", "Schofield, Alison", "Jiang, Dave", "Verma, Vishal L", Borislav Petkov, James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac@vger.kernel.org, Miaohe Lin, Naoya Horiguchi, linux-mm@kvack.org
References: <20240618165310.877974-1-ruansy.fnst@fujitsu.com> <20240620180239.00004d41@Huawei.com> <6675bf92116ed_57ac294a@dwillia2-xfh.jf.intel.com.notmuch> <20240621194506.000024aa@Huawei.com>

On 2024/6/22 4:44, Luck, Tony wrote:
>> So who actually cares about recovering poisoned volatile memory?
>> I'd like to understand more on how significant a use case this is.
>> Whilst I can conjecture that it's an extreme case of wanting to avoid
>> losing the ability to create 1GiB or larger pages due to poison
>> is that a real problem for anyone today? Note this is just the case
>> where you've reached an actual uncorrectable error and probably
>> / possibly killed something, not the more common soft offlining
>> of memory due to correctable errors being detected.
>
> I guess you really need a reply from someone with a data center
> with thousands of machines, since that's where this question
> may be important.
>
> My humble opinion is that, outside of the huge page issue, nobody
> should try to recover a poisoned page. Systems that can report
> and recover from poison have tens, hundreds, or more GBytes
> of memory. Dropping 4K pages will not have any measurable
> impact on a system (even if there are hundreds of pages dropped).
>
> There's no reliable way to determine whether the poisoned page
> was due to some transient issue, or a permanent defect. Recovering
> a poisoned page runs the risk that the poison will re-occur. Perhaps
> next use of the page will be in some unrecoverable (kernel) context.
>
> So recovery has some risk, but very little upside benefit.

Since the hardware provides an instruction (CPU) or a command (CXL) to
clear the poison, we could make this function work, at least as an
optional feature. Users could then decide whether to use it after
weighing the risk and the benefit.

I think recovery is a further improvement step and may need a lot of
discussion. I'm not sure we could reach a conclusion in this thread.
I just hope for more comments on the original problem (the duplicate
report) that this patch is meant to solve.

-- 
Thanks,
Ruan.

>
> -Tony
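A quick back-of-the-envelope check of the point that dropping even hundreds of 4K pages has no measurable impact. This is only a sketch: the helper name `lost_fraction` and the page counts and memory sizes are illustrative assumptions, not figures from the thread.

```python
# Back-of-the-envelope check: what fraction of RAM is lost when
# poisoned 4 KiB pages are simply offlined instead of recovered?
# The page counts and memory sizes below are illustrative assumptions.

PAGE_SIZE = 4 * 1024  # 4 KiB base page, in bytes
GIB = 1024 ** 3       # bytes per GiB


def lost_fraction(poisoned_pages: int, total_gib: int) -> float:
    """Return the fraction of total memory occupied by the dropped pages."""
    if poisoned_pages < 0 or total_gib <= 0:
        raise ValueError("page count must be >= 0 and memory size > 0")
    return poisoned_pages * PAGE_SIZE / (total_gib * GIB)


if __name__ == "__main__":
    # Hypothetical systems: (poisoned pages, installed memory in GiB)
    for pages, gib in [(100, 16), (500, 64), (1000, 256)]:
        frac = lost_fraction(pages, gib)
        print(f"{pages:5d} pages of {gib:4d} GiB -> {frac:.6%} of memory")
```

For example, 500 dropped pages on a 64 GiB machine come to about 2 MB, roughly 0.003% of memory. The arithmetic deliberately ignores the huge page case raised in the thread, where a single poisoned 4 KiB page can keep a whole 1 GiB region from being used as a huge page.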