Date: Tue, 27 Aug 2024 16:52:55 +0100
From: Jonathan Cameron
To: Shiyang Ruan
Subject: Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
Message-ID: <20240827165255.00003184@Huawei.com>
In-Reply-To: <20240808151328.707869-3-ruansy.fnst@fujitsu.com>

On Thu, 8 Aug 2024 23:13:28 +0800 Shiyang
Ruan wrote:

> Since CXL device is a memory device, while CPU is consuming a poison
> page of CXL device, it always triggers a MCE (via interrupt #18) and
> calls memory_failure() to handle POISON page, no matter which-First path
> is configured.  CXL device could also find and report the POISON, kernel
> now not only traces but also calls memory_failure() to handle it, which
> is marked as "NEW" in the figure below.
> ```
> 1.  MCE (interrupt #18, while CPU consuming POISON)
>      -> do_machine_check()
>        -> mce_log()
>          -> notify chain (x86_mce_decoder_chain)
>            -> memory_failure() <---------------------------- EXISTS
> 2.a FW-First (optional, CXL device proactively find&report)
>      -> CXL device -> Firmware
>        -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>                                                   \-> memory_failure()
>                                                       ^----- NEW
> 2.b OS-First (optional, CXL device proactively find&report)
>      -> CXL device -> MSI
>        -> OS: CXL driver -> trace
>                         \-> memory_failure()
>                             ^------------------------------- NEW
> ```
>
> But in this way, the memory_failure() could be called twice or even at
> same time, as is shown in the figure above: (1.) and (2.a or 2.b),
> before the POISON page is cleared.  memory_failure() has its own mutex
> lock so it actually won't be called at same time and the later call
> could be avoided because HWPoison bit has been set.  However, assume
> such a scenario, "CXL device reports POISON error" triggers 1st call,
> user see it from log and want to clear the poison by executing `cxl
> clear-poison` command, and at the same time, a process tries to access
> this POISON page, which triggers MCE (it's the 2nd call).

Attempting to clear poison in a page that is online seems unwise.
Does that ever make sense today?

> Since there
> is no lock between the 2nd call with clearing poison operation, race
> condition may happen, which may cause HWPoison bit of the page in an
> unknown state.

As long as that state is only ever wrong in the sense that we think the
page is poisoned when it isn't, we don't care.
>
> Thus, we have to avoid the 2nd call. This patch[2] introduces a new
> notifier_block into `x86_mce_decoder_chain` and a POISON cache list, to
> stop the 2nd call of memory_failure(). It checks whether the current
> poison page has been reported (if yes, stop the notifier chain, don't
> call the following memory_failure() to report again).
>

If we do want to do this, it belongs in the generic code, not the
arch-specific part.  Can we do something similar in the memory failure
code?

To RAS reviewers, this isn't a new problem unique to CXL.  Does a
solution like this make sense in practice, or are we fine to always let
two reports for the same error get handled?

Jonathan
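
For anyone following along, the "report once" idea in the quoted patch description can be sketched in plain userspace C roughly as below. This is only an illustration of the deduplication logic, not the code from the patch and not kernel code. The names `poison_cache`, `poison_report_once()`, `poison_forget()` and `POISON_CACHE_SLOTS` are made up for this sketch, and locking is left out, so a real version would have to serialize the check and the insert on whichever path ends up calling memory_failure().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; nothing in the thread specifies a size. */
#define POISON_CACHE_SLOTS 64

/* Hypothetical record of page frame numbers already reported as poisoned. */
struct poison_cache {
	uint64_t pfn[POISON_CACHE_SLOTS];
	size_t count;
};

/*
 * Returns true if this is the first report for pfn, meaning the caller
 * should go on to handle it, and false if pfn was already recorded,
 * meaning the caller can skip the duplicate.
 *
 * If the cache is full the pfn is not recorded and true is returned, so
 * the sketch fails open: a duplicate report is preferred to a lost one.
 * No locking here; callers are assumed to be serialized.
 */
bool poison_report_once(struct poison_cache *c, uint64_t pfn)
{
	assert(c != NULL);

	for (size_t i = 0; i < c->count; i++) {
		if (c->pfn[i] == pfn)
			return false;
	}

	if (c->count < POISON_CACHE_SLOTS)
		c->pfn[c->count++] = pfn;

	return true;
}

/*
 * Forget pfn, for example once its poison has been cleared, so that a
 * later error on the same page is reported again.
 */
void poison_forget(struct poison_cache *c, uint64_t pfn)
{
	assert(c != NULL);

	for (size_t i = 0; i < c->count; i++) {
		if (c->pfn[i] == pfn) {
			/* Order does not matter: move the last entry into the hole. */
			c->pfn[i] = c->pfn[--c->count];
			return;
		}
	}
}
```

Whether such a cache sits in an arch notifier or in generic code, the same two operations are needed: a check-and-record on each report path, and a forget when the poison is cleared.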