Date: Thu, 23 Nov 2023 16:07:10 +0100
From: Borislav Petkov
To: Shuai Xue
Cc: rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com,
 mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com,
 naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org,
 will@kernel.org, jarkko@kernel.org, linux-acpi@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
 linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org,
 stable@vger.kernel.org, x86@kernel.org, justin.he@arm.com, ardb@kernel.org,
 ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com,
 tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
 lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com,
 xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task work with proper si_code
Message-ID: <20231123150710.GEZV9qnkWMBWrggGc1@fat_crate.local>
References: <20221027042445.60108-1-xueshuai@linux.alibaba.com>
 <20231007072818.58951-1-xueshuai@linux.alibaba.com>
In-Reply-To: <20231007072818.58951-1-xueshuai@linux.alibaba.com>

On Sat, Oct 07, 2023 at 03:28:16PM +0800, Shuai Xue wrote:
> However, this trick is not always be effective

So far so good.

What's missing here is why "this trick" is not always effective.

Basically to explain what exactly the problem is.

> For example, hwpoison-aware user-space processes use the si_code:
> BUS_MCEERR_AO for 'action optional' early notifications, and BUS_MCEERR_AR
> for 'action required' synchronous/late notifications. Specifically, when a
> signal with SIGBUS_MCEERR_AR is delivered to QEMU, it will inject a vSEA to
> Guest kernel.
> In contrast, a signal with SIGBUS_MCEERR_AO will be ignored
> by QEMU.[1]
>
> Fix it by seting memory failure flags as MF_ACTION_REQUIRED on synchronous events. (PATCH 1)

So you're fixing qemu by "fixing" the kernel?

This doesn't make any sense.

Make errors which are ACPI_HEST_NOTIFY_SEA type return
MF_ACTION_REQUIRED so that it *happens* to fix your use case.

Sounds like a lot of nonsense to me.

What is the issue here you're trying to solve?

> 2. Handle memory_failure() abnormal fails to avoid a unnecessary reboot
>
> If process mapping fault page, but memory_failure() abnormal return before
> try_to_unmap(), for example, the fault page process mapping is KSM page.
> In this case, arm64 cannot use the page fault process to terminate the
> synchronous exception loop.[4]
>
> This loop can potentially exceed the platform firmware threshold or even trigger
> a kernel hard lockup, leading to a system reboot. However, kernel has the
> capability to recover from this error.
>
> Fix it by performing a force kill when memory_failure() abnormal fails or when
> other abnormal synchronous errors occur.

Just like that?

Without giving the process the opportunity to even save its other data?

So this all is still very confusing, patches definitely need splitting
and this whole thing needs restraint.

You go and do this: you split *each* issue you're addressing into
a separate patch and explain it like this:

---
1. Prepare the context for the explanation briefly.

2. Explain the problem at hand.

3. "It happens because of <...>"

4. "Fix it by doing X"

5. "(Potentially do Y)."
---

and each patch explains *exactly* *one* issue, what happens, why it
happens and just the fix for it and *why* it is needed.

Otherwise, this is unreviewable.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette