Date: Mon, 24 Feb 2025 22:50:13 +0100
From: Borislav Petkov
To: "Luck, Tony"
Cc: Shuai Xue, "nao.horiguchi@gmail.com", "tglx@linutronix.de",
 "mingo@redhat.com", "dave.hansen@linux.intel.com", "x86@kernel.org",
 "hpa@zytor.com", "linmiaohe@huawei.com", "akpm@linux-foundation.org",
 "peterz@infradead.org", "jpoimboe@kernel.org",
 "linux-edac@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "linux-mm@kvack.org", "baolin.wang@linux.alibaba.com",
 "tianruidong@linux.alibaba.com"
Subject: Re: [PATCH v2 0/5] mm/hwpoison: Fix regressions in memory failure handling
Message-ID: <20250224215013.GAZ7zplS6XmgL9h9w0@fat_crate.local>
References: <20250217063335.22257-1-xueshuai@linux.alibaba.com>
 <20250218082727.GCZ7REb7OG6NTAY-V-@fat_crate.local>
 <7393bcfb-fe94-4967-b664-f32da19ae5f9@linux.alibaba.com>
 <20250218122417.GHZ7R78fPm32jKYUlx@fat_crate.local>
 <20250219081037.GAZ7WR_YmRtRvN_LKA@fat_crate.local>
 <20250220111903.GDZ7cPp1qVq3t9Jgs6@fat_crate.local>

On Thu, Feb 20, 2025 at 05:50:14PM +0000, Luck, Tony wrote:
> Agreed. Shuai needs to harvest this thread to fill out the details in the commit
> messages.

Yap.

> There are probably other races. Two CPUs both take local #MC on the same page
> (maybe not all that rare in threaded processes ... or even with some hot code in
> a shared library).

Yap, exactly.
And I think there's nothing we can do - the hw is out there so the sw needs to
handle them cases correctly.

> Hmmm indeed. Needs some thought. Though failing to kill a process likely means
> it retries the access and comes right back to try again (without the race this time).

What happens if it fails to kill the process? It'll return to it, it'll try to
touch the faulty memory and raise another #MC? Right, I think so.

> > > On Intel that would mean not registering the notifier at all. What about AMD?
> > > Do you have similar races for MCE_DEFERRED_SEVERITY errors?
> >
> > Probably. Lemme ask around.

After talking to folks internally, yeah, I think we'll probably have a similar
thing. Haven't seen it happen yet.

> Linux tries to enable if LMCE is supported, but BIOS has veto power.
> See the bit in lmce_supported() that checks MSR_IA32_FEAT_CTL

I'm trying to educate our hw folks to not rely on OEM BIOS if possible. For
every chance I get. Otherwise you get crap like that and this is never getting
better.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
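
For readers following the LMCE remark quoted above: a rough userspace sketch of the kind of check lmce_supported() is described as doing, i.e. the CPU has to advertise LMCE and firmware has to have opted in via MSR_IA32_FEAT_CTL. This is not the kernel function. lmce_usable() is a made-up name, the MSR values are passed in as plain integers instead of being read with rdmsr, and the bit positions are assumed from the Intel SDM layout (MCG_CAP bits 24 and 27, FEAT_CTL bits 0 and 20).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (Intel SDM); the names mirror the kernel's macros. */
#define MCG_SER_P             (1ULL << 24) /* IA32_MCG_CAP: software error recovery */
#define MCG_LMCE_P            (1ULL << 27) /* IA32_MCG_CAP: local machine check present */
#define FEAT_CTL_LOCKED       (1ULL << 0)  /* IA32_FEAT_CTL: MSR locked */
#define FEAT_CTL_LMCE_ENABLED (1ULL << 20) /* IA32_FEAT_CTL: firmware opted in to LMCE */

/*
 * Hypothetical helper: decide whether LMCE could be enabled, given raw
 * values of IA32_MCG_CAP and IA32_FEAT_CTL supplied by the caller.
 */
static bool lmce_usable(uint64_t mcg_cap, uint64_t feat_ctl)
{
	const uint64_t need = MCG_SER_P | MCG_LMCE_P;

	/* The CPU must advertise both recovery support and LMCE. */
	if ((mcg_cap & need) != need)
		return false;

	/* The enable bit only takes effect once the MSR is locked. */
	if (!(feat_ctl & FEAT_CTL_LOCKED))
		return false;

	/* This is the "BIOS veto": firmware decides whether this bit is set. */
	return (feat_ctl & FEAT_CTL_LMCE_ENABLED) != 0;
}
```

With LMCE vetoed this way, a machine check gets broadcast instead of delivered locally, which is presumably why the firmware setting matters for the races discussed in this thread.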