From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7393bcfb-fe94-4967-b664-f32da19ae5f9@linux.alibaba.com>
Date: Tue, 18 Feb 2025 19:31:34 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 0/5] mm/hwpoison: Fix regressions in memory failure handling
To: Borislav Petkov
Cc: tony.luck@intel.com, nao.horiguchi@gmail.com, tglx@linutronix.de,
 mingo@redhat.com, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 linmiaohe@huawei.com, akpm@linux-foundation.org, peterz@infradead.org,
 jpoimboe@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
 tianruidong@linux.alibaba.com
References: <20250217063335.22257-1-xueshuai@linux.alibaba.com>
 <20250218082727.GCZ7REb7OG6NTAY-V-@fat_crate.local>
From: Shuai Xue
In-Reply-To: <20250218082727.GCZ7REb7OG6NTAY-V-@fat_crate.local>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender:
 owner-linux-mm@kvack.org
Precedence: bulk

On 2025/2/18 16:27, Borislav Petkov wrote:
> On Mon, Feb 17, 2025 at 02:33:30PM +0800, Shuai Xue wrote:
>> changes since v1:
>> - Patch 1: Fix cur_sev and sev type to `int` per Tony
>> - Patch 4: Fix return value to 0 for clean pages per Miaohe
>> - Patch 5: pick return value comments of memory-failure()
>>
>> This patch addresses three regressions identified in memory failure
>> handling, as discovered using ras-tools[1]:
>>
>> - `./einj_mem_uc copyin -f`
>> - `./einj_mem_uc futex -f`
>> - `./einj_mem_uc instr`
>
> This is not how you write a problem statement and explain why your patches
> exist.
>
> You need to state:
>
> 1. What are you trying to do
> 2. What is the expected outcome and why
> 3. What actually happens and why
> 4. The fix, in your opinion, should be X or Y
>
> Not quote some ras tools commands. Show me that you actually know what you're
> doing and explain the problem in human understandable way. And then we can
> talk fixes.
>
> Thx.
>

Sorry for the confusion.

> 1. What are you trying to do

I am trying to fix two memory failure regressions in the upstream kernel
compared with 5.10 LTS.

- copyin case: poison found while copying from user space.
- instr case: poison found while fetching instructions in user space.

> 2. What is the expected outcome and why

For the copyin case:

The kernel can recover from poison found while copying from user space. The
MCE code checks the fixup handler type to decide whether an in-kernel #MC can
be recovered. When EX_TYPE_UACCESS is found, the PC jumps to the recovery code
specified in _ASM_EXTABLE_FAULT() and -EFAULT is returned to user space.

For the instr case:

If poison is found while fetching instructions in user space, full recovery is
possible. The user process takes a #PF, and Linux allocates a new page and
fills it by reading from storage.

> 3. What actually happens and why

For the copyin case: kernel panic since v5.17

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the extable
fixup type for copy-from-user operations, changing it from EX_TYPE_UACCESS to
EX_TYPE_EFAULT_REG. The MCE code does not treat the new type as recoverable,
so the #MC is handled as fatal.

For the instr case: the user process is killed by a SIGBUS signal

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not
recovered"") introduced a bug: kill_accessing_process() returns -EHWPOISON for
the instr case and, as a result, kill_me_maybe() sends a SIGBUS to the user
process.

> 4. The fix, in your opinion, should be X or Y

For the copyin case: add EX_TYPE_EFAULT_REG as a recovery type.

For the instr case: let kill_accessing_process() return 0 to prevent a SIGBUS.

For patch 1 and 2:

While debugging the two regressions, I found that the `msg` in the predefined
`severities`, e.g.

	MCESEV(
		AO, "Action optional: last level cache writeback error",
		SER, MASK(MCI_UC_AR|MCACOD, MCI_STATUS_UC|MCACOD_L3WB)
	),

helps me know what kind of MCE happened. For a fatal machine check, the kernel
panic uses the message, and I want to extend this to collect the message and
print it for non-fatal ones as well.

For patch 5:

The return value of memory_failure() turned out to be quite important while
discussing the instr case regression with Tony and Miaohe for patch 4, so I
moved the comment to the place it belongs.

I hope the information above addresses your concerns. Please feel free to let
me know if you have any further questions or need additional clarification.

Thanks.
Shuai
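
For illustration, the copyin decision described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: the enum values, `model_error_context()` and its `efault_reg_recoverable` switch are made up for this example, and only the EX_TYPE_* names they stand in for come from the discussion. It shows why a fixup type that is missing from the recovery list turns a copy-from-user #MC into a panic, and how listing EX_TYPE_EFAULT_REG would restore recovery.

```c
#include <stdbool.h>

/* Illustrative fixup types; the numeric values are arbitrary, not the kernel's. */
enum fixup_type {
	FIXUP_NONE,
	FIXUP_UACCESS,    /* stands in for EX_TYPE_UACCESS */
	FIXUP_EFAULT_REG, /* stands in for EX_TYPE_EFAULT_REG */
};

/* Outcome of classifying an in-kernel #MC in this model. */
enum mc_context {
	CTX_IN_KERNEL,       /* not recoverable: panic */
	CTX_IN_KERNEL_RECOV, /* recoverable: run the fixup, return -EFAULT */
};

/*
 * Hypothetical model of the severity decision.
 * efault_reg_recoverable == false models the regressed behaviour,
 * efault_reg_recoverable == true models the proposed fix.
 */
enum mc_context model_error_context(enum fixup_type type,
				    bool copy_from_user,
				    bool efault_reg_recoverable)
{
	/* Is this fixup type on the list of recoverable types? */
	bool listed = type == FIXUP_UACCESS ||
		      (type == FIXUP_EFAULT_REG && efault_reg_recoverable);

	/* Only a listed type hit during a copy from user space is recoverable. */
	return (listed && copy_from_user) ? CTX_IN_KERNEL_RECOV : CTX_IN_KERNEL;
}
```

In this model, `model_error_context(FIXUP_EFAULT_REG, true, false)` yields `CTX_IN_KERNEL`, which corresponds to the panic seen since v5.17, while the same call with the last argument set to `true` yields `CTX_IN_KERNEL_RECOV`.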