Date: Thu, 16 Apr 2026 08:32:39 -0700
From: Breno Leitao
To: Jiaqi Yan
Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet, Shuah Khan, David Hildenbrand, Lorenzo Stoakes, "Liam R.
 Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v4 0/3] mm/memory-failure: add panic option for unrecoverable pages
References: <20260415-ecc_panic-v4-0-2d0277f8f601@debian.org>

Hi Jiaqi,

On Wed, Apr 15, 2026 at 01:56:35PM -0700, Jiaqi Yan wrote:
> On Wed, Apr 15, 2026 at 5:55 AM Breno Leitao wrote:
> >
> > When the memory failure handler encounters an in-use kernel page that it
> > cannot recover (slab, page tables, kernel stacks, vmalloc, etc.), it
> > currently logs the error as "Ignored" and continues operation.
> >
> > This leaves corrupted data accessible to the kernel, which will inevitably
> > cause either silent data corruption or a delayed crash when the poisoned memory
> > is next accessed.
> >
> > This is a common problem on large fleets. We frequently observe multi-bit ECC
> > errors hitting kernel slab pages, where memory_failure() fails to recover them
> > and the system crashes later at an unrelated code path, making root cause
> > analysis unnecessarily difficult.
> >
> > Here is one specific example from production on an arm64 server: a multi-bit
> > ECC error hit a dentry cache slab page, memory_failure() failed to recover it
> > (slab pages are not supported by the hwpoison recovery mechanism), and 67
> > seconds later d_lookup() accessed the poisoned cache line causing
> > a synchronous external abort:
> >
> > [88690.479680] [Hardware Error]: error_type: 3, multi-bit ECC
> > [88690.498473] Memory failure: 0x40272d: unhandlable page.
> > [88690.498619] Memory failure: 0x40272d: recovery action for
> > get hwpoison page: Ignored
> > ...
> > [88757.847126] Internal error: synchronous external abort:
> > 0000000096000410 [#1] SMP
> > [88758.061075] pc : d_lookup+0x5c/0x220
> >
> > This series adds a new sysctl vm.panic_on_unrecoverable_memory_failure
> > (default 0) that, when enabled, panics immediately on unrecoverable
> > memory failures. This provides a clean crash dump at the time of the
>
> I get the fail-fast part, but wonder will kernel really be able to
> provide clean crash dump useful for diagnosis?

Yes, the kernel does provide a useful crash dump.
With the sysctl enabled, here's what I observe:

  Kernel panic - not syncing: Memory failure: 0x1: unrecoverable page
  CPU: 40 UID: 0 PID: 682 Comm: bash Tainted: G B 7.0.0-next-20260414-upstream-00004-gcbb3af7bfd3b #93
  Tainted: [B]=BAD_PAGE
  Call Trace:
   vpanic+0x399/0x700
   panic+0xb4/0xc0
   action_result+0x278/0x340    <-- the new panic call site
   memory_failure+0x152b/0x1c80

Without the patch (or with the sysctl disabled), you only get:

  Memory failure: 0x1: unhandlable page.
  Memory failure: 0x1: recovery action for reserved kernel page: Ignored

Then the host continues running until it eventually accesses that poisoned
memory, triggering a generic error similar to the d_lookup() case mentioned
above.

> In your example at 88757.847126, kernel was handling SEA and because
> we are under kernel context, eventually has to die(). Apparently not
> only your patch, but also memory-failure has no role to play there.
> But at least SEA handling tried its best to show the kernel code that
> consumed the memory error.
>
> So your code should apply to the memory failure handling at
> 88690.498473, which is likely triggered from APEI GHES for poison
> detection (I guess the example is from ARM64). Anything except SEA is
> considered not synchronous (by APEI is_hest_sync_notify()). If kernel
> panics there, I guess it will be in a random process context or a
> kworker thread? How useful is it for diagnosis? Just the exact time an
> error detected (which is already logged by kernel)?

The kernel panics with a clear stack trace and an explicit reason, making it
straightforward to correlate and analyze the failure.

My objective is to have a clean, immediate crash rather than allowing the
system to continue running and potentially crash later (if at all).

Working at a hyperscaler, I regularly see thousands of these "unhandlable
page" messages, followed by later kernel crashes when the corrupted memory is
eventually accessed.
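
To make that correlation concrete, here is a rough sketch of how a console
log could be scanned for the gap between an "Ignored" memory failure and the
next crash. It is not part of this series: the function name
ignored_to_crash_delays() and both regular expressions are assumptions based
only on the log excerpts in this thread, and the pairing is purely temporal,
so it does not prove the crash consumed the poisoned page.

```python
import re

# Assumed message formats, modelled on the log excerpts in this thread.
IGNORED_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*Memory failure: (0x[0-9a-f]+): "
    r"recovery action for .*: Ignored"
)
CRASH_RE = re.compile(r"\[\s*(\d+\.\d+)\].*(Internal error|Kernel panic)")


def ignored_to_crash_delays(lines):
    """Return (pfn, delay_seconds) for every 'Ignored' memory failure
    that is followed by a crash message later in the same log."""
    pending = []  # (pfn, timestamp) of failures not yet followed by a crash
    delays = []
    for line in lines:
        match = IGNORED_RE.search(line)
        if match:
            pending.append((match.group(2), float(match.group(1))))
            continue
        match = CRASH_RE.search(line)
        if match and pending:
            crash_ts = float(match.group(1))
            delays.extend((pfn, crash_ts - ts) for pfn, ts in pending)
            pending.clear()
    return delays


if __name__ == "__main__":
    sample = [
        "[88690.498473] [ T593001] Memory failure: 0x40272d: unhandlable page.",
        "[88690.498619] [ T593001] Memory failure: 0x40272d: recovery action "
        "for get hwpoison page: Ignored",
        "[88757.847126] [ T640437] Internal error: synchronous external "
        "abort: 0000000096000410 [#1] SMP",
    ]
    for pfn, delay in ignored_to_crash_delays(sample):
        print(f"pfn {pfn}: crash {delay:.1f}s after the ignored failure")
```

With the sysctl enabled, the same scan should report a delay close to zero,
since the panic follows the "Ignored" line directly.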

> On X86, for UCNA or SRAO type machine check exceptions, I think with
> your patch the panic would also happen in random process context or
> kworker thread,
>
> Can you share some clean crash dumps from your testing that show they
> are more useful than the crash at SEA? Thanks!

Certainly, here is the complete crash dump from the example above. This
happened on real production hardware:

[88690.478913] [ T593001] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 784
[88690.479097] [ T593001] {1}[Hardware Error]: event severity: recoverable
[88690.479184] [ T593001] {1}[Hardware Error]: imprecise tstamp: 2026-03-20 13:13:08
[88690.479282] [ T593001] {1}[Hardware Error]: Error 0, type: recoverable
[88690.479359] [ T593001] {1}[Hardware Error]: section_type: memory error
[88690.479424] [ T593001] {1}[Hardware Error]: physical_address: 0x00000040272d5080
[88690.479503] [ T593001] {1}[Hardware Error]: physical_address_mask: 0xfffffffffffff000
[88690.479606] [ T593001] {1}[Hardware Error]: node:0 card:0 module:1 rank:1 bank:13 device:6 row:64114 column:832 requestor_id:0x0000000000000027
[88690.479680] [ T593001] {1}[Hardware Error]: error_type: 3, multi-bit ECC
[88690.479754] [ T593001] {1}[Hardware Error]: DIMM location: not present. DMI handle: 0x000e
[88690.479882] [ T593001] EDAC MC0: 1 UE multi-bit ECC on unknown memory (node:0 card:0 module:1 rank:1 bank:13 device:6 row:64114 column:832 requestor_id:0x0000000000000027 DIMM location: not present. DMI handle: 0x000e page:0x40272d offset:0x5080 grain:4096 - APEI location: node:0 card:0 module:1 rank:1 bank:13 device:6 row:64114 column:832 requestor_id:0x0000000000000027 DIMM location: not present. DMI handle: 0x000e)
[88690.498473] [ T593001] Memory failure: 0x40272d: unhandlable page.
[88690.498619] [ T593001] Memory failure: 0x40272d: recovery action for get hwpoison page: Ignored
[88757.847126] [ T640437] Internal error: synchronous external abort: 0000000096000410 [#1] SMP
[88757.867131] [ T640437] Modules linked in: ghes_edac(E) act_gact(E) sch_fq(E) tcp_diag(E) inet_diag(E) cls_bpf(E) mlx5_ib(E) sm3_ce(E) sha3_ce(E) sha512_ce(E) ipmi_ssif(E) ipmi_devintf(E) nvidia_cspmu(E) ib_uverbs(E) cppc_cpufreq(E) coresight_etm4x(E) coresight_stm(E) ipmi_msghandler(E) coresight_trbe(E) arm_cspmu_module(E) arm_smmuv3_pmu(E) arm_spe_pmu(E) stm_core(E) coresight_tmc(E) coresight_funnel(E) coresight(E) bpf_preload(E) sch_fq_codel(E) ip_tables(E) ip6_tables(E) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) tap(E) tls(E) mpls_gso(E) mpls_iptunnel(E) mpls_router(E) fou(E) acpi_power_meter(E) loop(E) drm(E) backlight(E) drm_panel_orientation_quirks(E) autofs4(E) raid0(E) efivarfs(E) dm_crypt(E)
[88757.991191] [ T640437] CPU: 70 UID: 34133 PID: 640437 Comm: Collection-20 Kdump: loaded Tainted: G M E 6.16.1-0_fbk2_0_gf40efc324cc8 #1 NONE
[88758.017569] [ T640437] Tainted: [M]=MACHINE_CHECK, [E]=UNSIGNED_MODULE
[88758.028860] [ T640437] Hardware name: ....
[88758.046969] [ T640437] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[88758.061075] [ T640437] pc : d_lookup+0x5c/0x220
[88758.068392] [ T640437] lr : try_lookup_noperm+0x30/0x50
[88758.077088] [ T640437] sp : ffff800138cafc30
[88758.083827] [ T640437] x29: ffff800138cafc40 x28: ffff0001dcfe8bc0 x27: 00000000bc0a11f7
[88758.098321] [ T640437] x26: 00000000000ee00c x25: ffffffffffffffff x24: 0000000000000001
[88758.112807] [ T640437] x23: ffff003fa14d0000 x22: ffff8000828d3740 x21: ffff800138cafde8
[88758.127281] [ T640437] x20: ffff0000d0316fc0 x19: ffff800138cafce0 x18: 0001000000000000
[88758.141753] [ T640437] x17: 0000000000000001 x16: 0000000001ffffff x15: dfc038a300003936
[88758.156226] [ T640437] x14: 00000000fffffffa x13: ffffffffffffffff x12: ffff0000d0316fc0
[88758.170695] [ T640437] x11: 61c8864680b583eb x10: 0000000000000039 x9 : ffff800080fcfd68
[88758.185170] [ T640437] x8 : ffff003fa72d5088 x7 : 0000000000000000 x6 : ffff800138cafd58
[88758.199645] [ T640437] x5 : ffff0001dcfe8bc0 x4 : ffff80008104a330 x3 : 0000000000000002
[88758.214111] [ T640437] x2 : ffff800138cafd4d x1 : ffff800138cafce0 x0 : ffff0000d0316fc0
[88758.228579] [ T640437] Call trace:
[88758.233565] [ T640437] d_lookup+0x5c/0x220 (P)
[88758.240864] [ T640437] try_lookup_noperm+0x30/0x50
[88758.248868] [ T640437] proc_fill_cache+0x54/0x140
[88758.256696] [ T640437] proc_readfd_common+0x138/0x1e8
[88758.265222] [ T640437] proc_fd_iterate.llvm.7260857650841435759+0x1c/0x30
[88758.277248] [ T640437] iterate_dir+0x84/0x228
[88758.284354] [ T640437] __arm64_sys_getdents64+0x5c/0x110
[88758.293383] [ T640437] invoke_syscall+0x4c/0xd0
[88758.300843] [ T640437] do_el0_svc+0x80/0xb8
[88758.307599] [ T640437] el0_svc+0x30/0xf0
[88758.313820] [ T640437] el0t_64_sync_handler+0x70/0x100
[88758.322497] [ T640437] el0t_64_sync+0x17c/0x180
...

And my clean crash would look like the following:

[ 1096.480523] Memory failure: 0x2: recovery action for reserved kernel page: Ignored
[ 1096.480751] Kernel panic - not syncing: Memory failure: 0x2: unrecoverable page
[ 1096.480760] CPU: 5 UID: 0 PID: 683 Comm: bash Tainted: G B 7.0.0-next-20260414-upstream-00004-gcbb3af7bfd3b #93 PREEMPTLAZY
[ 1096.480768] Tainted: [B]=BAD_PAGE
[ 1096.480774] Call Trace:
[ 1096.480778] <TASK>
[ 1096.480782] vpanic+0x399/0x700
[ 1096.480821] panic+0xb4/0xc0
[ 1096.480849] action_result+0x278/0x340
[ 1096.480857] memory_failure+0x152b/0x1c80
[ 1096.480925] hwpoison_inject+0x3a6/0x3f0 [hwpoison_inject]
....

Isn't the clean approach way better than the random one?

For testing, I use this simple procedure, in case you want to play with it:

  # modprobe hwpoison-inject
  # sysctl -w vm.panic_on_unrecoverable_memory_failure=1
  # echo 1 > /sys/kernel/debug/hwpoison/corrupt-pfn

Thanks for the review and good discussion,
--breno