From: Breno Leitao <leitao@debian.org>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kernel-team@meta.com,
Naoya Horiguchi <nao.horiguchi@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v4 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl
Date: Wed, 22 Apr 2026 08:23:10 -0700
Message-ID: <aejnmh3xlHsuKfP3@gmail.com>
In-Reply-To: <7b4a6659-e2e5-5e63-2952-c7a840ffcdec@huawei.com>
On Wed, Apr 22, 2026 at 11:43:16AM +0800, Miaohe Lin wrote:
> On 2026/4/15 20:55, Breno Leitao wrote:
> > Add documentation for the new vm.panic_on_unrecoverable_memory_failure
> > sysctl, describing the three categories of failures that trigger a
> > panic and noting which kernel page types are not yet covered.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++++++++++++++
> > 1 file changed, 37 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index 97e12359775c9..592ce9ec38c4b 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
> > - page-cluster
> > - page_lock_unfairness
> > - panic_on_oom
> > +- panic_on_unrecoverable_memory_failure
> > - percpu_pagelist_high_fraction
> > - stat_interval
> > - stat_refresh
> > @@ -925,6 +926,42 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
> > why oom happens. You can get snapshot.
> >
> >
> > +panic_on_unrecoverable_memory_failure
> > +======================================
> > +
> > +When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
> > +that cannot be recovered by the memory failure handler, the default
> > +behaviour is to ignore the error and continue operation. This is
> > +dangerous because the corrupted data remains accessible to the kernel,
> > +risking silent data corruption or a delayed crash when the poisoned
> > +memory is next accessed.
> > +
> > +When enabled, this sysctl triggers a panic on three categories of
> > +unrecoverable failures: reserved kernel pages, non-buddy kernel pages
> > +with zero refcount (e.g. tail pages of high-order allocations), and
> > +pages whose state cannot be classified as recoverable.
> > +
> > +Note that some kernel page types — such as slab objects, vmalloc
> > +allocations, kernel stacks, and page tables — share a failure path
> > +with transient refcount races and are not currently covered by this
> > +option. I.e., it does not panic when unsure of the page status.
> > +
> > +For many environments it is preferable to panic immediately with a clean
> > +crash dump that captures the original error context, rather than to
> > +continue and face a random crash later whose cause is difficult to
> > +diagnose.
>
> Should we add some useful cases to show the real-world application scenarios?
Yes, good idea. What about something like:

Use cases
---------

This option is most useful in environments where unattributed crashes
are expensive to debug or where data integrity must take precedence
over availability:

* Large fleets, where multi-bit ECC errors on kernel pages are observed
  regularly and post-mortem analysis of an unrelated downstream crash
  (often seconds to minutes after the original error) consumes
  significant engineering effort.

* Systems configured with kdump, where panicking at the moment of the
  hardware error produces a vmcore that still contains the faulting
  address, the affected page state, and the originating MCE/GHES
  record. That context is typically lost by the time a delayed crash
  occurs.

* High-availability clusters that rely on fast, deterministic node
  failure for failover, and prefer an immediate panic over silent data
  corruption propagating to replicas or persistent storage.
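
To make the configuration side of these use cases concrete, here is a
minimal Python sketch of how an admin script might check for and enable
the knob. It assumes the file shows up as
/proc/sys/vm/panic_on_unrecoverable_memory_failure (per the patch) and
that it takes 0/1 like other boolean vm sysctls, which is not spelled
out in this thread. The helper names (read_knob, set_knob, demo) and the
sysctl_root parameter are hypothetical; the demo runs against a scratch
directory, so it needs neither root nor a patched kernel.

```python
import tempfile
from pathlib import Path

# Knob name taken from the patch; the 0/1 value convention is an assumption.
KNOB = "panic_on_unrecoverable_memory_failure"


def knob_path(sysctl_root="/proc/sys/vm"):
    """Build the path to the knob under the given sysctl directory."""
    return Path(sysctl_root) / KNOB


def read_knob(sysctl_root="/proc/sys/vm"):
    """Return the current integer value, or None if the knob is absent."""
    path = knob_path(sysctl_root)
    if not path.exists():
        return None
    return int(path.read_text().strip())


def set_knob(enabled, sysctl_root="/proc/sys/vm"):
    """Write 1 or 0 to the knob (requires root on a real system)."""
    path = knob_path(sysctl_root)
    if not path.exists():
        raise FileNotFoundError(f"{path} not present; kernel may lack the patch")
    path.write_text("1\n" if enabled else "0\n")


def demo():
    """Exercise the helpers against a scratch directory instead of /proc."""
    with tempfile.TemporaryDirectory() as root:
        # Assumed starting value of 0 (disabled), for illustration only.
        (Path(root) / KNOB).write_text("0\n")
        before = read_knob(root)
        set_knob(True, root)
        return before, read_knob(root)


if __name__ == "__main__":
    print(demo())
```

On a real system the equivalent would presumably be
`sysctl -w vm.panic_on_unrecoverable_memory_failure=1`, again assuming
the 0/1 convention. For the kdump use case the knob would need to be set
early in boot (e.g. from a sysctl.d fragment) so it is in place before
the first error is reported.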