From: Xie Yuanbin <xieyuanbin1@huawei.com>
To: <david@kernel.org>
Cc: <Liam.Howlett@oracle.com>, <akpm@linux-foundation.org>,
	<ardb@kernel.org>, <arnd@arndb.de>, <dave@vasilevsky.ca>,
	<david@redhat.com>, <ebiggers@kernel.org>, <kees@kernel.org>,
	<liaohua4@huawei.com>, <lilinjie8@huawei.com>,
	<linmiaohe@huawei.com>, <linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux@armlinux.org.uk>, <lorenzo.stoakes@oracle.com>,
	<mhocko@suse.com>, <nao.horiguchi@gmail.com>, <nathan@kernel.org>,
	<peterz@infradead.org>, <rmk+kernel@armlinux.org.uk>,
	<rostedt@goodmis.org>, <rppt@kernel.org>, <surenb@google.com>,
	<vbabka@suse.cz>, <will@kernel.org>, <xieyuanbin1@huawei.com>
Subject: Re: [RFC PATCH 1/2] ARM: mm: support memory-failure
Date: Tue, 4 Nov 2025 21:48:31 +0800
Message-ID: <20251104134831.147584-1-xieyuanbin1@huawei.com>
In-Reply-To: <e323f1f3-f543-4e81-af6b-243fcf9ba750@kernel.org>

On Mon, 3 Nov 2025 17:53:18 +0100, David Hildenbrand wrote:
> Can you go into more details which exact functionality in
> memory-failure.c you would be interested in using?
>
> Only soft-offlining or also the other (possibly architecture-specific)
> handling?

Thanks! Let me describe it in as much detail as possible.

The functions in memory-failure.c are currently used in three ways:
1. When an application is using memory and ECC detects a UE
(Uncorrectable Error) bit flip in DRAM (the detection is performed by
hardware and is not visible to software), the hardware raises an
interrupt to the CPU. The relevant driver (a third-party module) has
already registered a callback for this interrupt.
Depending on the configuration, the driver either calls
`memory_failure_queue()` from inside the callback, or wakes up the
related kthread, which calls `soft_offline_page()`/`memory_failure()` to
take the affected memory offline or kill the process.

2. Hardware memory scanning: the hardware periodically performs
read/write tests on some memory. (This is non-standard hardware, so it
is not covered by the ARM spec. The scanning is not visible to
software.) If a bit flip is detected during the test, an interrupt is
reported to the operating system, which then performs the memory-failure
handling described in point 1.

3. Software memory scanning: software (such as a kthread or
workqueue) periodically uses `soft_offline_page()` to isolate some free
memory and performs read/write tests on it. If a bit flip is detected
during the test, the memory is considered faulty and is not returned
to the system. Otherwise, `unpoison_memory()` is used to recover the
memory.
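
To illustrate the decision in point 3, here is a minimal userspace
sketch in C. It is not the driver code, which cannot be shared, and it
does not call any kernel API. All names in it (`sim_fault`, `sim_read`,
`scan_page`, `scan_verdict_for`), the 4 KiB page size, the test patterns,
and the stuck-at-zero fault model are assumptions made only for
illustration. In the real flow, the page would first be isolated with
`soft_offline_page()` and a passing page would be returned with
`unpoison_memory()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS 512 /* one 4 KiB page viewed as 64-bit words (assumption) */

/* Hypothetical fault model: the masked bits of one word always read as 0. */
struct sim_fault {
    size_t word_index;
    uint64_t stuck_zero_mask; /* 0 means "no fault" */
};

/* Simulated readback that applies the injected fault, if any. */
static uint64_t sim_read(const volatile uint64_t *page, size_t i,
                         const struct sim_fault *fault)
{
    uint64_t v = page[i];

    if (fault && fault->word_index == i)
        v &= ~fault->stuck_zero_mask;
    return v;
}

/*
 * Write each pattern over the whole page, then read it back.
 * Returns true if every word matches, false on the first mismatch.
 */
static bool scan_page(volatile uint64_t *page, const struct sim_fault *fault)
{
    static const uint64_t patterns[] = {
        0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
        0xAAAAAAAAAAAAAAAAULL, 0x5555555555555555ULL,
    };

    assert(page != NULL);
    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
        for (size_t i = 0; i < PAGE_WORDS; i++)
            page[i] = patterns[p];
        for (size_t i = 0; i < PAGE_WORDS; i++)
            if (sim_read(page, i, fault) != patterns[p])
                return false;
    }
    return true;
}

/* What the scanner would do with the isolated page after the test. */
enum scan_verdict { SCAN_RELEASE, SCAN_KEEP_ISOLATED };

static enum scan_verdict scan_verdict_for(volatile uint64_t *page,
                                          const struct sim_fault *fault)
{
    return scan_page(page, fault) ? SCAN_RELEASE : SCAN_KEEP_ISOLATED;
}
```

Calling `scan_verdict_for(page, NULL)` on a healthy buffer yields
`SCAN_RELEASE` (the page would be unpoisoned), while passing a
`sim_fault` with a non-zero mask yields `SCAN_KEEP_ISOLATED` (the page
stays offline).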

Unfortunately, the driver code for these three methods is difficult to
open-source. I have also been thinking about whether there is a
general-purpose function that could use memory-failure, but I haven't
come up with a good idea yet.

> Cheers
>
> David

Thanks!

Xie Yuanbin

