linux-mm.kvack.org archive mirror
From: Jiaqi Yan <jiaqiyan@google.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: nao.horiguchi@gmail.com, linmiaohe@huawei.com,
	jane.chu@oracle.com,  osalvador@suse.de, muchun.song@linux.dev,
	akpm@linux-foundation.org,  shuah@kernel.org, corbet@lwn.net,
	rientjes@google.com, duenwen@google.com,  fvdl@google.com,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	 linux-doc@vger.kernel.org
Subject: Re: [PATCH v4 0/4] Userspace controls soft-offline pages
Date: Fri, 21 Jun 2024 16:53:41 -0700
Message-ID: <CACw3F51QadqESg2a8Lb_A+PCH7TH0W8BqwNKCyOX4nyeeP1wAw@mail.gmail.com>
In-Reply-To: <87msnfusyw.fsf@linux.intel.com>

Thanks for your comment, Andi.

On Thu, Jun 20, 2024 at 3:53 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> Jiaqi Yan <jiaqiyan@google.com> writes:
>
> > Correctable memory errors are very common on servers with large
> > amount of memory, and are corrected by ECC, but with two
> > pain points to users:
> > 1. Correction usually happens on the fly and adds latency overhead
> > 2. Not-fully-proved theory states excessive correctable memory
> >    errors can develop into uncorrectable memory error.
>
> This patchkit is amusing (or maybe sad) because it basically tries to
> reconstruct the original soft offline design using a user space daemon
> instead of doing policy badly in the kernel.

Some clarifications. I don't intend to reconstruct the original design.
This patchset can also be seen as "patching the missing places so that
the kernel doesn't soft offline pages behind the back of a userspace
daemon". I agree with you (IIUC) that the policy for corrected memory
errors belongs in userspace. But some code paths in the kernel don't
respect that: they either have a reason not to, or simply forget to.
enable_soft_offline is basically a big switch that lets userspace
block these kernel violators.
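As a rough illustration of that switch from the daemon's side: the series adds a sysctl named enable_soft_offline (patch 4/4). The minimal sketch below assumes it is exposed as /proc/sys/vm/enable_soft_offline and takes 0 (block soft offline) or 1 (allow it). The path and the helper names read_enable_soft_offline() and write_enable_soft_offline() are assumptions for illustration, not code from the patches.

```c
#include <stdio.h>

/* Assumed location of the enable_soft_offline sysctl (hypothetical path). */
#define ENABLE_SOFT_OFFLINE_PATH "/proc/sys/vm/enable_soft_offline"

/* Reads the switch; returns 0 or 1, or -1 on any error or unexpected value. */
static int read_enable_soft_offline(const char *path)
{
    FILE *f = fopen(path, "r");
    int val = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return (val == 0 || val == 1) ? val : -1;
}

/* Writes 0 (block soft offline) or 1 (allow it); returns 0 on success, -1 on error. */
static int write_enable_soft_offline(const char *path, int enable)
{
    FILE *f;
    int rc;

    if (enable != 0 && enable != 1)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    rc = fprintf(f, "%d\n", enable) < 0 ? -1 : 0;
    /* For a /proc file, a rejected value shows up when the buffer is flushed. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

A daemon running as root could call write_enable_soft_offline(ENABLE_SOFT_OFFLINE_PATH, 0) at startup and read the value back to confirm. The path is a parameter so the helpers can also be exercised against an ordinary file.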

>
> You can still have it by enabling CONFIG_X86_MCELOG_LEGACY and
> use http://www.mcelog.org or an equivalent daemon of your choosing
> that listens to /dev/mcelog.

Unless I missed something important in
https://github.com/andikleen/mcelog and
arch/x86/kernel/cpu/mce/dev-mcelog.c, I don't think /dev/mcelog works
on ARM platforms, where CPER is used to convey hardware errors from the
platform to the OS.

In addition, again taking an ARM platform as an example, I don't think
any userspace daemon has a way to stop the GHES driver from soft
offlining memory pages:
https://github.com/torvalds/linux/blob/master/drivers/acpi/apei/ghes.c#L521.
Of course, this is not a problem if userspace always wants soft
offline to happen.
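For contrast, a userspace daemon that owns the policy might count corrected errors per page and ask the kernel to soft offline a page only once a threshold is reached, roughly what mcelog does on x86. The minimal sketch below assumes 4 KiB pages, a small fixed-size tracking table, and the sysfs file /sys/devices/system/memory/soft_offline_page, which takes a physical address. The threshold, table size, and helper names (ce_record(), request_soft_offline()) are hypothetical, and how the daemon obtains error addresses (from /dev/mcelog, CPER records, etc.) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical policy knobs; 4 KiB pages are assumed. */
#define PAGE_SIZE_4K      4096ULL
#define MAX_TRACKED_PAGES 64

/* Per-page corrected-error counters, keyed by page frame number. */
struct ce_tracker {
    uint64_t pfn[MAX_TRACKED_PAGES];
    unsigned int count[MAX_TRACKED_PAGES];
    size_t used;
    unsigned int threshold;
};

/*
 * Records one corrected error at physical address paddr.
 * Returns 1 exactly when the page's count reaches the threshold,
 * 0 otherwise, and -1 if the table is full.
 */
static int ce_record(struct ce_tracker *t, uint64_t paddr)
{
    uint64_t pfn = paddr / PAGE_SIZE_4K;
    size_t i;

    for (i = 0; i < t->used; i++)
        if (t->pfn[i] == pfn)
            break;
    if (i == t->used) {
        if (t->used == MAX_TRACKED_PAGES)
            return -1;
        t->pfn[i] = pfn;
        t->count[i] = 0;
        t->used++;
    }
    return ++t->count[i] == t->threshold;
}

/*
 * Asks the kernel to soft offline the page containing paddr by writing the
 * page-aligned address to path (normally
 * /sys/devices/system/memory/soft_offline_page). Returns 0 or -1.
 */
static int request_soft_offline(const char *path, uint64_t paddr)
{
    unsigned long long page = paddr & ~(PAGE_SIZE_4K - 1);
    FILE *f = fopen(path, "w");
    int rc;

    if (!f)
        return -1;
    rc = fprintf(f, "0x%llx\n", page) < 0 ? -1 : 0;
    /* Kernel-side failures surface when the buffered write is flushed. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

A caller would invoke request_soft_offline() only when ce_record() returns 1, so each page is offlined at most once. Checking fclose() matters because, with stdio buffering, an error returned by the kernel for the sysfs write is reported at flush time.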

>
> -Andi

Thread overview: 13+ messages
2024-06-20 18:48 Jiaqi Yan
2024-06-20 18:48 ` [PATCH v4 1/4] mm/memory-failure: refactor log format in soft offline code Jiaqi Yan
2024-06-24  3:08   ` Miaohe Lin
2024-06-20 18:48 ` [PATCH v4 2/4] mm/memory-failure: userspace controls soft-offlining pages Jiaqi Yan
2024-06-24  3:41   ` Miaohe Lin
2024-06-24 16:18     ` Jiaqi Yan
2024-06-20 18:48 ` [PATCH v4 3/4] selftest/mm: test enable_soft_offline behaviors Jiaqi Yan
2024-06-21  5:08   ` Muhammad Usama Anjum
2024-06-21 14:43     ` Jiaqi Yan
2024-06-20 18:48 ` [PATCH v4 4/4] docs: mm: add enable_soft_offline sysctl Jiaqi Yan
2024-06-20 22:53 ` [PATCH v4 0/4] Userspace controls soft-offline pages Andi Kleen
2024-06-21 23:53   ` Jiaqi Yan [this message]
2024-06-22 16:49     ` Andi Kleen
