linux-mm.kvack.org archive mirror
From: Kyle Meyer <kyle.meyer@hpe.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: corbet@lwn.net, david@redhat.com, linmiaohe@huawei.com,
	shuah@kernel.org, tony.luck@intel.com, jane.chu@oracle.com,
	jiaqiyan@google.com, Liam.Howlett@oracle.com, bp@alien8.de,
	hannes@cmpxchg.org, jack@suse.cz, joel.granados@kernel.org,
	laoar.shao@gmail.com, lorenzo.stoakes@oracle.com,
	mclapinski@google.com, mhocko@suse.com, nao.horiguchi@gmail.com,
	osalvador@suse.de, rafael.j.wysocki@intel.com, rppt@kernel.org,
	russ.anderson@hpe.com, shawn.fan@intel.com, surenb@google.com,
	vbabka@suse.cz, linux-acpi@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm/memory-failure: Support disabling soft offline for HugeTLB pages
Date: Tue, 16 Sep 2025 02:14:17 -0500
Message-ID: <aMkOCmGBhZKhKPrI@hpe.com>
In-Reply-To: <20250915201618.7d9d294a6b22e0f71540884b@linux-foundation.org>

On Mon, Sep 15, 2025 at 08:16:18PM -0700, Andrew Morton wrote:
> On Mon, 15 Sep 2025 19:27:41 -0500 Kyle Meyer <kyle.meyer@hpe.com> wrote:
> 
> > Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > 
> > Commit 56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
> > introduced the following sysctl interface to control soft offline:
> > 
> > /proc/sys/vm/enable_soft_offline
> > 
> > The interface does not distinguish between page types:
> > 
> >     0 - Soft offline is disabled
> >     1 - Soft offline is enabled
> > 
> > Convert enable_soft_offline to a bitmask and support disabling soft
> > offline for HugeTLB pages:
> > 
> > Bits:
> > 
> >     0 - Enable soft offline
> >     1 - Disable soft offline for HugeTLB pages
> > 
> > Supported values:
> > 
> >     0 - Soft offline is disabled
> >     1 - Soft offline is enabled
> >     3 - Soft offline is enabled (disabled for HugeTLB pages)
> > 
> > Existing behavior is preserved.
> 
> um, why?  What benefit does this patch provide to our users? 
> Use-cases, before-and-after scenarios, etc?

Thank you for the feedback.

Some BIOSes suppress ("cloak") corrected memory errors until a threshold
is reached. Once that threshold is reached, the BIOS reports a CPER with
the "error threshold exceeded" bit set via GHES, and the corresponding
page is soft offlined.

The BIOS does not know the type of the corresponding page. If it happens
to be a HugeTLB page, it will be dissolved, permanently reducing the
HugeTLB page pool. This can be problematic for workloads that depend on
a fixed number of HugeTLB pages.

Currently, soft offline must be disabled entirely to prevent HugeTLB
pages from being soft offlined.

This patch provides a middle ground. Soft offline can be disabled for
HugeTLB pages while remaining enabled for non-HugeTLB pages, preserving
the benefits of soft offline without the risk of BIOS soft offlining
HugeTLB pages.
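
To make the proposed semantics concrete, here is a minimal sketch of how
the bitmask could be interpreted. The macro and function names are
hypothetical and are not taken from the patch. The value 2 is not listed
as supported, so treating it as "disabled" here is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names for the two bits described in the changelog. */
#define SOFT_OFFLINE_ENABLED		(1u << 0) /* bit 0: enable soft offline */
#define SOFT_OFFLINE_SKIP_HUGETLB	(1u << 1) /* bit 1: disable for HugeTLB */

/*
 * Return true if a page may be soft offlined under the given
 * enable_soft_offline value.
 */
static bool soft_offline_allowed(unsigned int mask, bool is_hugetlb)
{
	/* Bit 0 clear: soft offline is disabled for every page type. */
	if (!(mask & SOFT_OFFLINE_ENABLED))
		return false;

	/* Bit 1 set: HugeTLB pages are exempt, other pages are not. */
	if (is_hugetlb && (mask & SOFT_OFFLINE_SKIP_HUGETLB))
		return false;

	return true;
}
```

With this reading, 1 keeps the existing behavior (all pages may be soft
offlined), while 3 still allows soft offline for non-HugeTLB pages but
leaves the HugeTLB page pool untouched.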

> > Update documentation and HugeTLB soft offline self tests.
> > 
> > Reported-by: Shawn Fan <shawn.fan@intel.com>
> 
> Interesting.  What did Shawn report? (Closes:!).

Tony or Shawn, could you please point me to the original report? Thanks!

> > Suggested-by: Tony Luck <tony.luck@intel.com>
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> >
> > ...
> >
> >  .../ABI/testing/sysfs-memory-page-offline     |  3 ++
> >  Documentation/admin-guide/sysctl/vm.rst       | 28 ++++++++++++++++---
> >  mm/memory-failure.c                           | 17 +++++++++--
> >  .../selftests/mm/hugetlb-soft-offline.c       | 19 ++++++++++---
> >  4 files changed, 56 insertions(+), 11 deletions(-)
> 
> I'll add it because testing, but please do explain why I added it?

Thanks,
Kyle Meyer

