linux-mm.kvack.org archive mirror
* [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
@ 2025-09-10 16:15 Kyle Meyer
  2025-09-10 16:44 ` Jiaqi Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Kyle Meyer @ 2025-09-10 16:15 UTC (permalink / raw)
  To: akpm, corbet, david, linmiaohe, shuah, tony.luck
  Cc: Liam.Howlett, bp, hannes, jack, jane.chu, jiaqiyan,
	joel.granados, kyle.meyer, laoar.shao, lorenzo.stoakes,
	mclapinski, mhocko, nao.horiguchi, osalvador, rafael.j.wysocki,
	rppt, russ.anderson, shawn.fan, surenb, vbabka, linux-acpi,
	linux-doc, linux-kernel, linux-kselftest, linux-mm

Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
Since HugeTLB pages are preallocated, reducing the available HugeTLB
page pool can cause allocation failures.

/proc/sys/vm/enable_soft_offline provides a sysctl interface to
disable/enable soft offline:

0 - Soft offline is disabled.
1 - Soft offline is enabled.

The current sysctl interface does not distinguish between HugeTLB pages
and other page types.

Disable soft offline for HugeTLB pages by default (1) and extend the
sysctl interface to preserve existing behavior (2):

0 - Soft offline is disabled.
1 - Soft offline is enabled (excluding HugeTLB pages).
2 - Soft offline is enabled (including HugeTLB pages).

Update documentation for the sysctl interface, reference the sysctl
interface in the sysfs ABI documentation, and update HugeTLB soft
offline selftests.

Reported-by: Shawn Fan <shawn.fan@intel.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
---

Tony's original patch disabled soft offline for HugeTLB pages when
a correctable memory error reported via GHES (with "error threshold
exceeded" set) happened to be on a HugeTLB page:

https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com

This patch disables soft offline for HugeTLB pages by default
(not just from GHES).
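
For readers who want to see the three values side by side, here is a
minimal userspace sketch of the decision this patch adds to
soft_offline_page(). It is only a model, not the kernel code:
soft_offline_policy() and soft_offline_policy_selfcheck() are
hypothetical names, the is_hugetlb argument stands in for
folio_test_hugetlb(pfn_folio(pfn)), and the reference handling
(put_ref_page()) and logging are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values as proposed in this patch. */
enum soft_offline {
	SOFT_OFFLINE_DISABLED = 0,
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
	SOFT_OFFLINE_ENABLED
};

/*
 * Hypothetical userspace model of the policy check (not a kernel function).
 * Returns 0 if the soft offline request may proceed, -EOPNOTSUPP otherwise.
 */
static int soft_offline_policy(int mode, bool is_hugetlb)
{
	if (mode == SOFT_OFFLINE_DISABLED)
		return -EOPNOTSUPP;

	if (mode == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB && is_hugetlb)
		return -EOPNOTSUPP;

	return 0;
}

/* Walk the full decision table: three sysctl values, two page kinds. */
static void soft_offline_policy_selfcheck(void)
{
	assert(soft_offline_policy(SOFT_OFFLINE_DISABLED, false) == -EOPNOTSUPP);
	assert(soft_offline_policy(SOFT_OFFLINE_DISABLED, true) == -EOPNOTSUPP);

	assert(soft_offline_policy(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, false) == 0);
	assert(soft_offline_policy(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, true) == -EOPNOTSUPP);

	assert(soft_offline_policy(SOFT_OFFLINE_ENABLED, false) == 0);
	assert(soft_offline_policy(SOFT_OFFLINE_ENABLED, true) == 0);
}
```

The self-check spells out the whole table: with the proposed default
of 1, only requests that target a HugeTLB page are refused, and 2
restores what 1 meant before this patch.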

---
 .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
 Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
 mm/memory-failure.c                           | 21 ++++++++++--
 .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
 4 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index 00f4e35f916f..befb89ae39ec 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -20,6 +20,12 @@ Description:
 		number, or a error when the offlining failed.  Reading
 		the file is not allowed.
 
+		Soft-offline can be disabled/enabled via sysctl:
+		/proc/sys/vm/enable_soft_offline
+
+		For details, see:
+		Documentation/admin-guide/sysctl/vm.rst
+
 What:		/sys/devices/system/memory/hard_offline_page
 Date:		Sep 2009
 KernelVersion:	2.6.33
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..ae56372bd604 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
 HugeTLB cases.
 
 For all architectures, enable_soft_offline controls whether to soft offline
-memory pages.  When set to 1, kernel attempts to soft offline the pages
-whenever it thinks needed.  When set to 0, kernel returns EOPNOTSUPP to
-the request to soft offline the pages.  Its default value is 1.
+memory pages:
+
+- 0: Soft offline is disabled.
+- 1: Soft offline is enabled (excluding HugeTLB pages).
+- 2: Soft offline is enabled (including HugeTLB pages).
+
+The default is 1.
+
+If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
 
 It is worth mentioning that after setting enable_soft_offline to 0, the
 following requests to soft offline pages will not be performed:
 
+- Request to soft offline from sysfs (soft_offline_page).
+
 - Request to soft offline pages from RAS Correctable Errors Collector.
 
-- On ARM, the request to soft offline pages from GHES driver.
+- On ARM and X86, the request to soft offline pages from GHES driver.
 
 - On PARISC, the request to soft offline pages from Page Deallocation Table.
 
+Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
+
 extfrag_threshold
 =================
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804bf..cb59a99b48c5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -64,11 +64,18 @@
 #include "internal.h"
 #include "ras/ras_event.h"
 
+enum soft_offline {
+	SOFT_OFFLINE_DISABLED = 0,
+	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
+	SOFT_OFFLINE_ENABLED
+};
+
 static int sysctl_memory_failure_early_kill __read_mostly;
 
 static int sysctl_memory_failure_recovery __read_mostly = 1;
 
-static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_enable_soft_offline __read_mostly =
+	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
 
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
@@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
+		.extra2		= SYSCTL_TWO,
 	}
 };
 
@@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return -EIO;
 	}
 
-	if (!sysctl_enable_soft_offline) {
+	if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
 		pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
 		put_ref_page(pfn, flags);
 		return -EOPNOTSUPP;
 	}
 
+	if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
+		if (folio_test_hugetlb(pfn_folio(pfn))) {
+			pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
+			put_ref_page(pfn, flags);
+			return -EOPNOTSUPP;
+		}
+	}
+
 	mutex_lock(&mf_mutex);
 
 	if (PageHWPoison(page)) {
diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
index f086f0e04756..7e2873cd0a6d 100644
--- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
+++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
@@ -1,10 +1,15 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Test soft offline behavior for HugeTLB pages:
- * - if enable_soft_offline = 0, hugepages should stay intact and soft
- *   offlining failed with EOPNOTSUPP.
- * - if enable_soft_offline = 1, a hugepage should be dissolved and
- *   nr_hugepages/free_hugepages should be reduced by 1.
+ *
+ * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
+ *   should stay intact and soft offlining failed with EOPNOTSUPP.
+ *
+ * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
+ *   should stay intact and soft offlining failed with EOPNOTSUPP.
+ *
+ * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
+ *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
  *
  * Before running, make sure more than 2 hugepages of default_hugepagesz
  * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
@@ -32,6 +37,12 @@
 
 #define EPREFIX " !!! "
 
+enum soft_offline {
+	SOFT_OFFLINE_DISABLED = 0,
+	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
+	SOFT_OFFLINE_ENABLED
+};
+
 static int do_soft_offline(int fd, size_t len, int expect_errno)
 {
 	char *filemap = NULL;
@@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
 	char cmd[256] = {0};
 	FILE *cmdfile = NULL;
 
-	if (value != 0 && value != 1)
+	if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
 		return -EINVAL;
 
 	sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
@@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
 static void test_soft_offline_common(int enable_soft_offline)
 {
 	int fd;
-	int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
+	int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
 	struct statfs file_stat;
 	unsigned long hugepagesize_kb = 0;
 	unsigned long nr_hugepages_before = 0;
@@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
 	// No need for the hugetlbfs file from now on.
 	close(fd);
 
-	if (enable_soft_offline) {
+	if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
 		if (nr_hugepages_before != nr_hugepages_after + 1) {
 			ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
 			return;
@@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
 int main(int argc, char **argv)
 {
 	ksft_print_header();
-	ksft_set_plan(2);
+	ksft_set_plan(3);
 
-	test_soft_offline_common(1);
-	test_soft_offline_common(0);
+	test_soft_offline_common(SOFT_OFFLINE_ENABLED);
+	test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
+	test_soft_offline_common(SOFT_OFFLINE_DISABLED);
 
 	ksft_finished();
 }
-- 
2.51.0




* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-10 16:15 [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default Kyle Meyer
@ 2025-09-10 16:44 ` Jiaqi Yan
  2025-09-10 17:50   ` Kyle Meyer
  2025-09-10 18:05 ` jane.chu
  2025-09-11  8:46 ` David Hildenbrand
  2 siblings, 1 reply; 10+ messages in thread
From: Jiaqi Yan @ 2025-09-10 16:44 UTC (permalink / raw)
  To: Kyle Meyer
  Cc: akpm, corbet, david, linmiaohe, shuah, tony.luck, Liam.Howlett,
	bp, hannes, jack, jane.chu, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Wed, Sep 10, 2025 at 9:16 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> Since HugeTLB pages are preallocated, reducing the available HugeTLB
> page pool can cause allocation failures.
>
> /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> disable/enable soft offline:
>
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled.
>
> The current sysctl interface does not distinguish between HugeTLB pages
> and other page types.
>
> Disable soft offline for HugeTLB pages by default (1) and extend the
> sysctl interface to preserve existing behavior (2):
>
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled (excluding HugeTLB pages).
> 2 - Soft offline is enabled (including HugeTLB pages).
>
> Update documentation for the sysctl interface, reference the sysctl
> interface in the sysfs ABI documentation, and update HugeTLB soft
> offline selftests.
>
> Reported-by: Shawn Fan <shawn.fan@intel.com>
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
>
> Tony's original patch disabled soft offline for HugeTLB pages when
> a correctable memory error reported via GHES (with "error threshold
> exceeded" set) happened to be on a HugeTLB page:
>
> https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
>
> This patch disables soft offline for HugeTLB pages by default
> (not just from GHES).
>
> ---
>  .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
>  Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
>  mm/memory-failure.c                           | 21 ++++++++++--
>  .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
>  4 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> index 00f4e35f916f..befb89ae39ec 100644
> --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> @@ -20,6 +20,12 @@ Description:
>                 number, or a error when the offlining failed.  Reading
>                 the file is not allowed.
>
> +               Soft-offline can be disabled/enabled via sysctl:
> +               /proc/sys/vm/enable_soft_offline
> +
> +               For details, see:
> +               Documentation/admin-guide/sysctl/vm.rst
> +
>  What:          /sys/devices/system/memory/hard_offline_page
>  Date:          Sep 2009
>  KernelVersion: 2.6.33
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 4d71211fdad8..ae56372bd604 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
>  HugeTLB cases.
>
>  For all architectures, enable_soft_offline controls whether to soft offline
> -memory pages.  When set to 1, kernel attempts to soft offline the pages
> -whenever it thinks needed.  When set to 0, kernel returns EOPNOTSUPP to
> -the request to soft offline the pages.  Its default value is 1.
> +memory pages:
> +
> +- 0: Soft offline is disabled.
> +- 1: Soft offline is enabled (excluding HugeTLB pages).
> +- 2: Soft offline is enabled (including HugeTLB pages).

Would it be better to keep the previously documented behavior,
"1 - Soft offline is enabled (regardless of page type)"? That way there
is no impact on users who are very nervous about corrected memory
errors and are willing to lose a HugeTLB page. Something like:

  enum soft_offline {
      SOFT_OFFLINE_DISABLED = 0,
      SOFT_OFFLINE_ENABLED,
      SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
      // SOFT_OFFLINE_ENABLED_SKIP_XXX...
  };
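
To illustrate the compatibility point, here is a rough userspace sketch
that compares this ordering with the pre-patch check, which treated any
nonzero value as "enabled". The function names
(soft_offline_policy_alt(), soft_offline_policy_legacy(),
soft_offline_alt_selfcheck()) are hypothetical and exist only for this
example; is_hugetlb stands in for the folio_test_hugetlb() test in the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Ordering suggested in this reply: 1 keeps its pre-patch meaning. */
enum soft_offline {
	SOFT_OFFLINE_DISABLED = 0,
	SOFT_OFFLINE_ENABLED,
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
};

/* Hypothetical model of the check under the suggested ordering. */
static int soft_offline_policy_alt(int mode, bool is_hugetlb)
{
	if (mode == SOFT_OFFLINE_DISABLED)
		return -EOPNOTSUPP;

	if (mode == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB && is_hugetlb)
		return -EOPNOTSUPP;

	return 0;
}

/* Model of the pre-patch check: if (!sysctl_enable_soft_offline) ... */
static int soft_offline_policy_legacy(int mode)
{
	return mode ? 0 : -EOPNOTSUPP;
}

/* Values 0 and 1 must behave exactly as they did before the patch. */
static void soft_offline_alt_selfcheck(void)
{
	for (int mode = 0; mode <= 1; mode++) {
		assert(soft_offline_policy_alt(mode, false) ==
		       soft_offline_policy_legacy(mode));
		assert(soft_offline_policy_alt(mode, true) ==
		       soft_offline_policy_legacy(mode));
	}

	/* Only the new value 2 treats HugeTLB pages differently. */
	assert(soft_offline_policy_alt(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, false) == 0);
	assert(soft_offline_policy_alt(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB, true) ==
	       -EOPNOTSUPP);
}
```

With this ordering, anyone who already writes 0 or 1 sees no change,
and skipping HugeTLB pages becomes an explicit choice of the new
value 2.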

> +
> +The default is 1.
> +
> +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
>
>  It is worth mentioning that after setting enable_soft_offline to 0, the
>  following requests to soft offline pages will not be performed:
>
> +- Request to soft offline from sysfs (soft_offline_page).
> +
>  - Request to soft offline pages from RAS Correctable Errors Collector.
>
> -- On ARM, the request to soft offline pages from GHES driver.
> +- On ARM and X86, the request to soft offline pages from GHES driver.
>
>  - On PARISC, the request to soft offline pages from Page Deallocation Table.
>
> +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> +
>  extfrag_threshold
>  =================
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fc30ca4804bf..cb59a99b48c5 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -64,11 +64,18 @@
>  #include "internal.h"
>  #include "ras/ras_event.h"
>
> +enum soft_offline {
> +       SOFT_OFFLINE_DISABLED = 0,
> +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> +       SOFT_OFFLINE_ENABLED
> +};
> +
>  static int sysctl_memory_failure_early_kill __read_mostly;
>
>  static int sysctl_memory_failure_recovery __read_mostly = 1;
>
> -static int sysctl_enable_soft_offline __read_mostly = 1;
> +static int sysctl_enable_soft_offline __read_mostly =
> +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
>
>  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
> @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec_minmax,
>                 .extra1         = SYSCTL_ZERO,
> -               .extra2         = SYSCTL_ONE,
> +               .extra2         = SYSCTL_TWO,
>         }
>  };
>
> @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
>                 return -EIO;
>         }
>
> -       if (!sysctl_enable_soft_offline) {
> +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
>                 pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
>                 put_ref_page(pfn, flags);
>                 return -EOPNOTSUPP;
>         }
>
> +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> +               if (folio_test_hugetlb(pfn_folio(pfn))) {
> +                       pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> +                       put_ref_page(pfn, flags);
> +                       return -EOPNOTSUPP;
> +               }
> +       }
> +
>         mutex_lock(&mf_mutex);
>
>         if (PageHWPoison(page)) {
> diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> index f086f0e04756..7e2873cd0a6d 100644
> --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> @@ -1,10 +1,15 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Test soft offline behavior for HugeTLB pages:
> - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> - *   offlining failed with EOPNOTSUPP.
> - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> - *   nr_hugepages/free_hugepages should be reduced by 1.
> + *
> + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> + *
> + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> + *
> + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
>   *
>   * Before running, make sure more than 2 hugepages of default_hugepagesz
>   * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> @@ -32,6 +37,12 @@
>
>  #define EPREFIX " !!! "
>
> +enum soft_offline {
> +       SOFT_OFFLINE_DISABLED = 0,
> +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> +       SOFT_OFFLINE_ENABLED
> +};
> +
>  static int do_soft_offline(int fd, size_t len, int expect_errno)
>  {
>         char *filemap = NULL;
> @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
>         char cmd[256] = {0};
>         FILE *cmdfile = NULL;
>
> -       if (value != 0 && value != 1)
> +       if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
>                 return -EINVAL;
>
>         sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
>  static void test_soft_offline_common(int enable_soft_offline)
>  {
>         int fd;
> -       int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> +       int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
>         struct statfs file_stat;
>         unsigned long hugepagesize_kb = 0;
>         unsigned long nr_hugepages_before = 0;
> @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
>         // No need for the hugetlbfs file from now on.
>         close(fd);
>
> -       if (enable_soft_offline) {
> +       if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
>                 if (nr_hugepages_before != nr_hugepages_after + 1) {
>                         ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
>                         return;
> @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
>  int main(int argc, char **argv)
>  {
>         ksft_print_header();
> -       ksft_set_plan(2);
> +       ksft_set_plan(3);
>
> -       test_soft_offline_common(1);
> -       test_soft_offline_common(0);
> +       test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> +       test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> +       test_soft_offline_common(SOFT_OFFLINE_DISABLED);

Thanks for updating the test code! Looks good to me.

>
>         ksft_finished();
>  }
> --
> 2.51.0
>



* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-10 16:44 ` Jiaqi Yan
@ 2025-09-10 17:50   ` Kyle Meyer
  2025-09-11 21:26     ` Jiaqi Yan
  0 siblings, 1 reply; 10+ messages in thread
From: Kyle Meyer @ 2025-09-10 17:50 UTC (permalink / raw)
  To: Jiaqi Yan
  Cc: akpm, corbet, david, linmiaohe, shuah, tony.luck, Liam.Howlett,
	bp, hannes, jack, jane.chu, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Wed, Sep 10, 2025 at 09:44:24AM -0700, Jiaqi Yan wrote:
> On Wed, Sep 10, 2025 at 9:16 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> >
> > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > page pool can cause allocation failures.
> >
> > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > disable/enable soft offline:
> >
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled.
> >
> > The current sysctl interface does not distinguish between HugeTLB pages
> > and other page types.
> >
> > Disable soft offline for HugeTLB pages by default (1) and extend the
> > sysctl interface to preserve existing behavior (2):
> >
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > 2 - Soft offline is enabled (including HugeTLB pages).
> >
> > Update documentation for the sysctl interface, reference the sysctl
> > interface in the sysfs ABI documentation, and update HugeTLB soft
> > offline selftests.
> >
> > Reported-by: Shawn Fan <shawn.fan@intel.com>
> > Suggested-by: Tony Luck <tony.luck@intel.com>
> > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > ---
> >
> > Tony's original patch disabled soft offline for HugeTLB pages when
> > a correctable memory error reported via GHES (with "error threshold
> > exceeded" set) happened to be on a HugeTLB page:
> >
> > https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com 
> >
> > This patch disables soft offline for HugeTLB pages by default
> > (not just from GHES).
> >
> > ---
> >  .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
> >  Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
> >  mm/memory-failure.c                           | 21 ++++++++++--
> >  .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
> >  4 files changed, 60 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> > index 00f4e35f916f..befb89ae39ec 100644
> > --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> > +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> > @@ -20,6 +20,12 @@ Description:
> >                 number, or a error when the offlining failed.  Reading
> >                 the file is not allowed.
> >
> > +               Soft-offline can be disabled/enabled via sysctl:
> > +               /proc/sys/vm/enable_soft_offline
> > +
> > +               For details, see:
> > +               Documentation/admin-guide/sysctl/vm.rst
> > +
> >  What:          /sys/devices/system/memory/hard_offline_page
> >  Date:          Sep 2009
> >  KernelVersion: 2.6.33
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index 4d71211fdad8..ae56372bd604 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
> >  HugeTLB cases.
> >
> >  For all architectures, enable_soft_offline controls whether to soft offline
> > -memory pages.  When set to 1, kernel attempts to soft offline the pages
> > -whenever it thinks needed.  When set to 0, kernel returns EOPNOTSUPP to
> > -the request to soft offline the pages.  Its default value is 1.
> > +memory pages:
> > +
> > +- 0: Soft offline is disabled.
> > +- 1: Soft offline is enabled (excluding HugeTLB pages).
> > +- 2: Soft offline is enabled (including HugeTLB pages).
> 
> Would it be better to keep the previously documented behavior,
> "1 - Soft offline is enabled (regardless of page type)"? That way there
> is no impact on users who are very nervous about corrected memory
> errors and are willing to lose a HugeTLB page. Something like:
> 
>   enum soft_offline {
>       SOFT_OFFLINE_DISABLED = 0,
>       SOFT_OFFLINE_ENABLED,
>       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
>       // SOFT_OFFLINE_ENABLED_SKIP_XXX...
>   };

I don't have a strong opinion on the default because there's a sysctl
interface, but that seems reasonable. I'll wait for more feedback before
putting together a v2.

> > +
> > +The default is 1.
> > +
> > +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
> >
> >  It is worth mentioning that after setting enable_soft_offline to 0, the
> >  following requests to soft offline pages will not be performed:
> >
> > +- Request to soft offline from sysfs (soft_offline_page).
> > +
> >  - Request to soft offline pages from RAS Correctable Errors Collector.
> >
> > -- On ARM, the request to soft offline pages from GHES driver.
> > +- On ARM and X86, the request to soft offline pages from GHES driver.
> >
> >  - On PARISC, the request to soft offline pages from Page Deallocation Table.
> >
> > +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > +
> >  extfrag_threshold
> >  =================
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index fc30ca4804bf..cb59a99b48c5 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -64,11 +64,18 @@
> >  #include "internal.h"
> >  #include "ras/ras_event.h"
> >
> > +enum soft_offline {
> > +       SOFT_OFFLINE_DISABLED = 0,
> > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > +       SOFT_OFFLINE_ENABLED
> > +};
> > +
> >  static int sysctl_memory_failure_early_kill __read_mostly;
> >
> >  static int sysctl_memory_failure_recovery __read_mostly = 1;
> >
> > -static int sysctl_enable_soft_offline __read_mostly = 1;
> > +static int sysctl_enable_soft_offline __read_mostly =
> > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
> >
> >  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >
> > @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
> >                 .mode           = 0644,
> >                 .proc_handler   = proc_dointvec_minmax,
> >                 .extra1         = SYSCTL_ZERO,
> > -               .extra2         = SYSCTL_ONE,
> > +               .extra2         = SYSCTL_TWO,
> >         }
> >  };
> >
> > @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
> >                 return -EIO;
> >         }
> >
> > -       if (!sysctl_enable_soft_offline) {
> > +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
> >                 pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
> >                 put_ref_page(pfn, flags);
> >                 return -EOPNOTSUPP;
> >         }
> >
> > +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> > +               if (folio_test_hugetlb(pfn_folio(pfn))) {
> > +                       pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> > +                       put_ref_page(pfn, flags);
> > +                       return -EOPNOTSUPP;
> > +               }
> > +       }
> > +
> >         mutex_lock(&mf_mutex);
> >
> >         if (PageHWPoison(page)) {
> > diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > index f086f0e04756..7e2873cd0a6d 100644
> > --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > @@ -1,10 +1,15 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  /*
> >   * Test soft offline behavior for HugeTLB pages:
> > - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> > - *   offlining failed with EOPNOTSUPP.
> > - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> > - *   nr_hugepages/free_hugepages should be reduced by 1.
> > + *
> > + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > + *
> > + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > + *
> > + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> > + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
> >   *
> >   * Before running, make sure more than 2 hugepages of default_hugepagesz
> >   * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> > @@ -32,6 +37,12 @@
> >
> >  #define EPREFIX " !!! "
> >
> > +enum soft_offline {
> > +       SOFT_OFFLINE_DISABLED = 0,
> > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > +       SOFT_OFFLINE_ENABLED
> > +};
> > +
> >  static int do_soft_offline(int fd, size_t len, int expect_errno)
> >  {
> >         char *filemap = NULL;
> > @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
> >         char cmd[256] = {0};
> >         FILE *cmdfile = NULL;
> >
> > -       if (value != 0 && value != 1)
> > +       if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
> >                 return -EINVAL;
> >
> >         sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> > @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
> >  static void test_soft_offline_common(int enable_soft_offline)
> >  {
> >         int fd;
> > -       int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> > +       int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
> >         struct statfs file_stat;
> >         unsigned long hugepagesize_kb = 0;
> >         unsigned long nr_hugepages_before = 0;
> > @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
> >         // No need for the hugetlbfs file from now on.
> >         close(fd);
> >
> > -       if (enable_soft_offline) {
> > +       if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
> >                 if (nr_hugepages_before != nr_hugepages_after + 1) {
> >                         ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
> >                         return;
> > @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
> >  int main(int argc, char **argv)
> >  {
> >         ksft_print_header();
> > -       ksft_set_plan(2);
> > +       ksft_set_plan(3);
> >
> > -       test_soft_offline_common(1);
> > -       test_soft_offline_common(0);
> > +       test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> > +       test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> > +       test_soft_offline_common(SOFT_OFFLINE_DISABLED);
> 
> Thanks for updating the test code! Looks good to me.
> 
> >
> >         ksft_finished();
> >  }
> > --
> > 2.51.0
> >



* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-10 16:15 [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default Kyle Meyer
  2025-09-10 16:44 ` Jiaqi Yan
@ 2025-09-10 18:05 ` jane.chu
  2025-09-11  8:46 ` David Hildenbrand
  2 siblings, 0 replies; 10+ messages in thread
From: jane.chu @ 2025-09-10 18:05 UTC (permalink / raw)
  To: Kyle Meyer, akpm, corbet, david, linmiaohe, shuah, tony.luck
  Cc: Liam.Howlett, bp, hannes, jack, jiaqiyan, joel.granados,
	laoar.shao, lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi,
	osalvador, rafael.j.wysocki, rppt, russ.anderson, shawn.fan,
	surenb, vbabka, linux-acpi, linux-doc, linux-kernel,
	linux-kselftest, linux-mm


On 9/10/2025 9:15 AM, Kyle Meyer wrote:
> Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> Since HugeTLB pages are preallocated, reducing the available HugeTLB
> page pool can cause allocation failures.
> 
> /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> disable/enable soft offline:
> 
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled.
> 
> The current sysctl interface does not distinguish between HugeTLB pages
> and other page types.
> 
> Disable soft offline for HugeTLB pages by default (1) and extend the
> sysctl interface to preserve existing behavior (2):
> 
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled (excluding HugeTLB pages).
> 2 - Soft offline is enabled (including HugeTLB pages).
> 
> Update documentation for the sysctl interface, reference the sysctl
> interface in the sysfs ABI documentation, and update HugeTLB soft
> offline selftests.
> 
> Reported-by: Shawn Fan <shawn.fan@intel.com>
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> ---
> 
> Tony's original patch disabled soft offline for HugeTLB pages when
> a correctable memory error reported via GHES (with "error threshold
> exceeded" set) happened to be on a HugeTLB page:
> 
> https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
> 
> This patch disables soft offline for HugeTLB pages by default
> (not just from GHES).
> 
> ---
>   .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
>   Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
>   mm/memory-failure.c                           | 21 ++++++++++--
>   .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
>   4 files changed, 60 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> index 00f4e35f916f..befb89ae39ec 100644
> --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> @@ -20,6 +20,12 @@ Description:
>   		number, or a error when the offlining failed.  Reading
>   		the file is not allowed.
>   
> +		Soft-offline can be disabled/enabled via sysctl:
> +		/proc/sys/vm/enable_soft_offline
> +
> +		For details, see:
> +		Documentation/admin-guide/sysctl/vm.rst
> +
>   What:		/sys/devices/system/memory/hard_offline_page
>   Date:		Sep 2009
>   KernelVersion:	2.6.33
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 4d71211fdad8..ae56372bd604 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
>   HugeTLB cases.
>   
>   For all architectures, enable_soft_offline controls whether to soft offline
> -memory pages.  When set to 1, kernel attempts to soft offline the pages
> -whenever it thinks needed.  When set to 0, kernel returns EOPNOTSUPP to
> -the request to soft offline the pages.  Its default value is 1.
> +memory pages:
> +
> +- 0: Soft offline is disabled.
> +- 1: Soft offline is enabled (excluding HugeTLB pages).
> +- 2: Soft offline is enabled (including HugeTLB pages).
> +
> +The default is 1.
> +
> +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
>   
>   It is worth mentioning that after setting enable_soft_offline to 0, the
>   following requests to soft offline pages will not be performed:
>   
> +- Request to soft offline from sysfs (soft_offline_page).
> +
>   - Request to soft offline pages from RAS Correctable Errors Collector.
>   
> -- On ARM, the request to soft offline pages from GHES driver.
> +- On ARM and X86, the request to soft offline pages from GHES driver.
>   
>   - On PARISC, the request to soft offline pages from Page Deallocation Table.
>   
> +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> +
>   extfrag_threshold
>   =================
>   
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fc30ca4804bf..cb59a99b48c5 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -64,11 +64,18 @@
>   #include "internal.h"
>   #include "ras/ras_event.h"
>   
> +enum soft_offline {
> +	SOFT_OFFLINE_DISABLED = 0,
> +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> +	SOFT_OFFLINE_ENABLED
> +};
> +
>   static int sysctl_memory_failure_early_kill __read_mostly;
>   
>   static int sysctl_memory_failure_recovery __read_mostly = 1;
>   
> -static int sysctl_enable_soft_offline __read_mostly = 1;
> +static int sysctl_enable_soft_offline __read_mostly =
> +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
>   
>   atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>   
> @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_minmax,
>   		.extra1		= SYSCTL_ZERO,
> -		.extra2		= SYSCTL_ONE,
> +		.extra2		= SYSCTL_TWO,
>   	}
>   };
>   
> @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
>   		return -EIO;
>   	}
>   
> -	if (!sysctl_enable_soft_offline) {
> +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
>   		pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
>   		put_ref_page(pfn, flags);
>   		return -EOPNOTSUPP;
>   	}
>   
> +	if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> +		if (folio_test_hugetlb(pfn_folio(pfn))) {
> +			pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> +			put_ref_page(pfn, flags);
> +			return -EOPNOTSUPP;
> +		}
> +	}
> +
>   	mutex_lock(&mf_mutex);
>   
>   	if (PageHWPoison(page)) {
> diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> index f086f0e04756..7e2873cd0a6d 100644
> --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> @@ -1,10 +1,15 @@
>   // SPDX-License-Identifier: GPL-2.0
>   /*
>    * Test soft offline behavior for HugeTLB pages:
> - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> - *   offlining failed with EOPNOTSUPP.
> - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> - *   nr_hugepages/free_hugepages should be reduced by 1.
> + *
> + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> + *
> + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> + *
> + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
>    *
>    * Before running, make sure more than 2 hugepages of default_hugepagesz
>    * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> @@ -32,6 +37,12 @@
>   
>   #define EPREFIX " !!! "
>   
> +enum soft_offline {
> +	SOFT_OFFLINE_DISABLED = 0,
> +	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> +	SOFT_OFFLINE_ENABLED
> +};
> +
>   static int do_soft_offline(int fd, size_t len, int expect_errno)
>   {
>   	char *filemap = NULL;
> @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
>   	char cmd[256] = {0};
>   	FILE *cmdfile = NULL;
>   
> -	if (value != 0 && value != 1)
> +	if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
>   		return -EINVAL;
>   
>   	sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
>   static void test_soft_offline_common(int enable_soft_offline)
>   {
>   	int fd;
> -	int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> +	int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
>   	struct statfs file_stat;
>   	unsigned long hugepagesize_kb = 0;
>   	unsigned long nr_hugepages_before = 0;
> @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
>   	// No need for the hugetlbfs file from now on.
>   	close(fd);
>   
> -	if (enable_soft_offline) {
> +	if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
>   		if (nr_hugepages_before != nr_hugepages_after + 1) {
>   			ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
>   			return;
> @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
>   int main(int argc, char **argv)
>   {
>   	ksft_print_header();
> -	ksft_set_plan(2);
> +	ksft_set_plan(3);
>   
> -	test_soft_offline_common(1);
> -	test_soft_offline_common(0);
> +	test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> +	test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> +	test_soft_offline_common(SOFT_OFFLINE_DISABLED);
>   
>   	ksft_finished();
>   }

Looks good to me.
Reviewed-by: Jane Chu <jane.chu@oracle.com>

Thanks!
-jane





* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-10 16:15 [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default Kyle Meyer
  2025-09-10 16:44 ` Jiaqi Yan
  2025-09-10 18:05 ` jane.chu
@ 2025-09-11  8:46 ` David Hildenbrand
  2025-09-11 17:56   ` Luck, Tony
  2 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-09-11  8:46 UTC (permalink / raw)
  To: Kyle Meyer, akpm, corbet, linmiaohe, shuah, tony.luck
  Cc: Liam.Howlett, bp, hannes, jack, jane.chu, jiaqiyan,
	joel.granados, laoar.shao, lorenzo.stoakes, mclapinski, mhocko,
	nao.horiguchi, osalvador, rafael.j.wysocki, rppt, russ.anderson,
	shawn.fan, surenb, vbabka, linux-acpi, linux-doc, linux-kernel,
	linux-kselftest, linux-mm

On 10.09.25 18:15, Kyle Meyer wrote:
> Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> Since HugeTLB pages are preallocated, reducing the available HugeTLB
> page pool can cause allocation failures.
> 
> /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> disable/enable soft offline:
> 
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled.
> 
> The current sysctl interface does not distinguish between HugeTLB pages
> and other page types.
> 
> Disable soft offline for HugeTLB pages by default (1) and extend the
> sysctl interface to preserve existing behavior (2):
> 
> 0 - Soft offline is disabled.
> 1 - Soft offline is enabled (excluding HugeTLB pages).
> 2 - Soft offline is enabled (including HugeTLB pages).
> 
> Update documentation for the sysctl interface, reference the sysctl
> interface in the sysfs ABI documentation, and update HugeTLB soft
> offline selftests.

I'm sure you spotted that the documentation for 
"/sys/devices/system/memory/soft_offline_page" resides under "testing".

If you read about MADV_SOFT_OFFLINE in the man page, it clearly says:

"This feature is intended for testing of memory error-handling code; it 
is available  only if the kernel was configured with CONFIG_MEMORY_FAILURE."

So I'm sorry to say: I fail to see why we should add all this complexity to 
make a feature used for testing soft-offlining work differently for 
hugetlb folios -- with a testing interface.

-- 
Cheers

David / dhildenb




* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-11  8:46 ` David Hildenbrand
@ 2025-09-11 17:56   ` Luck, Tony
  2025-09-11 20:56     ` Kyle Meyer
  2025-09-12  7:53     ` David Hildenbrand
  0 siblings, 2 replies; 10+ messages in thread
From: Luck, Tony @ 2025-09-11 17:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Kyle Meyer, akpm, corbet, linmiaohe, shuah, Liam.Howlett, bp,
	hannes, jack, jane.chu, jiaqiyan, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Thu, Sep 11, 2025 at 10:46:10AM +0200, David Hildenbrand wrote:
> On 10.09.25 18:15, Kyle Meyer wrote:
> > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > page pool can cause allocation failures.
> > 
> > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > disable/enable soft offline:
> > 
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled.
> > 
> > The current sysctl interface does not distinguish between HugeTLB pages
> > and other page types.
> > 
> > Disable soft offline for HugeTLB pages by default (1) and extend the
> > sysctl interface to preserve existing behavior (2):
> > 
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > 2 - Soft offline is enabled (including HugeTLB pages).
> > 
> > Update documentation for the sysctl interface, reference the sysctl
> > interface in the sysfs ABI documentation, and update HugeTLB soft
> > offline selftests.
> 
> I'm sure you spotted that the documentation for
> "/sys/devices/system/memory/soft_offline_page" resides under "testing".

But that is only one of several places in the kernel that
feed into the page offline code.

This patch was motivated by the GHES path where BIOS indicates
a corrected error threshold was exceeded. There's also the
drivers/ras/cec.c path where Linux does its own threshold
counting.
> 
> If you read about MADV_SOFT_OFFLINE in the man page, it clearly says:
> 
> "This feature is intended for testing of memory error-handling code; it is
> available  only if the kernel was configured with CONFIG_MEMORY_FAILURE."

Agreed that this all depends on CONFIG_MEMORY_FAILURE ... so if any
part of the flow is compiled in when that is "=n" then some
changes are needed to fix that.

> 
> So I'm sorry to say: I fail to see why we should add all this complexity to make a
> feature used for testing soft-offlining work differently for hugetlb folios
> -- with a testing interface.
> 
> -- 
> Cheers
> 
> David / dhildenb

-Tony



* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-11 17:56   ` Luck, Tony
@ 2025-09-11 20:56     ` Kyle Meyer
  2025-09-12  7:53     ` David Hildenbrand
  1 sibling, 0 replies; 10+ messages in thread
From: Kyle Meyer @ 2025-09-11 20:56 UTC (permalink / raw)
  To: Luck, Tony
  Cc: David Hildenbrand, akpm, corbet, linmiaohe, shuah, Liam.Howlett,
	bp, hannes, jack, jane.chu, jiaqiyan, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Thu, Sep 11, 2025 at 10:56:36AM -0700, Luck, Tony wrote:
> On Thu, Sep 11, 2025 at 10:46:10AM +0200, David Hildenbrand wrote:
> > On 10.09.25 18:15, Kyle Meyer wrote:
> > > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > > page pool can cause allocation failures.
> > > 
> > > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > > disable/enable soft offline:
> > > 
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled.
> > > 
> > > The current sysctl interface does not distinguish between HugeTLB pages
> > > and other page types.
> > > 
> > > Disable soft offline for HugeTLB pages by default (1) and extend the
> > > sysctl interface to preserve existing behavior (2):
> > > 
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > > 2 - Soft offline is enabled (including HugeTLB pages).
> > > 
> > > Update documentation for the sysctl interface, reference the sysctl
> > > interface in the sysfs ABI documentation, and update HugeTLB soft
> > > offline selftests.
> > 
> > I'm sure you spotted that the documentation for
> > "/sys/devices/system/memory/soft_offline_page" resides under "testing".
> 
> But that is only one of several places in the kernel that
> feed into the page offline code.
> 
> This patch was motivated by the GHES path where BIOS indicates
> a corrected error threshold was exceeded. There's also the
> drivers/ras/cec.c path where Linux does its own threshold
> counting.
> > 
> > If you read about MADV_SOFT_OFFLINE in the man page, it clearly says:
> > 
> > "This feature is intended for testing of memory error-handling code; it is
> > available  only if the kernel was configured with CONFIG_MEMORY_FAILURE."
> 
> Agreed that this all depends on CONFIG_MEMORY_FAILURE ... so if any
> part of the flow is compiled in when that is "=n" then some
> changes are needed to fix that.
> 
> > 
> > So I'm sorry to say: I fail to see why we should add all this complexity to make a
> > feature used for testing soft-offlining work differently for hugetlb folios
> > -- with a testing interface.

I would also like to note that the current sysctl interface already affects
testing interfaces. Please see the following commit:

56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")

The sysctl interface should probably be mentioned in
sysfs-memory-page-offline with or without this patch.

Thanks,
Kyle Meyer



* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-10 17:50   ` Kyle Meyer
@ 2025-09-11 21:26     ` Jiaqi Yan
  0 siblings, 0 replies; 10+ messages in thread
From: Jiaqi Yan @ 2025-09-11 21:26 UTC (permalink / raw)
  To: Kyle Meyer
  Cc: akpm, corbet, david, linmiaohe, shuah, tony.luck, Liam.Howlett,
	bp, hannes, jack, jane.chu, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Wed, Sep 10, 2025 at 10:50 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
>
> On Wed, Sep 10, 2025 at 09:44:24AM -0700, Jiaqi Yan wrote:
> > On Wed, Sep 10, 2025 at 9:16 AM Kyle Meyer <kyle.meyer@hpe.com> wrote:
> > >
> > > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > > page pool can cause allocation failures.
> > >
> > > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > > disable/enable soft offline:
> > >
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled.
> > >
> > > The current sysctl interface does not distinguish between HugeTLB pages
> > > and other page types.
> > >
> > > Disable soft offline for HugeTLB pages by default (1) and extend the
> > > sysctl interface to preserve existing behavior (2):
> > >
> > > 0 - Soft offline is disabled.
> > > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > > 2 - Soft offline is enabled (including HugeTLB pages).
> > >
> > > Update documentation for the sysctl interface, reference the sysctl
> > > interface in the sysfs ABI documentation, and update HugeTLB soft
> > > offline selftests.
> > >
> > > Reported-by: Shawn Fan <shawn.fan@intel.com>
> > > Suggested-by: Tony Luck <tony.luck@intel.com>
> > > Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
> > > ---
> > >
> > > Tony's original patch disabled soft offline for HugeTLB pages when
> > > a correctable memory error reported via GHES (with "error threshold
> > > exceeded" set) happened to be on a HugeTLB page:
> > >
> > > https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
> > >
> > > This patch disables soft offline for HugeTLB pages by default
> > > (not just from GHES).
> > >
> > > ---
> > >  .../ABI/testing/sysfs-memory-page-offline     |  6 ++++
> > >  Documentation/admin-guide/sysctl/vm.rst       | 18 ++++++++---
> > >  mm/memory-failure.c                           | 21 ++++++++++--
> > >  .../selftests/mm/hugetlb-soft-offline.c       | 32 +++++++++++++------
> > >  4 files changed, 60 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
> > > index 00f4e35f916f..befb89ae39ec 100644
> > > --- a/Documentation/ABI/testing/sysfs-memory-page-offline
> > > +++ b/Documentation/ABI/testing/sysfs-memory-page-offline
> > > @@ -20,6 +20,12 @@ Description:
> > >                 number, or a error when the offlining failed.  Reading
> > >                 the file is not allowed.
> > >
> > > +               Soft-offline can be disabled/enabled via sysctl:
> > > +               /proc/sys/vm/enable_soft_offline
> > > +
> > > +               For details, see:
> > > +               Documentation/admin-guide/sysctl/vm.rst
> > > +
> > >  What:          /sys/devices/system/memory/hard_offline_page
> > >  Date:          Sep 2009
> > >  KernelVersion: 2.6.33
> > > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > > index 4d71211fdad8..ae56372bd604 100644
> > > --- a/Documentation/admin-guide/sysctl/vm.rst
> > > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > > @@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
> > >  HugeTLB cases.
> > >
> > >  For all architectures, enable_soft_offline controls whether to soft offline
> > > -memory pages.  When set to 1, kernel attempts to soft offline the pages
> > > -whenever it thinks needed.  When set to 0, kernel returns EOPNOTSUPP to
> > > -the request to soft offline the pages.  Its default value is 1.
> > > +memory pages:
> > > +
> > > +- 0: Soft offline is disabled.
> > > +- 1: Soft offline is enabled (excluding HugeTLB pages).
> > > +- 2: Soft offline is enabled (including HugeTLB pages).
> >
> > Would it be better to keep/inherit the previous documented behavior "1
> > - Soft offline is enabled (no matter what type of the page is)"? Thus
> > it will have no impact to users that are very nervous about corrected
> > memory errors and willing to lose hugetlb page. Something like:
> >
> >   enum soft_offline {
> >       SOFT_OFFLINE_DISABLED = 0,
> >       SOFT_OFFLINE_ENABLED,
> >       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> >       // SOFT_OFFLINE_ENABLED_SKIP_XXX...
> >   };
>
> I don't have a strong opinion on the default because there's a sysctl
> interface, but that seems reasonable. I'll wait for more feedback before
> putting together a v2.

Yeah, no strong opinion from me either, as long as
SOFT_OFFLINE_DISABLED is still 0 (used by our fleet).

In case you don't need to send out v2:

Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>

>
> > > +
> > > +The default is 1.
> > > +
> > > +If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
> > >
> > >  It is worth mentioning that after setting enable_soft_offline to 0, the
> > >  following requests to soft offline pages will not be performed:
> > >
> > > +- Request to soft offline from sysfs (soft_offline_page).
> > > +
> > >  - Request to soft offline pages from RAS Correctable Errors Collector.
> > >
> > > -- On ARM, the request to soft offline pages from GHES driver.
> > > +- On ARM and X86, the request to soft offline pages from GHES driver.
> > >
> > >  - On PARISC, the request to soft offline pages from Page Deallocation Table.
> > >
> > > +Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > > +
> > >  extfrag_threshold
> > >  =================
> > >
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index fc30ca4804bf..cb59a99b48c5 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -64,11 +64,18 @@
> > >  #include "internal.h"
> > >  #include "ras/ras_event.h"
> > >
> > > +enum soft_offline {
> > > +       SOFT_OFFLINE_DISABLED = 0,
> > > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > > +       SOFT_OFFLINE_ENABLED
> > > +};
> > > +
> > >  static int sysctl_memory_failure_early_kill __read_mostly;
> > >
> > >  static int sysctl_memory_failure_recovery __read_mostly = 1;
> > >
> > > -static int sysctl_enable_soft_offline __read_mostly = 1;
> > > +static int sysctl_enable_soft_offline __read_mostly =
> > > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
> > >
> > >  atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> > >
> > > @@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
> > >                 .mode           = 0644,
> > >                 .proc_handler   = proc_dointvec_minmax,
> > >                 .extra1         = SYSCTL_ZERO,
> > > -               .extra2         = SYSCTL_ONE,
> > > +               .extra2         = SYSCTL_TWO,
> > >         }
> > >  };
> > >
> > > @@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
> > >                 return -EIO;
> > >         }
> > >
> > > -       if (!sysctl_enable_soft_offline) {
> > > +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
> > >                 pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
> > >                 put_ref_page(pfn, flags);
> > >                 return -EOPNOTSUPP;
> > >         }
> > >
> > > +       if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
> > > +               if (folio_test_hugetlb(pfn_folio(pfn))) {
> > > +                       pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
> > > +                       put_ref_page(pfn, flags);
> > > +                       return -EOPNOTSUPP;
> > > +               }
> > > +       }
> > > +
> > >         mutex_lock(&mf_mutex);
> > >
> > >         if (PageHWPoison(page)) {
> > > diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > index f086f0e04756..7e2873cd0a6d 100644
> > > --- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > +++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
> > > @@ -1,10 +1,15 @@
> > >  // SPDX-License-Identifier: GPL-2.0
> > >  /*
> > >   * Test soft offline behavior for HugeTLB pages:
> > > - * - if enable_soft_offline = 0, hugepages should stay intact and soft
> > > - *   offlining failed with EOPNOTSUPP.
> > > - * - if enable_soft_offline = 1, a hugepage should be dissolved and
> > > - *   nr_hugepages/free_hugepages should be reduced by 1.
> > > + *
> > > + * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
> > > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > > + *
> > > + * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
> > > + *   should stay intact and soft offlining failed with EOPNOTSUPP.
> > > + *
> > > + * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
> > > + *   dissolved and nr_hugepages/free_hugepages should be reduced by 1.
> > >   *
> > >   * Before running, make sure more than 2 hugepages of default_hugepagesz
> > >   * are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
> > > @@ -32,6 +37,12 @@
> > >
> > >  #define EPREFIX " !!! "
> > >
> > > +enum soft_offline {
> > > +       SOFT_OFFLINE_DISABLED = 0,
> > > +       SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
> > > +       SOFT_OFFLINE_ENABLED
> > > +};
> > > +
> > >  static int do_soft_offline(int fd, size_t len, int expect_errno)
> > >  {
> > >         char *filemap = NULL;
> > > @@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
> > >         char cmd[256] = {0};
> > >         FILE *cmdfile = NULL;
> > >
> > > -       if (value != 0 && value != 1)
> > > +       if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
> > >                 return -EINVAL;
> > >
> > >         sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
> > > @@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
> > >  static void test_soft_offline_common(int enable_soft_offline)
> > >  {
> > >         int fd;
> > > -       int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
> > > +       int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
> > >         struct statfs file_stat;
> > >         unsigned long hugepagesize_kb = 0;
> > >         unsigned long nr_hugepages_before = 0;
> > > @@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
> > >         // No need for the hugetlbfs file from now on.
> > >         close(fd);
> > >
> > > -       if (enable_soft_offline) {
> > > +       if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
> > >                 if (nr_hugepages_before != nr_hugepages_after + 1) {
> > >                         ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
> > >                         return;
> > > @@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
> > >  int main(int argc, char **argv)
> > >  {
> > >         ksft_print_header();
> > > -       ksft_set_plan(2);
> > > +       ksft_set_plan(3);
> > >
> > > -       test_soft_offline_common(1);
> > > -       test_soft_offline_common(0);
> > > +       test_soft_offline_common(SOFT_OFFLINE_ENABLED);
> > > +       test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
> > > +       test_soft_offline_common(SOFT_OFFLINE_DISABLED);
> >
> > Thanks for updating the test code! Looks good to me.
> >
> > >
> > >         ksft_finished();
> > >  }
> > > --
> > > 2.51.0
> > >



* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-11 17:56   ` Luck, Tony
  2025-09-11 20:56     ` Kyle Meyer
@ 2025-09-12  7:53     ` David Hildenbrand
  2025-09-12 15:17       ` Kyle Meyer
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-09-12  7:53 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Kyle Meyer, akpm, corbet, linmiaohe, shuah, Liam.Howlett, bp,
	hannes, jack, jane.chu, jiaqiyan, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On 11.09.25 19:56, Luck, Tony wrote:
> On Thu, Sep 11, 2025 at 10:46:10AM +0200, David Hildenbrand wrote:
>> On 10.09.25 18:15, Kyle Meyer wrote:
>>> Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
>>> Since HugeTLB pages are preallocated, reducing the available HugeTLB
>>> page pool can cause allocation failures.
>>>
>>> /proc/sys/vm/enable_soft_offline provides a sysctl interface to
>>> disable/enable soft offline:
>>>
>>> 0 - Soft offline is disabled.
>>> 1 - Soft offline is enabled.
>>>
>>> The current sysctl interface does not distinguish between HugeTLB pages
>>> and other page types.
>>>
>>> Disable soft offline for HugeTLB pages by default (1) and extend the
>>> sysctl interface to preserve existing behavior (2):
>>>
>>> 0 - Soft offline is disabled.
>>> 1 - Soft offline is enabled (excluding HugeTLB pages).
>>> 2 - Soft offline is enabled (including HugeTLB pages).
>>>
>>> Update documentation for the sysctl interface, reference the sysctl
>>> interface in the sysfs ABI documentation, and update HugeTLB soft
>>> offline selftests.
>>
>> I'm sure you spotted that the documentation for
>> "/sys/devices/system/memory/soft_offline_page" resides under "testing".
> 
> But that is only one of several places in the kernel that
> feed into the page offline code.

Right, I can see one more call to soft_offline_page() from 
arch/parisc/kernel/pdt.c.

And there is memory_failure_work_func() that I missed.

So agreed that this goes beyond testing.

It caught my attention because you ended up modifying documentation 
residing in Documentation/ABI/testing/sysfs-memory-page-offline.

Reading 56374430c5dfc that Kyle pointed out, it gets clearer.

So the patch motivation/idea makes sense to me.


I'll note three things:

(1) The interface design is not really extensible. Imagine if we want to 
exclude yet another page type.

Can we maybe add a second interface that defines a filter for types?

Alternatively, you could use the remaining bits of the value as such a filter.

0 - Soft offline is completely disabled.
1 - Soft offline is enabled except for manually disabled types.

Filter

2 - disable hugetlb.

So value 3 would give you "enable all except hugetlb" etc.

We could add in the future

4 - disable guest_memfd (just some random example)
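
As a rough illustration of the bitmask scheme above (not code from the
patch; the flag names, the enum, and the helper are all hypothetical),
the check could look something like this in C. It assumes that a value
with a filter bit set but the enable bit clear (e.g. 2) still means
"disabled", which the description above does not spell out.

```c
#include <stdbool.h>

/* Hypothetical flag names for illustration; not taken from the kernel tree. */
#define SOFT_OFFLINE_ENABLE       (1u << 0) /* 1: soft offline enabled */
#define SOFT_OFFLINE_SKIP_HUGETLB (1u << 1) /* 2: filter out HugeTLB pages */

/* Hypothetical page classification, just enough for the example. */
enum page_kind { PAGE_KIND_OTHER, PAGE_KIND_HUGETLB };

/* Return true if a page of the given kind may be soft offlined. */
static bool soft_offline_allowed(unsigned int sysctl_val, enum page_kind kind)
{
	/* Bit 0 clear: soft offline is completely disabled (assumption for 2). */
	if (!(sysctl_val & SOFT_OFFLINE_ENABLE))
		return false;

	/* Bit 1 set: HugeTLB pages are excluded from soft offline. */
	if (kind == PAGE_KIND_HUGETLB && (sysctl_val & SOFT_OFFLINE_SKIP_HUGETLB))
		return false;

	return true;
}
```

With this reading, 1 keeps the existing behavior, 3 enables soft offline
for everything except HugeTLB pages, and a future filter such as 4 would
only need one more flag and one more check.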


Then you

(2) Changing the semantics of the value "1"

IIUC, you are changing the semantics of value "1". It used to mean
"SOFT_OFFLINE_ENABLED"; now it means "SOFT_OFFLINE_ENABLED_SKIP_HUGETLB",
which is a change in behavior.

If that is the case, I don't think that's okay.


(3) I am not sure about changing the default. That should be an admin/
    distro decision.

-- 
Cheers

David / dhildenb




* Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
  2025-09-12  7:53     ` David Hildenbrand
@ 2025-09-12 15:17       ` Kyle Meyer
  0 siblings, 0 replies; 10+ messages in thread
From: Kyle Meyer @ 2025-09-12 15:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Luck, Tony, akpm, corbet, linmiaohe, shuah, Liam.Howlett, bp,
	hannes, jack, jane.chu, jiaqiyan, joel.granados, laoar.shao,
	lorenzo.stoakes, mclapinski, mhocko, nao.horiguchi, osalvador,
	rafael.j.wysocki, rppt, russ.anderson, shawn.fan, surenb, vbabka,
	linux-acpi, linux-doc, linux-kernel, linux-kselftest, linux-mm

On Fri, Sep 12, 2025 at 09:53:02AM +0200, David Hildenbrand wrote:
> On 11.09.25 19:56, Luck, Tony wrote:
> > On Thu, Sep 11, 2025 at 10:46:10AM +0200, David Hildenbrand wrote:
> > > On 10.09.25 18:15, Kyle Meyer wrote:
> > > > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > > > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > > > page pool can cause allocation failures.
> > > > 
> > > > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > > > disable/enable soft offline:
> > > > 
> > > > 0 - Soft offline is disabled.
> > > > 1 - Soft offline is enabled.
> > > > 
> > > > The current sysctl interface does not distinguish between HugeTLB pages
> > > > and other page types.
> > > > 
> > > > Disable soft offline for HugeTLB pages by default (1) and extend the
> > > > sysctl interface to preserve existing behavior (2):
> > > > 
> > > > 0 - Soft offline is disabled.
> > > > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > > > 2 - Soft offline is enabled (including HugeTLB pages).
> > > > 
> > > > Update documentation for the sysctl interface, reference the sysctl
> > > > interface in the sysfs ABI documentation, and update HugeTLB soft
> > > > offline selftests.
> > > 
> > > I'm sure you spotted that the documentation for
> > > "/sys/devices/system/memory/soft_offline_page" resides under "testing".
> > 
> > But that is only one of several places in the kernel that
> > feed into the page offline code.
> 
> Right, I can see one more call to soft_offline_page() from
> arch/parisc/kernel/pdt.c.
> 
> And there is memory_failure_work_func() that I missed.
> 
> So agreed that this goes beyond testing.
> 
> It caught my attention because you ended up modifying documentation residing
> in Documentation/ABI/testing/sysfs-memory-page-offline.
> 
> Reading 56374430c5dfc that Kyle pointed out, it gets clearer.
> 
> So the patch motivation/idea makes sense to me.
> 
> 
> I'll note three things:
> 
> (1) The interface design is not really extensible. Imagine if we want to
> exclude yet another page type.
> 
> Can we maybe add a second interface that defines a filter for types?
> 
> Alternatively, you could use all the remaining flags as such a filter.
> 
> 0 - Soft offline is completely disabled.
> 1 - Soft offline is enabled except for manually disabled types.
> 
> Filter
> 
> 2 - disable hugetlb.
> 
> So value 3 would give you "enable all except hugetlb" etc.
> 
> We could add in the future
> 
> 4 - disable guest_memfd (just some random example)
> 
> 
> Then you
> 
> (2) Changing the semantics of the value "1"
> 
> IIUC, you are changing the semantics of value "1". It used to mean
> "SOFT_OFFLINE_ENABLED"; now it means "SOFT_OFFLINE_ENABLED_SKIP_HUGETLB",
> which is a change in behavior.
> 
> If that is the case, I don't think that's okay.
> 
> 
> (3) I am not sure about changing the default. That should be an admin/
>     distro decision.

Thank you, that sounds good to me. I'll put something together.

Thanks,
Kyle Meyer



Thread overview: 10+ messages
2025-09-10 16:15 [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default Kyle Meyer
2025-09-10 16:44 ` Jiaqi Yan
2025-09-10 17:50   ` Kyle Meyer
2025-09-11 21:26     ` Jiaqi Yan
2025-09-10 18:05 ` jane.chu
2025-09-11  8:46 ` David Hildenbrand
2025-09-11 17:56   ` Luck, Tony
2025-09-11 20:56     ` Kyle Meyer
2025-09-12  7:53     ` David Hildenbrand
2025-09-12 15:17       ` Kyle Meyer
