* [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl
@ 2025-12-03 6:38 Gregory Price
2025-12-03 9:26 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 5+ messages in thread
From: Gregory Price @ 2025-12-03 6:38 UTC (permalink / raw)
To: linux-mm
Cc: kernel-team, linux-kernel, linux-doc, david, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, muchun.song, David Hildenbrand, Mel Gorman,
Alexandru Moise, David Rientjes
This reintroduces a concept removed by:
commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
This sysctl provides flexibility between ZONE_MOVABLE use cases:
1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable
When ZONE_MOVABLE is used to make huge page allocation more reliable,
disallowing gigantic pages in this region is pointless. If
hotplug is not a requirement, we can loosen the restrictions to allow
1GB gigantic pages in ZONE_MOVABLE.
Since 1GB can be difficult to migrate / has impacts on compaction /
defragmentation, we don't enable this by default. Notably, 1GB pages
can only be migrated if another 1GB page is available - so hot-unplug
will fail if such a page cannot be found.
However, since there are scenarios where gigantic pages are migratable,
we should allow use of these on movable regions.
Note: Boot-time CMA is not possible for driver-managed hotplug memory,
as CMA requires the memory to be registered as SystemRAM at boot time.
Additionally, 1GB huge pages are not supported by THP.
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
---
Documentation/admin-guide/mm/memory-hotplug.rst | 14 ++++++++++++--
Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
include/linux/hugetlb.h | 3 ++-
mm/hugetlb.c | 1 -
mm/hugetlb_sysctl.c | 9 +++++++++
5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 33c886f3d198..6581558fd0d7 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -612,8 +612,9 @@ ZONE_MOVABLE, especially when fine-tuning zone ratios:
allocations and silently create a zone imbalance, usually triggered by
inflation requests from the hypervisor.
-- Gigantic pages are unmovable, resulting in user space consuming a
- lot of unmovable memory.
+- Gigantic pages are unmovable when an architecture does not support
+ huge page migration and/or the ``movable_gigantic_pages`` sysctl is false.
+ See Documentation/admin-guide/sysctl/vm.rst for more info on this sysctl.
- Huge pages are unmovable when an architectures does not support huge
page migration, resulting in a similar issue as with gigantic pages.
@@ -672,6 +673,15 @@ block might fail:
- Concurrent activity that operates on the same physical memory area, such as
allocating gigantic pages, can result in temporary offlining failures.
+- When an admin sets the ``movable_gigantic_pages`` sysctl to true, gigantic
+ pages are allowed in ZONE_MOVABLE. This only allows migratable gigantic
+ pages to be allocated; however, if there are no eligible destination gigantic
+ pages at offline, the offlining operation will fail.
+
+ Users leveraging ``movable_gigantic_pages`` should weigh the value of
+ ZONE_MOVABLE for increasing the reliability of gigantic page allocation
+ against the potential loss of hot-unplug reliability.
+
- Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap
Optimization (HVO) is enabled.
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..36a390c0561e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
- mmap_min_addr
- mmap_rnd_bits
- mmap_rnd_compat_bits
+- movable_gigantic_pages
- nr_hugepages
- nr_hugepages_mempolicy
- nr_overcommit_hugepages
@@ -624,6 +625,22 @@ This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable
+movable_gigantic_pages
+======================
+
+This parameter controls whether gigantic pages may be allocated from
+ZONE_MOVABLE. If set to non-zero, gigantic pages can be allocated
+from ZONE_MOVABLE. ZONE_MOVABLE memory may be created via the kernel
+boot parameter `kernelcore` or via memory hotplug as discussed in
+Documentation/admin-guide/mm/memory-hotplug.rst.
+
+Support may depend on specific architecture.
+
+Note that using ZONE_MOVABLE gigantic pages may make features like
+memory hotremove more unreliable, as migrating gigantic pages is more
+difficult due to needing larger amounts of physically contiguous memory.
+
+
nr_hugepages
============
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 019a1c5281e4..5c190b22108e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -171,6 +171,7 @@ bool hugetlbfs_pagecache_present(struct hstate *h,
struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio);
+extern int movable_gigantic_pages __read_mostly;
extern int sysctl_hugetlb_shm_group __read_mostly;
extern struct list_head huge_boot_pages[MAX_NUMNODES];
@@ -924,7 +925,7 @@ static inline bool hugepage_movable_supported(struct hstate *h)
if (!hugepage_migration_supported(h))
return false;
- if (hstate_is_gigantic(h))
+ if (hstate_is_gigantic(h) && !movable_gigantic_pages)
return false;
return true;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9e7815b4f058..084d45d5311d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -49,7 +49,6 @@
#include "internal.h"
#include "hugetlb_vmemmap.h"
#include "hugetlb_cma.h"
-#include "hugetlb_internal.h"
#include <linux/page-isolation.h>
int hugetlb_max_hstate __read_mostly;
diff --git a/mm/hugetlb_sysctl.c b/mm/hugetlb_sysctl.c
index bd3077150542..22a9e15e534f 100644
--- a/mm/hugetlb_sysctl.c
+++ b/mm/hugetlb_sysctl.c
@@ -125,6 +125,15 @@ static const struct ctl_table hugetlb_table[] = {
.mode = 0644,
.proc_handler = hugetlb_overcommit_handler,
},
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+ {
+ .procname = "movable_gigantic_pages",
+ .data = &movable_gigantic_pages,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif
};
void __init hugetlb_sysctl_init(void)
--
2.52.0
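
For readers following the include/linux/hugetlb.h hunk above, the decision it
encodes can be modeled in user space. This is only an illustrative sketch, not
kernel code: model_hugepage_movable_supported() and its three parameters are
hypothetical stand-ins for hugepage_migration_supported(h),
hstate_is_gigantic(h), and the value of the movable_gigantic_pages sysctl.

```c
#include <stdbool.h>

/*
 * User-space model of the check in hugepage_movable_supported() after
 * this patch. The inputs are assumptions of this sketch, not kernel APIs:
 *   migration_supported - stands in for hugepage_migration_supported(h)
 *   is_gigantic         - stands in for hstate_is_gigantic(h)
 *   movable_gigantic    - stands in for the movable_gigantic_pages sysctl
 */
bool model_hugepage_movable_supported(bool migration_supported,
                                      bool is_gigantic,
                                      int movable_gigantic)
{
    /* Without huge page migration support nothing may go in ZONE_MOVABLE. */
    if (!migration_supported)
        return false;

    /* Gigantic pages are only allowed when the admin has opted in. */
    if (is_gigantic && !movable_gigantic)
        return false;

    return true;
}
```

With the sysctl left at 0, the model keeps gigantic pages out of ZONE_MOVABLE,
which matches the opt-in behavior described in the changelog.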
* Re: [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl
2025-12-03 6:38 [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl Gregory Price
@ 2025-12-03 9:26 ` David Hildenbrand (Red Hat)
2025-12-03 9:36 ` Gregory Price
2025-12-04 17:14 ` Gregory Price
0 siblings, 2 replies; 5+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-03 9:26 UTC (permalink / raw)
To: Gregory Price, linux-mm
Cc: kernel-team, linux-kernel, linux-doc, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, muchun.song, Mel Gorman, Alexandru Moise, David Rientjes
On 12/3/25 07:38, Gregory Price wrote:
> This reintroduces a concept removed by:
> commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
>
> This sysctl provides flexibility between ZONE_MOVABLE use cases:
> 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> 2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable
>
> When ZONE_MOVABLE is used to make huge page allocation more reliable,
> disallowing gigantic pages in this region is pointless. If
> hotplug is not a requirement, we can loosen the restrictions to allow
> 1GB gigantic pages in ZONE_MOVABLE.
>
> Since 1GB can be difficult to migrate / has impacts on compaction /
> defragmentation, we don't enable this by default. Notably, 1GB pages
> can only be migrated if another 1GB page is available - so hot-unplug
> will fail if such a page cannot be found.
In light of the other discussion: will it fail or will it simply retry
forever, until there is a free 1g page?
>
> However, since there are scenarios where gigantic pages are migratable,
> we should allow use of these on movable regions.
>
> Note: Boot-time CMA is not possible for driver-managed hotplug memory,
> as CMA requires the memory to be registered as SystemRAM at boot time.
> Additionally, 1GB huge pages are not supported by THP.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Alexandru Moise <00moses.alexander00@gmail.com>
> Suggested-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
> ---
> Documentation/admin-guide/mm/memory-hotplug.rst | 14 ++++++++++++--
> Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
> include/linux/hugetlb.h | 3 ++-
> mm/hugetlb.c | 1 -
> mm/hugetlb_sysctl.c | 9 +++++++++
> 5 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 33c886f3d198..6581558fd0d7 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -612,8 +612,9 @@ ZONE_MOVABLE, especially when fine-tuning zone ratios:
> allocations and silently create a zone imbalance, usually triggered by
> inflation requests from the hypervisor.
>
> -- Gigantic pages are unmovable, resulting in user space consuming a
> - lot of unmovable memory.
> +- Gigantic pages are unmovable when an architecture does not support
> + huge page migration and/or the ``movable_gigantic_pages`` sysctl is false.
> + See Documentation/admin-guide/sysctl/vm.rst for more info on this sysctl.
>
> - Huge pages are unmovable when an architectures does not support huge
> page migration, resulting in a similar issue as with gigantic pages.
> @@ -672,6 +673,15 @@ block might fail:
> - Concurrent activity that operates on the same physical memory area, such as
> allocating gigantic pages, can result in temporary offlining failures.
>
> +- When an admin sets the ``movable_gigantic_pages`` sysctl to true, gigantic
> + pages are allowed in ZONE_MOVABLE. This only allows migratable gigantic
> + pages to be allocated; however, if there are no eligible destination gigantic
> + pages at offline, the offlining operation will fail.
Same question here.
Nothing else jumped out at me. In general, as discussed, this is fine as long
as it is opt-in behavior.
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
--
Cheers
David
* Re: [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl
2025-12-03 9:26 ` David Hildenbrand (Red Hat)
@ 2025-12-03 9:36 ` Gregory Price
2025-12-04 17:14 ` Gregory Price
1 sibling, 0 replies; 5+ messages in thread
From: Gregory Price @ 2025-12-03 9:36 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: linux-mm, kernel-team, linux-kernel, linux-doc, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, muchun.song, Mel Gorman, Alexandru Moise, David Rientjes
On Wed, Dec 03, 2025 at 10:26:20AM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/3/25 07:38, Gregory Price wrote:
> > This reintroduces a concept removed by:
> > commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
> >
> > This sysctl provides flexibility between ZONE_MOVABLE use cases:
> > 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> > 2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable
> >
> > When ZONE_MOVABLE is used to make huge page allocation more reliable,
> > disallowing gigantic pages in this region is pointless. If
> > hotplug is not a requirement, we can loosen the restrictions to allow
> > 1GB gigantic pages in ZONE_MOVABLE.
> >
> > Since 1GB can be difficult to migrate / has impacts on compaction /
> > defragmentation, we don't enable this by default. Notably, 1GB pages
> > can only be migrated if another 1GB page is available - so hot-unplug
> > will fail if such a page cannot be found.
>
> In light of the other discussion: will it fail or will it simply retry
> forever, until there is a free 1g page?
>
...
> > +- When an admin sets the ``movable_gigantic_pages`` sysctl to true, gigantic
> > + pages are allowed in ZONE_MOVABLE. This only allows migratable gigantic
> > + pages to be allocated; however, if there are no eligible destination gigantic
> > + pages at offline, the offlining operation will fail.
>
> Same question here.
>
Hah, great question. I will make a note to try this in the morning.
> Nothing else jumped out at me. In general, as discussed, this is fine as long
> as it is opt-in behavior.
>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
>
> --
> Cheers
>
> David
* Re: [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl
2025-12-03 9:26 ` David Hildenbrand (Red Hat)
2025-12-03 9:36 ` Gregory Price
@ 2025-12-04 17:14 ` Gregory Price
2025-12-04 20:58 ` David Hildenbrand (Red Hat)
1 sibling, 1 reply; 5+ messages in thread
From: Gregory Price @ 2025-12-04 17:14 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: linux-mm, kernel-team, linux-kernel, linux-doc, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, muchun.song, Mel Gorman, Alexandru Moise, David Rientjes
On Wed, Dec 03, 2025 at 10:26:20AM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/3/25 07:38, Gregory Price wrote:
> > This reintroduces a concept removed by:
> > commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
> >
> > This sysctl provides flexibility between ZONE_MOVABLE use cases:
> > 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> > 2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable
> >
> > When ZONE_MOVABLE is used to make huge page allocation more reliable,
> > disallowing gigantic pages in this region is pointless. If
> > hotplug is not a requirement, we can loosen the restrictions to allow
> > 1GB gigantic pages in ZONE_MOVABLE.
> >
> > Since 1GB can be difficult to migrate / has impacts on compaction /
> > defragmentation, we don't enable this by default. Notably, 1GB pages
> > can only be migrated if another 1GB page is available - so hot-unplug
> > will fail if such a page cannot be found.
>
> In light of the other discussion: will it fail or will it simply retry
> forever, until there is a free 1g page?
>
It retries until a 1GB page is available.
Example test:
echo 0 > node0/hugepages/..-1GB/nr_hugepages (dram node)
echo 1 > node1/hugepages/..-1GB/nr_hugepages (zone_movable node)
./alloc_huge & (allocate the page)
./node1_offline & (offline > memory*/state)
sleep 5 (give offline time)
echo 1 > node0/hugepages/..-1GB/nr_hugepages (dram node)
This node1_offline generates migration failures until the last step
occurs, at which point migration and node1_offline complete as expected.
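
The ./alloc_huge helper is not shown in this thread. A minimal sketch of what
such a helper could do, assuming a Linux system where the gigantic page size is
1GB (page shift 30, as on x86-64): map one anonymous MAP_HUGETLB page and touch
it. The names gigantic_mmap_flags() and alloc_gigantic_page() are hypothetical;
the mmap flags themselves are the standard Linux ones.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Fallback for older libc headers; 26 is the value used by Linux. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/* log2(1GB). Assumption: 1GB gigantic pages, as in the test above. */
#define GIGANTIC_PAGE_SHIFT 30

/* Flags for a private anonymous mapping backed by one 1GB hugetlb page. */
int gigantic_mmap_flags(void)
{
    return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
           (GIGANTIC_PAGE_SHIFT << MAP_HUGE_SHIFT);
}

/*
 * Map and touch one gigantic page. Returns NULL if no 1GB page can be
 * reserved (for example when nr_hugepages for that size is 0).
 */
void *alloc_gigantic_page(void)
{
    size_t len = (size_t)1 << GIGANTIC_PAGE_SHIFT;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   gigantic_mmap_flags(), -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    /* Write one byte so the page is actually faulted in. */
    *(volatile char *)p = 1;
    return p;
}
```

A helper like this would have to keep the mapping alive (for example by
sleeping) while the offline attempt runs, so that the page stays in use on the
ZONE_MOVABLE node.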
The migration failures produce the following:
[ 707.443105] migrating pfn c080000 failed ret:-12
[ 707.453353] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc080000
[ 707.471315] head: order:18 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
[ 707.488504] anon flags: 0x17ffff0000000848(uptodate|owner_2|head|node=1|zone=3|lastcpupid=0x1ffff)
[ 707.508393] page_type: f4(hugetlb)
[ 707.515940] raw: 17ffff0000000848 ffa000007d873cc0 ffa000007d873cc0 ff1100082366c6e9
[ 707.533126] raw: 0000000000000000 0000000000000010 00000002f4000000 0000000000000000
[ 707.550317] head: 17ffff0000000848 ffa000007d873cc0 ffa000007d873cc0 ff1100082366c6e9
[ 707.567699] head: 0000000000000000 0000000000000010 00000002f4000000 0000000000000000
[ 707.585085] head: 17ffff0000000012 ffd4000302000001 0000000000000000 0000000000000000
[ 707.602469] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000040000
[ 707.619851] page dumped because: migration failure
I can add this to the changelog if you prefer.
~Gregory
* Re: [PATCH v3] mm, hugetlb: implement movable_gigantic_pages sysctl
2025-12-04 17:14 ` Gregory Price
@ 2025-12-04 20:58 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-04 20:58 UTC (permalink / raw)
To: Gregory Price
Cc: linux-mm, kernel-team, linux-kernel, linux-doc, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, muchun.song, Mel Gorman, Alexandru Moise, David Rientjes
On 12/4/25 18:14, Gregory Price wrote:
> On Wed, Dec 03, 2025 at 10:26:20AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/3/25 07:38, Gregory Price wrote:
>>> This reintroduces a concept removed by:
>>> commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
>>>
>>> This sysctl provides flexibility between ZONE_MOVABLE use cases:
>>> 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
>>> 2) onlining memory in ZONE_MOVABLE to make hugepage allocation reliable
>>>
>>> When ZONE_MOVABLE is used to make huge page allocation more reliable,
>>> disallowing gigantic pages in this region is pointless. If
>>> hotplug is not a requirement, we can loosen the restrictions to allow
>>> 1GB gigantic pages in ZONE_MOVABLE.
>>>
>>> Since 1GB can be difficult to migrate / has impacts on compaction /
>>> defragmentation, we don't enable this by default. Notably, 1GB pages
>>> can only be migrated if another 1GB page is available - so hot-unplug
>>> will fail if such a page cannot be found.
>>
>> In light of the other discussion: will it fail or will it simply retry
>> forever, until there is a free 1g page?
>>
>
> It retries until a 1GB page is available.
>
> Example test:
>
> echo 0 > node0/hugepages/..-1GB/nr_hugepages (dram node)
> echo 1 > node1/hugepages/..-1GB/nr_hugepages (zone_movable node)
> ./alloc_huge & (allocate the page)
> ./node1_offline & (offline > memory*/state)
> sleep 5 (give offline time)
> echo 1 > node0/hugepages/..-1GB/nr_hugepages (dram node)
>
> This node1_offline generates migration failures until the last step
> occurs, at which point migration and node1_offline complete as expected.
>
> The migration failures produce the following:
>
> [ 707.443105] migrating pfn c080000 failed ret:-12
> [ 707.453353] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc080000
> [ 707.471315] head: order:18 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
> [ 707.488504] anon flags: 0x17ffff0000000848(uptodate|owner_2|head|node=1|zone=3|lastcpupid=0x1ffff)
> [ 707.508393] page_type: f4(hugetlb)
> [ 707.515940] raw: 17ffff0000000848 ffa000007d873cc0 ffa000007d873cc0 ff1100082366c6e9
> [ 707.533126] raw: 0000000000000000 0000000000000010 00000002f4000000 0000000000000000
> [ 707.550317] head: 17ffff0000000848 ffa000007d873cc0 ffa000007d873cc0 ff1100082366c6e9
> [ 707.567699] head: 0000000000000000 0000000000000010 00000002f4000000 0000000000000000
> [ 707.585085] head: 17ffff0000000012 ffd4000302000001 0000000000000000 0000000000000000
> [ 707.602469] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000040000
> [ 707.619851] page dumped because: migration failure
>
>
> I can add this to the changelog if you prefer
Yes, we should document that. I guess it's just what we already document
in the memory hotplug doc: it keeps retrying until there is sufficient
free memory.
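
The retry behavior discussed above can be illustrated with a toy user-space
model. Everything here is hypothetical (function names, the per-attempt array,
the bounded loop); it only mirrors what the test showed: each migration attempt
fails with -ENOMEM (the "ret:-12" in the log) until a free destination 1GB page
exists, and then offlining completes.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy stand-in for one migration attempt of a gigantic page: it succeeds
 * only if at least one free destination gigantic page is available.
 */
int model_migrate_gigantic(int free_dst_pages)
{
    return free_dst_pages > 0 ? 0 : -ENOMEM;
}

/*
 * free_dst_by_attempt[i] is the number of free destination 1GB pages seen
 * by attempt i. Returns how many attempts failed before the first success,
 * or -1 if no attempt in the array succeeded. The bound n exists only so
 * that this model terminates.
 */
int model_offline_retries(const int *free_dst_by_attempt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (model_migrate_gigantic(free_dst_by_attempt[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

In the test from the previous mail, the free destination page only appears
once nr_hugepages on node0 is raised to 1, which corresponds to the first
non-zero entry in the array.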
--
Cheers
David