From: David Hildenbrand <david@redhat.com>
To: Gregory Price <gourry@gourry.net>, linux-mm@kvack.org
Cc: corbet@lwn.net, muchun.song@linux.dev, osalvador@suse.de,
akpm@linux-foundation.org, hannes@cmpxchg.org,
laoar.shao@gmail.com, brauner@kernel.org, mclapinski@google.com,
joel.granados@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
Michal Hocko <mhocko@suse.com>,
Alexandru Moise <00moses.alexander00@gmail.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH] Revert "mm, hugetlb: remove hugepages_treat_as_movable sysctl"
Date: Wed, 8 Oct 2025 10:58:23 +0200
Message-ID: <402170e6-c49f-4d28-a010-eb253fc2f923@redhat.com>
In-Reply-To: <20251007214412.3832340-1-gourry@gourry.net>
On 07.10.25 23:44, Gregory Price wrote:
> This reverts commit d6cb41cc44c63492702281b1d329955ca767d399.
>
> This sysctl provides some flexibility between multiple requirements which
> are difficult to square without adding significantly more complexity.
>
> 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> 2) onlining memory in ZONE_MOVABLE to prevent GFP_KERNEL usage
> 3) passing NUMA structure through to a virtual machine (node0=vnode0,
> node1=vnode1) so a guest can make good placement decisions.
> 4) utilizing 1GB hugepages for VM host memory to reduce TLB pressure
> 5) Managing device memory after init-time to avoid incidental usage
> at boot (due to being placed in ZONE_NORMAL), or to provide users
> configuration flexibility.
>
> When device-hotplugged memory does not require hot-unplug assurances,
> there is no reason to avoid allowing otherwise non-migratable hugepages
> in this zone. This allows for allocation of 1GB gigantic pages for VMs
> with existing mechanisms.
>
> Boot-time CMA is not possible for driver-managed hotplug memory, as CMA
> requires the memory to be registered as SystemRAM at boot time.
>
> Updated the code to land in appropriate locations since it all moved.
> Updated the documentation to add more context when this is useful.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Alexandru Moise <00moses.alexander00@gmail.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Suggested-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
> ---
> Documentation/admin-guide/sysctl/vm.rst | 31 +++++++++++++++++++++++++
> include/linux/hugetlb.h | 4 +++-
> mm/hugetlb.c | 9 +++++++
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 4d71211fdad8..c9f26cd447d7 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -40,6 +40,7 @@ Currently, these files are in /proc/sys/vm:
> - enable_soft_offline
> - extfrag_threshold
> - highmem_is_dirtyable
> +- hugepages_treat_as_movable
> - hugetlb_shm_group
> - laptop_mode
> - legacy_va_layout
> @@ -356,6 +357,36 @@ only use the low memory and they can fill it up with dirty data without
> any throttling.
>
>
> +hugepages_treat_as_movable
> +==========================
> +
> +This parameter controls whether otherwise immovable hugepages (e.g. 1GB
> +gigantic pages) may be allocated from ZONE_MOVABLE. If set to non-zero,
> +gigantic hugepages can be allocated from ZONE_MOVABLE. ZONE_MOVABLE memory
> +may be created via the kernel boot parameter `kernelcore` or via memory
> +hotplug as discussed in Documentation/admin-guide/mm/memory-hotplug.rst.
> +
> +Support may depend on specific architecture and/or the hugepage size. If
> +a hugepage supports migration, allocation from ZONE_MOVABLE is always
> +enabled (for example 2MB on x86) for the hugepage regardless of the value
> +of this parameter. IOW, this parameter affects only non-migratable hugepages.
> +
> +Assuming that hugepages are not migratable in your system, one use case of
> +this parameter is that users can make the hugepage pool more extensible by
> +enabling allocation from ZONE_MOVABLE. This is because page reclaim,
> +migration and compaction are more effective on ZONE_MOVABLE, so contiguous
> +memory is more likely to be available. Note that using ZONE_MOVABLE for
> +non-migratable hugepages can harm other features like memory hotremove
> +(because memory hotremove expects that memory blocks on ZONE_MOVABLE are
> +always removable), so it is a trade-off the user is responsible for.
> +
> +One common use case of this feature is to allocate 1GB gigantic pages for
> +virtual machines from otherwise not-hotplugged memory which has been
> +isolated from kernel allocations by being onlined into ZONE_MOVABLE.
> +These pages tend to be allocated and released more explicitly, so
> +hotplug can still be achieved with appropriate orchestration.
> +
> +
> hugetlb_shm_group
> =================
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 526d27e88b3b..bbaa1b4908b6 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -172,6 +172,7 @@ bool hugetlbfs_pagecache_present(struct hstate *h,
>
>  struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio);
>  
> +extern int hugepages_treat_as_movable;
>  extern int sysctl_hugetlb_shm_group;
>  extern struct list_head huge_boot_pages[MAX_NUMNODES];
>  
> @@ -926,7 +927,8 @@ static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
>  	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
>  
> -	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
> +	gfp |= (hugepage_movable_supported(h) || hugepages_treat_as_movable) ?
> +		GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
I mean, this is as ugly as it gets.
Can't we just let that old approach RIP where it belongs? :)
If something is unmovable, it does not belong on ZONE_MOVABLE, as simple as that.
Something I could sympathize with is treating gigantic pages that are actually
migratable as movable.
Like:
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 526d27e88b3b2..78da85b1308dd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -896,37 +896,12 @@ static inline bool hugepage_migration_supported(struct hstate *h)
 	return arch_hugetlb_migration_supported(h);
 }
 
-/*
- * Movability check is different as compared to migration check.
- * It determines whether or not a huge page should be placed on
- * movable zone or not. Movability of any huge page should be
- * required only if huge page size is supported for migration.
- * There won't be any reason for the huge page to be movable if
- * it is not migratable to start with. Also the size of the huge
- * page should be large enough to be placed under a movable zone
- * and still feasible enough to be migratable. Just the presence
- * in movable zone does not make the migration feasible.
- *
- * So even though large huge page sizes like the gigantic ones
- * are migratable they should not be movable because its not
- * feasible to migrate them from movable zone.
- */
-static inline bool hugepage_movable_supported(struct hstate *h)
-{
-	if (!hugepage_migration_supported(h))
-		return false;
-
-	if (hstate_is_gigantic(h))
-		return false;
-	return true;
-}
-
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
 	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
 
-	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
+	gfp |= hugepage_migration_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
 
 	return gfp;
 }
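A minimal userspace model of the zone decision may make the difference between the revert and the alternative above clearer. This is only a sketch under stated assumptions, not kernel code: `struct hstate_model`, its two fields and the `ALLOC_*` values are hypothetical stand-ins for `struct hstate`, hugepage_migration_supported(), hstate_is_gigantic() and the GFP_HIGHUSER/GFP_HIGHUSER_MOVABLE masks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE. */
enum alloc_zone { ALLOC_HIGHUSER, ALLOC_HIGHUSER_MOVABLE };

/* Hypothetical model of the two struct hstate properties that matter here. */
struct hstate_model {
	bool migration_supported;	/* models hugepage_migration_supported() */
	bool gigantic;			/* models hstate_is_gigantic() */
};

/* Current rule: movable only if migratable and not gigantic. */
static bool movable_supported_current(const struct hstate_model *h)
{
	return h->migration_supported && !h->gigantic;
}

/* Reverted sysctl: a global switch that overrides the check entirely. */
static enum alloc_zone mask_with_sysctl(const struct hstate_model *h,
					bool treat_as_movable)
{
	return (movable_supported_current(h) || treat_as_movable) ?
		ALLOC_HIGHUSER_MOVABLE : ALLOC_HIGHUSER;
}

/* Alternative: movability follows migration support alone. */
static enum alloc_zone mask_by_migration(const struct hstate_model *h)
{
	return h->migration_supported ? ALLOC_HIGHUSER_MOVABLE : ALLOC_HIGHUSER;
}
```

With the sysctl set, a non-migratable gigantic hstate still gets steered to ZONE_MOVABLE; with the alternative it never does, while a migratable gigantic hstate becomes eligible.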
Assume you want to offline part of ZONE_MOVABLE: there might still be sufficient
space elsewhere to allocate a 1 GiB area and actually move the gigantic page.
IIRC, we already do the same for memory offlining.
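The offlining argument boils down to: a gigantic page inside the range being offlined can be moved if a free range of the same size, aligned to that size, exists outside it. A rough sketch of that check over a per-page free bitmap follows; the bitmap representation, `find_target()` and its parameters are hypothetical simplifications, and the real code (which also has to isolate and migrate the page) is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: free[i] says whether page frame i is free.
 * Returns the first start, aligned to nr_pages, of nr_pages free frames
 * that does not overlap [off_start, off_end), or -1 if there is none.
 */
static long find_target(const bool *free, size_t total, size_t nr_pages,
			size_t off_start, size_t off_end)
{
	assert(nr_pages > 0);
	for (size_t start = 0; start + nr_pages <= total; start += nr_pages) {
		size_t end = start + nr_pages;
		size_t i;

		/* Skip candidates that overlap the range being offlined. */
		if (start < off_end && off_start < end)
			continue;
		/* Scan until the first busy frame or the end of the candidate. */
		for (i = start; i < end && free[i]; i++)
			;
		if (i == end)
			return (long)start;
	}
	return -1;
}
```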
Now, maybe we want to make this configurable. But then, I would much rather tweak the
hstate_is_gigantic() check in hugepage_movable_supported(). And the parameter
would need a much better name than some "treat as movable".
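If this were made configurable, the knob would relax only the gigantic-size restriction inside hugepage_movable_supported() and never the migration requirement. A sketch under that assumption, where `movable_gigantic_allowed` is a hypothetical placeholder for whatever better-named parameter would be chosen and the model types are likewise hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder for a better-named tunable; not an existing sysctl. */
static bool movable_gigantic_allowed;

/* Hypothetical model of the relevant struct hstate properties. */
struct hstate_model {
	bool migration_supported;	/* models hugepage_migration_supported() */
	bool gigantic;			/* models hstate_is_gigantic() */
};

static bool hugepage_movable_supported_model(const struct hstate_model *h)
{
	/* Unmovable pages never belong on ZONE_MOVABLE, knob or not. */
	if (!h->migration_supported)
		return false;
	/* Only the gigantic-size restriction is made configurable. */
	if (h->gigantic && !movable_gigantic_allowed)
		return false;
	return true;
}
```

The design point is that no setting of the knob can place a non-migratable page on ZONE_MOVABLE.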
--
Cheers
David / dhildenb