* [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
@ 2025-12-18 19:08 Gregory Price
  2025-12-18 19:35 ` Johannes Weiner
  2025-12-18 19:45 ` Zi Yan
  0 siblings, 2 replies; 8+ messages in thread
From: Gregory Price @ 2025-12-18 19:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, kernel-team, akpm, vbabka, surenb, mhocko,
	jackmanb, hannes, ziy, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

We presently skip regions with hugepages entirely when trying to do
contiguous page allocation.  This will cause otherwise-movable
2MB HugeTLB pages to be considered unmovable, and will make 1GB
hugepage allocation less reliable on systems utilizing both.

Commit 4d73ba5fa710 ("mm: page_alloc: skip regions with hugetlbfs pages
when allocating 1G pages") skipped all HugePage-containing regions
because migrating them can cause significant delays in 1G allocation
(HugeTLB migrations may fail for a number of reasons).

Instead, if hugepage migration is enabled, consider regions with
hugepages smaller than the target contiguous allocation request
as valid targets for allocation.

We optimize for the existing behavior by searching for non-hugetlb
regions in a first pass, then retrying the search to include hugetlb
only on failure.  This allows the existing fast-path to remain the
default case with a slow-path fallback to increase reliability.

isolate_migratepages_block() has similar hugetlb filter logic, and
the hugetlb code does a migratability check in folio_isolate_hugetlb()
during isolation.  The code servicing the allocation and migration
already supports this exact use case (it is simply unreachable today).

To test, allocate a bunch of 2MB HugeTLB pages (in this case 48GB) and
then attempt to allocate some 1GB HugeTLB pages (in this case 4GB);
scale to your machine's memory capacity.

echo 24576 > .../hugepages-2048kB/nr_hugepages
echo 4 > .../hugepages-1048576kB/nr_hugepages

Prior to this patch, the 1GB page allocation can fail if no free
contiguous 1GB ranges remain.  After this patch, the kernel will try to
move 2MB pages and successfully allocate the 1GB pages (assuming
sufficient memory is available overall).  This was also tested while a
program had the 2MB reservations mapped; the 1GB reservation still
succeeded.

folio_alloc_gigantic() is the primary user of alloc_contig_pages();
the other users are debug or init-time allocations and are largely
unaffected:
- ppc/memtrace is a debugfs interface
- x86/tdx memory allocation occurs once on module-init
- kfence/core happens once on module (late) init
- THP uses it in debug_vm_pgtable_alloc_huge_page at __init time

Suggested-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/linux-mm/6fe3562d-49b2-4975-aa86-e139c535ad00@redhat.com/
Signed-off-by: Gregory Price <gourry@gourry.net>
---
v5: add fast-path/slow-path mechanism to retain current performance
    dropped tags as this changes the behavior of the patch
    most of the logic otherwise remains the same.

 mm/page_alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 822e05f1a964..3ddad1fca924 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7083,7 +7083,7 @@ static int __alloc_contig_pages(unsigned long start_pfn,
 }
 
 static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
-				   unsigned long nr_pages)
+				   unsigned long nr_pages, bool search_hugetlb)
 {
 	unsigned long i, end_pfn = start_pfn + nr_pages;
 	struct page *page;
@@ -7099,8 +7099,30 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
 		if (PageReserved(page))
 			return false;
 
-		if (PageHuge(page))
-			return false;
+		/*
+		 * Only consider ranges containing hugepages if those pages are
+		 * smaller than the requested contiguous region.  e.g.:
+		 *     Move 2MB pages to free up a 1GB range.
+		 *     Don't move 1GB pages to free up a 2MB range.
+		 *
+		 * This makes contiguous allocation more reliable if multiple
+		 * hugepage sizes are used without causing needless movement.
+		 */
+		if (PageHuge(page)) {
+			unsigned int order;
+
+			if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
+				return false;
+
+			if (!search_hugetlb)
+				return false;
+
+			page = compound_head(page);
+			order = compound_order(page);
+			if ((order >= MAX_FOLIO_ORDER) ||
+			    (nr_pages <= (1 << order)))
+				return false;
+		}
 	}
 	return true;
 }
@@ -7143,7 +7165,9 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 	struct zonelist *zonelist;
 	struct zone *zone;
 	struct zoneref *z;
+	bool hugetlb = false;
 
+retry:
 	zonelist = node_zonelist(nid, gfp_mask);
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					gfp_zone(gfp_mask), nodemask) {
@@ -7151,7 +7175,8 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 
 		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
 		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
-			if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
+			if (pfn_range_valid_contig(zone, pfn, nr_pages,
+						   hugetlb)) {
 				/*
 				 * We release the zone lock here because
 				 * alloc_contig_range() will also lock the zone
@@ -7170,6 +7195,17 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
+	/*
+	 * If we failed, retry the search, but treat regions with HugeTLB pages
+	 * as valid targets.  This retains fast-allocations on first pass
+	 * without trying to migrate HugeTLB pages (which may fail). On the
+	 * second pass, we will try moving HugeTLB pages when those pages are
+	 * smaller than the requested contiguous region size.
+	 */
+	if (!hugetlb && IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION)) {
+		hugetlb = true;
+		goto retry;
+	}
 	return NULL;
 }
 #endif /* CONFIG_CONTIG_ALLOC */
-- 
2.52.0




* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 19:08 [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc Gregory Price
@ 2025-12-18 19:35 ` Johannes Weiner
  2025-12-18 23:38   ` Gregory Price
  2025-12-18 19:45 ` Zi Yan
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2025-12-18 19:35 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, ziy, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On Thu, Dec 18, 2025 at 02:08:31PM -0500, Gregory Price wrote:
> @@ -7099,8 +7099,30 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>  		if (PageReserved(page))
>  			return false;
>  
> -		if (PageHuge(page))
> -			return false;
> +		/*
> +		 * Only consider ranges containing hugepages if those pages are
> +		 * smaller than the requested contiguous region.  e.g.:
> +		 *     Move 2MB pages to free up a 1GB range.
> +		 *     Don't move 1GB pages to free up a 2MB range.
> +		 *
> +		 * This makes contiguous allocation more reliable if multiple
> +		 * hugepage sizes are used without causing needless movement.
> +		 */
> +		if (PageHuge(page)) {
> +			unsigned int order;
> +
> +			if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> +				return false;
> +
> +			if (!search_hugetlb)
> +				return false;
> +
> +			page = compound_head(page);
> +			order = compound_order(page);
> +			if ((order >= MAX_FOLIO_ORDER) ||
> +			    (nr_pages <= (1 << order)))
> +				return false;

If you keep searching past it, you can step over the whole page to
speed things up a bit:

			i += (1 << order) - 1;
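
i.e., roughly where it would land (sketch only -- assuming the scan is
sitting on the head pfn of the hugepage, and that the branch now falls
through instead of returning):

			page = compound_head(page);
			order = compound_order(page);
			if ((order >= MAX_FOLIO_ORDER) ||
			    (nr_pages <= (1 << order)))
				return false;

			/* movable and small enough: step over the rest of
			 * this hugepage so the next iteration resumes just
			 * past it */
			i += (1 << order) - 1;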



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 19:08 [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc Gregory Price
  2025-12-18 19:35 ` Johannes Weiner
@ 2025-12-18 19:45 ` Zi Yan
  2025-12-18 20:42   ` Gregory Price
  2025-12-18 21:07   ` Gregory Price
  1 sibling, 2 replies; 8+ messages in thread
From: Zi Yan @ 2025-12-18 19:45 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On 18 Dec 2025, at 14:08, Gregory Price wrote:

> We presently skip regions with hugepages entirely when trying to do
> contiguous page allocation.  This will cause otherwise-movable
> 2MB HugeTLB pages to be considered unmovable, and will make 1GB
> hugepage allocation less reliable on systems utilizing both.
>
> Commit 4d73ba5fa710 ("mm: page_alloc: skip regions with hugetlbfs pages
> when allocating 1G pages") skipped all HugePage-containing regions
> because migrating them can cause significant delays in 1G allocation
> (HugeTLB migrations may fail for a number of reasons).
>
> Instead, if hugepage migration is enabled, consider regions with
> hugepages smaller than the target contiguous allocation request
> as valid targets for allocation.
>
> We optimize for the existing behavior by searching for non-hugetlb
> regions in a first pass, then retrying the search to include hugetlb
> only on failure.  This allows the existing fast-path to remain the
> default case with a slow-path fallback to increase reliability.

Why not do the hugetlb search when the non-hugetlb search fails inside
pfn_range_valid_contig(), and report the hugetlb search result through a
pointer parameter? Something like:
bool pfn_range_valid_contig(..., bool *hugetlb_search_result)
{
	bool no_hugetlb = true;

	if (hugetlb_search_result)
		*hugetlb_search_result = false;
	...
	if (PageHuge(page)) {
		no_hugetlb = false;
		...
		if (hugetlb_search_result) {
			page = compound_head(page);
			order = compound_order(page);
			if ((order >= MAX_FOLIO_ORDER) ||
			    (nr_pages <= (1 << order)))
				return false;
		}
	}
	...
	/* At this point, we have not found 1GB hugetlb */
	if (hugetlb_search_result)
		*hugetlb_search_result = true;
	return no_hugetlb;
}

That can save another scan? And caller can pass hugetlb_search_result if
they care and check its value if pfn_range_valid_contig() returns false.




Best Regards,
Yan, Zi



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 19:45 ` Zi Yan
@ 2025-12-18 20:42   ` Gregory Price
  2025-12-18 21:17     ` Zi Yan
  2025-12-18 21:07   ` Gregory Price
  1 sibling, 1 reply; 8+ messages in thread
From: Gregory Price @ 2025-12-18 20:42 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On Thu, Dec 18, 2025 at 02:45:37PM -0500, Zi Yan wrote:
> 
> That can save another scan? And caller can pass hugetlb_search_result if
> they care and check its value if pfn_range_valid_contig() returns false.
> 

Well, first, I've generally seen it discouraged to do output-parameters
like this for such trivial things.  But that aside...

We have to scan again either way if we want to prefer allocating
non-hugetlb regions in different memory blocks first.  This is what Mel
was pointing out (we should touch every OTHER block before we attempt
HugeTLB migrations).

The best optimization you could hope for is something like the following
- but honestly, this is ugly, racy (zone contents may have changed
between scans), and if you're already in the slow reliable path then we
should just be slow and re-scan the non-hugetlb sections as well.

Other than this being ugly, I don't have strong feelings.  If people
would prefer the second pass to ONLY touch hugetlb sections, I'll ship
this.

static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
                                   unsigned long nr_pages, bool search_hugetlb,
                                   bool *hugetlb_found)
{
        bool hugetlb = false;

        for (i = start_pfn; i < end_pfn; i++) {
	...
                if (PageHuge(page)) {
                        if (hugetlb_found)
                                *hugetlb_found = true;

                        if (!search_hugetlb)
                                return false;

			...
                        hugetlb = true;
                }
        }
	/* 
	 * If we're searching for hugetlb regions, only return those
	 * Otherwise only return regions without hugetlb reservations
	 */
        return !search_hugetlb || hugetlb;
}


struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
                                 int nid, nodemask_t *nodemask)
{
        bool search_hugetlb = false;
	bool hugetlb_found = false;

retry:
        zonelist = node_zonelist(nid, gfp_mask);
        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                        gfp_zone(gfp_mask), nodemask) {
                spin_lock_irqsave(&zone->lock, flags);

                pfn = ALIGN(zone->zone_start_pfn, nr_pages);
                while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
                        if (pfn_range_valid_contig(zone, pfn, nr_pages,
                                                   search_hugetlb,
                                                   &hugetlb_found)) {
						   ...
                        }
                }
                spin_unlock_irqrestore(&zone->lock, flags);
        }
        if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
            !search_hugetlb && hugetlb_found) {
                search_hugetlb = true;
                goto retry;
        }
        return NULL;
}

~Gregory



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 19:45 ` Zi Yan
  2025-12-18 20:42   ` Gregory Price
@ 2025-12-18 21:07   ` Gregory Price
  1 sibling, 0 replies; 8+ messages in thread
From: Gregory Price @ 2025-12-18 21:07 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On Thu, Dec 18, 2025 at 02:45:37PM -0500, Zi Yan wrote:
> On 18 Dec 2025, at 14:08, Gregory Price wrote:
> 
> Why not do the hugetlb search when the non-hugetlb search fails inside
> pfn_range_valid_contig(), and report the hugetlb search result through a
> pointer parameter? Something like:
> 

From a short discussion with Johannes: it's not worth trying to do any
kind of filtering, but we can at least optimize for the case where no
hugetlb regions are found at all (preventing a retry for no reason).

So I'll ship a v6 with a mild improvement.
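
Roughly, along the lines of the earlier prototype (sketch only; the real
v6 may differ): pfn_range_valid_contig() would flag whether it saw any
hugetlb page at all, and alloc_contig_pages_noprof() would only retry
when that flag was set:

	if (PageHuge(page)) {
		if (hugetlb_found)
			*hugetlb_found = true;

		if (!search_hugetlb)
			return false;
		...
	}

	/* in alloc_contig_pages_noprof(), after the zonelist walk */
	if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
	    !search_hugetlb && hugetlb_found) {
		search_hugetlb = true;
		goto retry;
	}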

~Gregory



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 20:42   ` Gregory Price
@ 2025-12-18 21:17     ` Zi Yan
  2025-12-18 21:32       ` Gregory Price
  0 siblings, 1 reply; 8+ messages in thread
From: Zi Yan @ 2025-12-18 21:17 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On 18 Dec 2025, at 15:42, Gregory Price wrote:

> On Thu, Dec 18, 2025 at 02:45:37PM -0500, Zi Yan wrote:
>>
>> That can save another scan? And caller can pass hugetlb_search_result if
>> they care and check its value if pfn_range_valid_contig() returns false.
>>
>
> Well, first, I've generally seen it discouraged to do output-parameters
> like this for such trivial things.  But that aside...
>
> We have to scan again either way if we want to prefer allocating
> non-hugetlb regions in different memory blocks first.  This is what Mel
> was pointing out (we should touch every OTHER block before we attempt
> HugeTLB migrations).

OK, you assume hugetlb is harder to migrate compared to other movable pages.
Considering the limited number of hugetlb pages, it is quite possible.
Anyway, I will wait for your v6. Thank you for the explanation and the
prototype below.

>
> The best optimization you could hope for is something like the following
> - but honestly, this is ugly, racy (zone contents may have changed
> between scans), and if you're already in the slow reliable path then we
> should just be slow and re-scan the non-hugetlb sections as well.
>
> Other than this being ugly, I don't have strong feelings.  If people
> would prefer the second pass to ONLY touch hugetlb sections, I'll ship
> this.
>
> static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>                                    unsigned long nr_pages, bool search_hugetlb,
>                                    bool *hugetlb_found)
> {
>         bool hugetlb = false;
>
>         for (i = start_pfn; i < end_pfn; i++) {
> 	...
>                 if (PageHuge(page)) {
>                         if (hugetlb_found)
>                                 *hugetlb_found = true;
>
>                         if (!search_hugetlb)
>                                 return false;
>
> 			...
>                         hugetlb = true;
>                 }
>         }
> 	/*
> 	 * If we're searching for hugetlb regions, only return those
> 	 * Otherwise only return regions without hugetlb reservations
> 	 */
>         return !search_hugetlb || hugetlb;
> }
>
>
> struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>                                  int nid, nodemask_t *nodemask)
> {
>         bool search_hugetlb = false;
> 	bool hugetlb_found = false;
>
> retry:
>         zonelist = node_zonelist(nid, gfp_mask);
>         for_each_zone_zonelist_nodemask(zone, z, zonelist,
>                                         gfp_zone(gfp_mask), nodemask) {
>                 spin_lock_irqsave(&zone->lock, flags);
>
>                 pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>                 while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
>                         if (pfn_range_valid_contig(zone, pfn, nr_pages,
>                                                    search_hugetlb,
>                                                    &hugetlb_found)) {
> 						   ...
>                         }
>                 }
>                 spin_unlock_irqrestore(&zone->lock, flags);
>         }
>         if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
>             !search_hugetlb && hugetlb_found) {
>                 search_hugetlb = true;
>                 goto retry;
>         }
>         return NULL;
> }
>
> ~Gregory


Best Regards,
Yan, Zi



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 21:17     ` Zi Yan
@ 2025-12-18 21:32       ` Gregory Price
  0 siblings, 0 replies; 8+ messages in thread
From: Gregory Price @ 2025-12-18 21:32 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, hannes, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On Thu, Dec 18, 2025 at 04:17:14PM -0500, Zi Yan wrote:
> On 18 Dec 2025, at 15:42, Gregory Price wrote:
> 
> OK, you assume hugetlb is harder to migrate compared to other movable pages.
> Considering the limited number of hugetlb pages, it is quite possible.
> Anyway, I will wait for your v6. Thank you for the explanation and the
> prototype below.
> 

They are harder - a migration starts with an allocation, and a contiguous
allocation may then cause more migrations, landing us back in similar
search/migrate code.

Basically this can recurse to some degree (though not much in practice).
For example, freeing a 1GB range may migrate 2MB pages, and allocating
their replacements may in turn trigger migration of smaller pages, but
never of pages as large as the original 1GB request.

We at least guarantee termination because we never attempt to move pages
of the same size or larger than the current allocation.

~Gregory



* Re: [PATCH v5] page_alloc: allow migration of smaller hugepages during contig_alloc
  2025-12-18 19:35 ` Johannes Weiner
@ 2025-12-18 23:38   ` Gregory Price
  0 siblings, 0 replies; 8+ messages in thread
From: Gregory Price @ 2025-12-18 23:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, kernel-team, akpm, vbabka, surenb,
	mhocko, jackmanb, ziy, richard.weiyang, osalvador, rientjes,
	david, joshua.hahnjy, fvdl

On Thu, Dec 18, 2025 at 02:35:19PM -0500, Johannes Weiner wrote:
> On Thu, Dec 18, 2025 at 02:08:31PM -0500, Gregory Price wrote:
> > +			page = compound_head(page);
> > +			order = compound_order(page);
> > +			if ((order >= MAX_FOLIO_ORDER) ||
> > +			    (nr_pages <= (1 << order)))
> > +				return false;
> 
> If you keep searching past it, you can step over the whole page to
> speed things up a bit:
> 
> 			i += (1 << order) - 1;

ack, good catch, added to v6

~Gregory

