linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <muchun.song@linux.dev>,
	sidhartha.kumar@oracle.com, jane.chu@oracle.com,
	Vlastimil Babka <vbabka@suse.cz>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 6/9] mm: page_alloc: add alloc_contig_frozen_pages()
Date: Mon, 08 Sep 2025 21:44:31 -0400	[thread overview]
Message-ID: <298B40E0-8EA5-4667-86EF-22F88B832839@nvidia.com> (raw)
In-Reply-To: <20250902124820.3081488-7-wangkefeng.wang@huawei.com>

On 2 Sep 2025, at 8:48, Kefeng Wang wrote:

> Introduce an ACR_FLAGS_FROZEN flag to indicate that we want to
> allocate frozen compound pages by alloc_contig_range(), also
> provide alloc_contig_frozen_pages() to allocate pages without
> incrementing their refcount, which may be beneficial to some
> users (eg hugetlb).
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/gfp.h |  6 ++++
>  mm/page_alloc.c     | 85 +++++++++++++++++++++++++--------------------
>  2 files changed, 54 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 5ebf26fcdcfa..d0047b85fe34 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -427,6 +427,7 @@ extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
>  typedef unsigned int __bitwise acr_flags_t;
>  #define ACR_FLAGS_NONE ((__force acr_flags_t)0) // ordinary allocation request
>  #define ACR_FLAGS_CMA ((__force acr_flags_t)BIT(0)) // allocate for CMA
> +#define ACR_FLAGS_FROZEN ((__force acr_flags_t)BIT(1)) // allocate for frozen compound pages

ACR_FLAGS_FROZEN_COMP might be better based on your comment.
But maybe in the future, we might want to convert the non-compound
parts to use frozen pages too. In that case, it might be better to
remove “compound” from the comment and then

>
>  /* The below functions must be run on a range from a single zone. */
>  extern int alloc_contig_range_noprof(unsigned long start, unsigned long end,
> @@ -437,6 +438,11 @@ extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_
>  					      int nid, nodemask_t *nodemask);
>  #define alloc_contig_pages(...)			alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
>
> +struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
> +		gfp_t gfp_mask, int nid, nodemask_t *nodemask);
> +#define alloc_contig_frozen_pages(...) \
> +	alloc_hooks(alloc_contig_frozen_pages_noprof(__VA_ARGS__))
> +
>  #endif
>  void free_contig_range(unsigned long pfn, unsigned long nr_pages);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index baead29b3e67..0677c49fdff1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6854,6 +6854,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>  	if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
>  		return -EINVAL;
>
> +	if ((alloc_flags & ACR_FLAGS_FROZEN) && !(gfp_mask & __GFP_COMP))
> +		return -EINVAL;
> +

... add a comment above this to say only frozen compound pages are supported.

>  	/*
>  	 * What we do here is we mark all pageblocks in range as
>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
> @@ -6951,7 +6954,8 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>
>  		check_new_pages(head, order);
>  		prep_new_page(head, order, gfp_mask, 0);
> -		set_page_refcounted(head);
> +		if (!(alloc_flags & ACR_FLAGS_FROZEN))
> +			set_page_refcounted(head);
>  	} else {
>  		ret = -EINVAL;
>  		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
> @@ -6963,15 +6967,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>  }
>  EXPORT_SYMBOL(alloc_contig_range_noprof);
>
> -static int __alloc_contig_pages(unsigned long start_pfn,
> -				unsigned long nr_pages, gfp_t gfp_mask)
> -{
> -	unsigned long end_pfn = start_pfn + nr_pages;
> -
> -	return alloc_contig_range_noprof(start_pfn, end_pfn, ACR_FLAGS_NONE,
> -					 gfp_mask);
> -}
> -
>  static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>  				   unsigned long nr_pages)
>  {
> @@ -7003,31 +6998,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
>  	return zone_spans_pfn(zone, last_pfn);
>  }
>
> -/**
> - * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
> - * @nr_pages:	Number of contiguous pages to allocate
> - * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
> - *		action and reclaim modifiers are supported. Reclaim modifiers
> - *		control allocation behavior during compaction/migration/reclaim.
> - * @nid:	Target node
> - * @nodemask:	Mask for other possible nodes
> - *
> - * This routine is a wrapper around alloc_contig_range(). It scans over zones
> - * on an applicable zonelist to find a contiguous pfn range which can then be
> - * tried for allocation with alloc_contig_range(). This routine is intended
> - * for allocation requests which can not be fulfilled with the buddy allocator.
> - *
> - * The allocated memory is always aligned to a page boundary. If nr_pages is a
> - * power of two, then allocated range is also guaranteed to be aligned to same
> - * nr_pages (e.g. 1GB request would be aligned to 1GB).
> - *
> - * Allocated pages can be freed with free_contig_range() or by manually calling
> - * __free_page() on each allocated page.
> - *
> - * Return: pointer to contiguous pages on success, or NULL if not successful.
> - */
> -struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
> -				 int nid, nodemask_t *nodemask)
> +static struct page *__alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
> +		acr_flags_t alloc_flags, int nid, nodemask_t *nodemask)
>  {
>  	unsigned long ret, pfn, flags;
>  	struct zonelist *zonelist;
> @@ -7050,8 +7022,8 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>  				 * and cause alloc_contig_range() to fail...
>  				 */
>  				spin_unlock_irqrestore(&zone->lock, flags);
> -				ret = __alloc_contig_pages(pfn, nr_pages,
> -							gfp_mask);
> +				ret = alloc_contig_range_noprof(pfn, pfn + nr_pages,
> +							alloc_flags, gfp_mask);
>  				if (!ret)
>  					return pfn_to_page(pfn);
>  				spin_lock_irqsave(&zone->lock, flags);
> @@ -7062,6 +7034,45 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>  	}
>  	return NULL;
>  }
> +
> +/**
> + * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
> + * @nr_pages:	Number of contiguous pages to allocate
> + * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
> + *		action and reclaim modifiers are supported. Reclaim modifiers
> + *		control allocation behavior during compaction/migration/reclaim.
> + * @nid:	Target node
> + * @nodemask:	Mask for other possible nodes
> + *
> + * This routine is a wrapper around alloc_contig_range(). It scans over zones
> + * on an applicable zonelist to find a contiguous pfn range which can then be
> + * tried for allocation with alloc_contig_range(). This routine is intended
> + * for allocation requests which can not be fulfilled with the buddy allocator.
> + *
> + * The allocated memory is always aligned to a page boundary. If nr_pages is a
> + * power of two, then allocated range is also guaranteed to be aligned to same
> + * nr_pages (e.g. 1GB request would be aligned to 1GB).
> + *
> + * Allocated pages can be freed with free_contig_range() or by manually calling
> + * __free_page() on each allocated page.
> + *
> + * Return: pointer to contiguous pages on success, or NULL if not successful.
> + */
> +struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
> +		int nid, nodemask_t *nodemask)
> +{
> +	return __alloc_contig_pages(nr_pages, gfp_mask, ACR_FLAGS_NONE,
> +				    nid, nodemask);
> +}
> +
> +struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
> +		gfp_t gfp_mask, int nid, nodemask_t *nodemask)
> +{
> +	/* always allocate compound pages without refcount increased */
> +	return __alloc_contig_pages(nr_pages, gfp_mask | __GFP_COMP,
> +				    ACR_FLAGS_FROZEN, nid, nodemask);
> +}
> +

When all contig page allocations are converted to use frozen pages,
alloc_contig_pages_noprof() can do similar things as __alloc_pages_noprof():

struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
		int nid, nodemask_t *nodemask)
{
	struct page *page = __alloc_frozen_contig_pages_noprof(...);
	if (gfp_mask & __GFP_COMP)
		set_page_refcounted(page);
	else {
		unsigned long i;

		for (i = 0; i < nr_pages; i++)
			set_page_refcounted(page + i);
	}
	return page;
}

And ACR_FLAGS_FROZEN will be no longer needed.

I looked at the code briefly, it seems that we need:

1. remove set_page_refcounted() in split_free_pages();
2. a new split_frozen_page() to do split_page() work without
   set_page_refcounted() and split_page() can just call it;
3. a new free_frozen_contig_range() and free_contig_range()
   calls it.

Anyway, with the comment change mentioned above,
Reviewed-by: Zi Yan <ziy@nvidia.com>

>  #endif /* CONFIG_CONTIG_ALLOC */
>
>  void free_contig_range(unsigned long pfn, unsigned long nr_pages)
> -- 
> 2.27.0


Best Regards,
Yan, Zi


  parent reply	other threads:[~2025-09-09  1:44 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-02 12:48 [PATCH v2 0/9] mm: hugetlb: cleanup and allocate frozen hugetlb folio Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 1/9] mm: hugetlb: convert to use more alloc_fresh_hugetlb_folio() Kefeng Wang
2025-09-08  9:21   ` Oscar Salvador
2025-09-08 12:59     ` Kefeng Wang
2025-09-09  0:54   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 2/9] mm: hugetlb: convert to account_new_hugetlb_folio() Kefeng Wang
2025-09-08  9:26   ` Oscar Salvador
2025-09-08 13:20     ` Kefeng Wang
2025-09-08 13:38       ` Oscar Salvador
2025-09-08 13:40   ` Oscar Salvador
2025-09-09  7:04     ` Kefeng Wang
2025-09-09  0:59   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 3/9] mm: hugetlb: directly pass order when allocate a hugetlb folio Kefeng Wang
2025-09-08  9:29   ` Oscar Salvador
2025-09-09  1:11   ` Zi Yan
2025-09-09  7:11     ` Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 4/9] mm: hugetlb: remove struct hstate from init_new_hugetlb_folio() Kefeng Wang
2025-09-08  9:31   ` Oscar Salvador
2025-09-09  1:13   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 5/9] mm: hugeltb: check NUMA_NO_NODE in only_alloc_fresh_hugetlb_folio() Kefeng Wang
2025-09-08  9:34   ` Oscar Salvador
2025-09-09  1:16   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 6/9] mm: page_alloc: add alloc_contig_frozen_pages() Kefeng Wang
2025-09-09  0:21   ` jane.chu
2025-09-09  1:44   ` Zi Yan [this message]
2025-09-09  7:29     ` Kefeng Wang
2025-09-09  8:11   ` Oscar Salvador
2025-09-09 18:55   ` Matthew Wilcox
2025-09-09 19:08     ` Zi Yan
2025-09-10  2:05       ` Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 7/9] mm: cma: add alloc flags for __cma_alloc() Kefeng Wang
2025-09-09  0:19   ` jane.chu
2025-09-09  2:03   ` Zi Yan
2025-09-09  8:05   ` Oscar Salvador
2025-09-02 12:48 ` [PATCH v2 8/9] mm: cma: add __cma_release() Kefeng Wang
2025-09-09  0:15   ` jane.chu
2025-09-02 12:48 ` [PATCH v2 9/9] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio() Kefeng Wang
2025-09-09  1:48   ` jane.chu
2025-09-09  7:33     ` Kefeng Wang
2025-09-09  2:02   ` Zi Yan
2025-09-09  7:34     ` Kefeng Wang
2025-09-02 13:51 ` [PATCH v2 0/9] mm: hugetlb: cleanup and allocate frozen hugetlb folio Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=298B40E0-8EA5-4667-86EF-22F88B832839@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=jane.chu@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    --cc=sidhartha.kumar@oracle.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox