linux-mm.kvack.org archive mirror
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <muchun.song@linux.dev>, <sidhartha.kumar@oracle.com>,
	<jane.chu@oracle.com>, Vlastimil Babka <vbabka@suse.cz>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH v2 6/9] mm: page_alloc: add alloc_contig_frozen_pages()
Date: Tue, 9 Sep 2025 15:29:21 +0800	[thread overview]
Message-ID: <d23b0e37-2f67-46dc-badc-fc6807512c2e@huawei.com> (raw)
In-Reply-To: <298B40E0-8EA5-4667-86EF-22F88B832839@nvidia.com>



On 2025/9/9 9:44, Zi Yan wrote:
> On 2 Sep 2025, at 8:48, Kefeng Wang wrote:
> 
>> Introduce an ACR_FLAGS_FROZEN flag to indicate that we want to
>> allocate frozen compound pages via alloc_contig_range(); also
>> provide alloc_contig_frozen_pages() to allocate pages without
>> incrementing their refcount, which may be beneficial to some
>> users (e.g. hugetlb).
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   include/linux/gfp.h |  6 ++++
>>   mm/page_alloc.c     | 85 +++++++++++++++++++++++++--------------------
>>   2 files changed, 54 insertions(+), 37 deletions(-)
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index 5ebf26fcdcfa..d0047b85fe34 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -427,6 +427,7 @@ extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
>>   typedef unsigned int __bitwise acr_flags_t;
>>   #define ACR_FLAGS_NONE ((__force acr_flags_t)0) // ordinary allocation request
>>   #define ACR_FLAGS_CMA ((__force acr_flags_t)BIT(0)) // allocate for CMA
>> +#define ACR_FLAGS_FROZEN ((__force acr_flags_t)BIT(1)) // allocate for frozen compound pages
> 
> ACR_FLAGS_FROZEN_COMP might be a better name, based on your comment.
> But maybe in the future we might want to convert the non-compound
> path to use frozen pages too. In that case, it might be better to
> remove "compound" from the comment and then

Will drop compound.

> 
>>
>>   /* The below functions must be run on a range from a single zone. */
>>   extern int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>> @@ -437,6 +438,11 @@ extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_
>>   					      int nid, nodemask_t *nodemask);
>>   #define alloc_contig_pages(...)			alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
>>
>> +struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
>> +		gfp_t gfp_mask, int nid, nodemask_t *nodemask);
>> +#define alloc_contig_frozen_pages(...) \
>> +	alloc_hooks(alloc_contig_frozen_pages_noprof(__VA_ARGS__))
>> +
>>   #endif
>>   void free_contig_range(unsigned long pfn, unsigned long nr_pages);
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index baead29b3e67..0677c49fdff1 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6854,6 +6854,9 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>   	if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
>>   		return -EINVAL;
>>
>> +	if ((alloc_flags & ACR_FLAGS_FROZEN) && !(gfp_mask & __GFP_COMP))
>> +		return -EINVAL;
>> +
> 
> ... add a comment above this to say only frozen compound pages are supported.

Sure.

> 
>>   	/*
>>   	 * What we do here is we mark all pageblocks in range as
>>   	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>> @@ -6951,7 +6954,8 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>
>>   		check_new_pages(head, order);
>>   		prep_new_page(head, order, gfp_mask, 0);
>> -		set_page_refcounted(head);
>> +		if (!(alloc_flags & ACR_FLAGS_FROZEN))
>> +			set_page_refcounted(head);
>>   	} else {
>>   		ret = -EINVAL;
>>   		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
>> @@ -6963,15 +6967,6 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>   }
>>   EXPORT_SYMBOL(alloc_contig_range_noprof);
>>
>> -static int __alloc_contig_pages(unsigned long start_pfn,
>> -				unsigned long nr_pages, gfp_t gfp_mask)
>> -{
>> -	unsigned long end_pfn = start_pfn + nr_pages;
>> -
>> -	return alloc_contig_range_noprof(start_pfn, end_pfn, ACR_FLAGS_NONE,
>> -					 gfp_mask);
>> -}
>> -
>>   static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>>   				   unsigned long nr_pages)
>>   {
>> @@ -7003,31 +6998,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
>>   	return zone_spans_pfn(zone, last_pfn);
>>   }
>>
>> -/**
>> - * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
>> - * @nr_pages:	Number of contiguous pages to allocate
>> - * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
>> - *		action and reclaim modifiers are supported. Reclaim modifiers
>> - *		control allocation behavior during compaction/migration/reclaim.
>> - * @nid:	Target node
>> - * @nodemask:	Mask for other possible nodes
>> - *
>> - * This routine is a wrapper around alloc_contig_range(). It scans over zones
>> - * on an applicable zonelist to find a contiguous pfn range which can then be
>> - * tried for allocation with alloc_contig_range(). This routine is intended
>> - * for allocation requests which can not be fulfilled with the buddy allocator.
>> - *
>> - * The allocated memory is always aligned to a page boundary. If nr_pages is a
>> - * power of two, then allocated range is also guaranteed to be aligned to same
>> - * nr_pages (e.g. 1GB request would be aligned to 1GB).
>> - *
>> - * Allocated pages can be freed with free_contig_range() or by manually calling
>> - * __free_page() on each allocated page.
>> - *
>> - * Return: pointer to contiguous pages on success, or NULL if not successful.
>> - */
>> -struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>> -				 int nid, nodemask_t *nodemask)
>> +static struct page *__alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
>> +		acr_flags_t alloc_flags, int nid, nodemask_t *nodemask)
>>   {
>>   	unsigned long ret, pfn, flags;
>>   	struct zonelist *zonelist;
>> @@ -7050,8 +7022,8 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>>   				 * and cause alloc_contig_range() to fail...
>>   				 */
>>   				spin_unlock_irqrestore(&zone->lock, flags);
>> -				ret = __alloc_contig_pages(pfn, nr_pages,
>> -							gfp_mask);
>> +				ret = alloc_contig_range_noprof(pfn, pfn + nr_pages,
>> +							alloc_flags, gfp_mask);
>>   				if (!ret)
>>   					return pfn_to_page(pfn);
>>   				spin_lock_irqsave(&zone->lock, flags);
>> @@ -7062,6 +7034,45 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>>   	}
>>   	return NULL;
>>   }
>> +
>> +/**
>> + * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
>> + * @nr_pages:	Number of contiguous pages to allocate
>> + * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
>> + *		action and reclaim modifiers are supported. Reclaim modifiers
>> + *		control allocation behavior during compaction/migration/reclaim.
>> + * @nid:	Target node
>> + * @nodemask:	Mask for other possible nodes
>> + *
>> + * This routine is a wrapper around alloc_contig_range(). It scans over zones
>> + * on an applicable zonelist to find a contiguous pfn range which can then be
>> + * tried for allocation with alloc_contig_range(). This routine is intended
>> + * for allocation requests which can not be fulfilled with the buddy allocator.
>> + *
>> + * The allocated memory is always aligned to a page boundary. If nr_pages is a
>> + * power of two, then allocated range is also guaranteed to be aligned to same
>> + * nr_pages (e.g. 1GB request would be aligned to 1GB).
>> + *
>> + * Allocated pages can be freed with free_contig_range() or by manually calling
>> + * __free_page() on each allocated page.
>> + *
>> + * Return: pointer to contiguous pages on success, or NULL if not successful.
>> + */
>> +struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
>> +		int nid, nodemask_t *nodemask)
>> +{
>> +	return __alloc_contig_pages(nr_pages, gfp_mask, ACR_FLAGS_NONE,
>> +				    nid, nodemask);
>> +}
>> +
>> +struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
>> +		gfp_t gfp_mask, int nid, nodemask_t *nodemask)
>> +{
>> +	/* always allocate compound pages without incrementing the refcount */
>> +	return __alloc_contig_pages(nr_pages, gfp_mask | __GFP_COMP,
>> +				    ACR_FLAGS_FROZEN, nid, nodemask);
>> +}
>> +
> 
> When all contig page allocations are converted to use frozen pages,
> alloc_contig_pages_noprof() can do similar things as __alloc_pages_noprof():
> 
> struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
> 		int nid, nodemask_t *nodemask)
> {
> 	struct page *page = __alloc_frozen_contig_pages_noprof(...);
> 	if (gfp_mask & __GFP_COMP)
> 		set_page_refcounted(page);
> 	else {
> 		unsigned long i;
> 
> 		for (i = 0; i < nr_pages; i++)
> 			set_page_refcounted(page + i);
> 	}
> 	return page;
> }
> 
> And ACR_FLAGS_FROZEN will no longer be needed.
> 
> I looked at the code briefly, it seems that we need:
> 
> 1. remove set_page_refcounted() in split_free_pages();
> 2. a new split_frozen_page() to do split_page() work without
>     set_page_refcounted() and split_page() can just call it;
> 3. a new free_frozen_contig_range() and free_contig_range()
>     calls it.
> 

Yes, I considered doing it this way before, but there are no users for
the !compound case yet, so let's keep it simple.

> Anyway, with the comment change mentioned above,
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks.

> 
>>   #endif /* CONFIG_CONTIG_ALLOC */
>>
>>   void free_contig_range(unsigned long pfn, unsigned long nr_pages)
>> -- 
>> 2.27.0
> 
> 
> Best Regards,
> Yan, Zi




Thread overview: 42+ messages
2025-09-02 12:48 [PATCH v2 0/9] mm: hugetlb: cleanup and allocate frozen hugetlb folio Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 1/9] mm: hugetlb: convert to use more alloc_fresh_hugetlb_folio() Kefeng Wang
2025-09-08  9:21   ` Oscar Salvador
2025-09-08 12:59     ` Kefeng Wang
2025-09-09  0:54   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 2/9] mm: hugetlb: convert to account_new_hugetlb_folio() Kefeng Wang
2025-09-08  9:26   ` Oscar Salvador
2025-09-08 13:20     ` Kefeng Wang
2025-09-08 13:38       ` Oscar Salvador
2025-09-08 13:40   ` Oscar Salvador
2025-09-09  7:04     ` Kefeng Wang
2025-09-09  0:59   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 3/9] mm: hugetlb: directly pass order when allocate a hugetlb folio Kefeng Wang
2025-09-08  9:29   ` Oscar Salvador
2025-09-09  1:11   ` Zi Yan
2025-09-09  7:11     ` Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 4/9] mm: hugetlb: remove struct hstate from init_new_hugetlb_folio() Kefeng Wang
2025-09-08  9:31   ` Oscar Salvador
2025-09-09  1:13   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 5/9] mm: hugeltb: check NUMA_NO_NODE in only_alloc_fresh_hugetlb_folio() Kefeng Wang
2025-09-08  9:34   ` Oscar Salvador
2025-09-09  1:16   ` Zi Yan
2025-09-02 12:48 ` [PATCH v2 6/9] mm: page_alloc: add alloc_contig_frozen_pages() Kefeng Wang
2025-09-09  0:21   ` jane.chu
2025-09-09  1:44   ` Zi Yan
2025-09-09  7:29     ` Kefeng Wang [this message]
2025-09-09  8:11   ` Oscar Salvador
2025-09-09 18:55   ` Matthew Wilcox
2025-09-09 19:08     ` Zi Yan
2025-09-10  2:05       ` Kefeng Wang
2025-09-02 12:48 ` [PATCH v2 7/9] mm: cma: add alloc flags for __cma_alloc() Kefeng Wang
2025-09-09  0:19   ` jane.chu
2025-09-09  2:03   ` Zi Yan
2025-09-09  8:05   ` Oscar Salvador
2025-09-02 12:48 ` [PATCH v2 8/9] mm: cma: add __cma_release() Kefeng Wang
2025-09-09  0:15   ` jane.chu
2025-09-02 12:48 ` [PATCH v2 9/9] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio() Kefeng Wang
2025-09-09  1:48   ` jane.chu
2025-09-09  7:33     ` Kefeng Wang
2025-09-09  2:02   ` Zi Yan
2025-09-09  7:34     ` Kefeng Wang
2025-09-02 13:51 ` [PATCH v2 0/9] mm: hugetlb: cleanup and allocate frozen hugetlb folio Oscar Salvador
