linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	yuzhao@google.com, david@redhat.com, linux-mm@kvack.org,
	wangkefeng.wang@huawei.com, sunnanyong@huawei.com
Subject: Re: [PATCH] mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
Date: Fri, 18 Apr 2025 21:32:05 -0400	[thread overview]
Message-ID: <AC977477-BECA-4887-8ECF-9713030515C8@nvidia.com> (raw)
In-Reply-To: <a443a135-fedb-2a99-22ef-b4c3d9610542@huawei.com>

On 18 Apr 2025, at 20:54, Jinjiang Tu wrote:

> 在 2025/4/19 5:32, Zi Yan 写道:
>> Hi Jinjiang,
>>
>> On 17 Apr 2025, at 22:59, Andrew Morton wrote:
>>
>>> On Wed, 12 Mar 2025 16:47:05 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
>>>
>>>> When calling alloc_contig_range() with __GFP_COMP and the order of
>>>> requested pfn range is pageblock_order, less than MAX_ORDER, I triggered
>>>> WARNING as follows:
>>>>
>>>>   PFN range: requested [2150105088, 2150105600), allocated [2150105088, 2150106112)
>>>>   WARNING: CPU: 3 PID: 580 at mm/page_alloc.c:6877 alloc_contig_range+0x280/0x340
>> Basically, you are using alloc_contig_range() to allocate a compound page
>> that can be allocated from buddy allocator, since order is < MAX_ORDER.
>> What is the use case? Why is alloc_contig_range() used?
>
> In CMA case, alloc_contig_range() is used to allocate from requested pfn range, and the order may
> be < MAX_ORDER.

But why do you need __GFP_COMP? I thought __GFP_COMP was only used for
1GB hugetlb.

>
>>
>>>> alloc_contig_range() marks pageblocks of the requested pfn range to be
>>>> isolated, migrate these pages if they are in use and will be freed to
>>>> MIGRATE_ISOLATED freelist.
>>>>
>>>> Suppose two alloc_contig_range() calls at the same time and the requested
>>>> pfn range are [0x80280000, 0x80280200) and [0x80280200, 0x80280400)
>>>> respectively. Suppose the two memory range are in use, then
>>>> alloc_contig_range() will migrate and free these pages to MIGRATE_ISOLATED
>>>> freelist. __free_one_page() will merge MIGRATE_ISOLATE buddy to larger
>>>> buddy, resulting in a MAX_ORDER buddy. Finally, find_large_buddy() in
>>>> alloc_contig_range() returns a MAX_ORDER buddy and results in WARNING.
>>>>
>>>> To fix it, call free_contig_range() to free the excess pfn range.
>>> This has been in mm-hotfixes for a month without issue.  Is there any
>>> reviewer interest?
>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -6528,7 +6528,8 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>   		goto done;
>>>>   	}
>>>>
>>>> -	if (!(gfp_mask & __GFP_COMP)) {
>>>> +	if (!(gfp_mask & __GFP_COMP) ||
>>>> +		(is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)) {
>>>>   		split_free_pages(cc.freepages, gfp_mask);
>> This does not look right to me. When a compound page is requested,
>> alloc_contig_range() should give a compound page, but split_free_pages()
>> will make the free page as a list of contiguous order-0 pages.
>>
>> I do not think we should keep this patch.
>>
>> Jinjiang, let me know if I miss anything.
>
> After split_free_pages(), below code is execucted to collapse the contiguous order-0 pages
> to a compound page.

OK, got it. Since cc.freepages can be MAX_PAGE_ORDER and the requested
PFN range is smaller than MAX_PAGE_ORDER. A more “right” way of handling
this would be in the second if (the one shown below), you check order
against the cc.freepages order and free out of requested range pages.
But that might be too complicated. Your approach is simpler.

Can you add a comment above
“(is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)” to
explain why __GFP_COMP needs split_free_pages()? Something like:

With __GFP_COMP and the requested order < MAX_PAGE_ORDER, isolated free
pages can have higher order than the requested one. Use split_free_pages()
to free out of range pages.


>
>   if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {

is_power_of_2(end - start) is used twice. Can you add a variable for it
if that sounds good to you?

Thanks.

>         struct page *head = pfn_to_page(start);
>         int order = ilog2(end - start);
>
>         check_new_pages(head, order);
>         prep_new_page(head, order, gfp_mask, 0);
>         set_page_refcounted(head);
>
>   }
>
> Thanks.
>
>>
>>>>   		/* Free head and tail (if any) */
>>>> @@ -6536,7 +6537,15 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>   			free_contig_range(outer_start, start - outer_start);
>>>>   		if (end != outer_end)
>>>>   			free_contig_range(end, outer_end - end);
>>>> -	} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>> +
>>>> +		outer_start = start;
>>>> +		outer_end = end;
>>>> +
>>>> +		if (!(gfp_mask & __GFP_COMP))
>>>> +			goto done;
>>>> +	}
>>>> +
>>>> +	if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>>   		struct page *head = pfn_to_page(start);
>>>>   		int order = ilog2(end - start);
>>>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi


  reply	other threads:[~2025-04-19  1:32 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-12  8:47 Jinjiang Tu
2025-04-18  2:59 ` Andrew Morton
2025-04-18 21:32   ` Zi Yan
2025-04-19  0:54     ` Jinjiang Tu
2025-04-19  1:32       ` Zi Yan [this message]
2025-04-19  1:50         ` Jinjiang Tu
2025-05-12  6:38         ` Jinjiang Tu
2025-05-12  7:58           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AC977477-BECA-4887-8ECF-9713030515C8@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=sunnanyong@huawei.com \
    --cc=tujinjiang@huawei.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox