From: Jinjiang Tu <tujinjiang@huawei.com>
To: Zi Yan <ziy@nvidia.com>, <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <yuzhao@google.com>,
<linux-mm@kvack.org>, <wangkefeng.wang@huawei.com>,
<sunnanyong@huawei.com>
Subject: Re: [PATCH] mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
Date: Mon, 12 May 2025 14:38:28 +0800 [thread overview]
Message-ID: <80039e40-a518-a85b-b955-96fb048a2dd0@huawei.com> (raw)
In-Reply-To: <AC977477-BECA-4887-8ECF-9713030515C8@nvidia.com>
在 2025/4/19 9:32, Zi Yan 写道:
> On 18 Apr 2025, at 20:54, Jinjiang Tu wrote:
>
>> 在 2025/4/19 5:32, Zi Yan 写道:
>>> Hi Jinjiang,
>>>
>>> On 17 Apr 2025, at 22:59, Andrew Morton wrote:
>>>
>>>> On Wed, 12 Mar 2025 16:47:05 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
>>>>
>>>>> When calling alloc_contig_range() with __GFP_COMP and the order of
>>>>> requested pfn range is pageblock_order, less than MAX_ORDER, I triggered
>>>>> WARNING as follows:
>>>>>
>>>>> PFN range: requested [2150105088, 2150105600), allocated [2150105088, 2150106112)
>>>>> WARNING: CPU: 3 PID: 580 at mm/page_alloc.c:6877 alloc_contig_range+0x280/0x340
>>> Basically, you are using alloc_contig_range() to allocate a compound page
>>> that can be allocated from buddy allocator, since order is < MAX_ORDER.
>>> What is the use case? Why is alloc_contig_range() used?
>> In CMA case, alloc_contig_range() is used to allocate from requested pfn range, and the order may
>> be < MAX_ORDER.
> But why do you need __GFP_COMP? I thought __GFP_COMP was only used for
> 1GB hugetlb.
>
>>>>> alloc_contig_range() marks pageblocks of the requested pfn range to be
>>>>> isolated, migrate these pages if they are in use and will be freed to
>>>>> MIGRATE_ISOLATED freelist.
>>>>>
>>>>> Suppose two alloc_contig_range() calls at the same time and the requested
>>>>> pfn range are [0x80280000, 0x80280200) and [0x80280200, 0x80280400)
>>>>> respectively. Suppose the two memory range are in use, then
>>>>> alloc_contig_range() will migrate and free these pages to MIGRATE_ISOLATED
>>>>> freelist. __free_one_page() will merge MIGRATE_ISOLATE buddy to larger
>>>>> buddy, resulting in a MAX_ORDER buddy. Finally, find_large_buddy() in
>>>>> alloc_contig_range() returns a MAX_ORDER buddy and results in WARNING.
>>>>>
>>>>> To fix it, call free_contig_range() to free the excess pfn range.
>>>> This has been in mm-hotfixes for a month without issue. Is there any
>>>> reviewer interest?
>>>>
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -6528,7 +6528,8 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>> goto done;
>>>>> }
>>>>>
>>>>> - if (!(gfp_mask & __GFP_COMP)) {
>>>>> + if (!(gfp_mask & __GFP_COMP) ||
>>>>> + (is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)) {
>>>>> split_free_pages(cc.freepages, gfp_mask);
>>> This does not look right to me. When a compound page is requested,
>>> alloc_contig_range() should give a compound page, but split_free_pages()
>>> will make the free page as a list of contiguous order-0 pages.
>>>
>>> I do not think we should keep this patch.
>>>
>>> Jinjiang, let me know if I miss anything.
>> After split_free_pages(), below code is execucted to collapse the contiguous order-0 pages
>> to a compound page.
This is wrong. After split_free_pages(), these pages are order-0 pages with refcount 1. Call
prep_new_page(head) and set_page_refcounted(head) lead to VM_BUG_ON_PAGE(page_ref_count(page), page).
> OK, got it. Since cc.freepages can be MAX_PAGE_ORDER and the requested
> PFN range is smaller than MAX_PAGE_ORDER. A more “right” way of handling
> this would be in the second if (the one shown below), you check order
> against the cc.freepages order and free out of requested range pages.
> But that might be too complicated. Your approach is simpler.
For (outer_start != start) || (outer_end != end) case, We could split the pages to order-0 list,
and only call post_alloc_hook/set_page_refcounted for pages in [outer_start, start), [end, outer_end) range,
and call free_contig_range() for them. How about this approach?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ff19413e876..a80767621c52 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6625,6 +6625,37 @@ static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
}
}
+static void split_pages_to_order0(struct list_head *list)
+{
+ int order;
+
+ for (order = 0; order < NR_PAGE_ORDERS; order++) {
+ struct page *page, *next;
+ int nr_pages = 1 << order;
+
+ list_for_each_entry_safe(page, next, &list[order], lru) {
+ int i;
+
+ list_del(&page->lru);
+ for (i = 0; i < nr_pages; i++)
+ list_add_tail(&page[i].lru, &list[0]);
+ }
+ }
+}
+
+static void free_pfn_range(unsigned long start, unsigned end, gfp_t gfp_mask)
+{
+ struct page *page;
+ unsigned long i;
+
+ page = pfn_to_page(start);
+ for (i = 0; i < end - start; ++i, ++page) {
+ post_alloc_hook(page, 0, gfp_mask);
+ set_page_refcounted(page);
+ }
+ free_contig_range(start, end - start);
+}
+
static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
{
const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
@@ -6800,7 +6831,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
* isolated free pages can have higher order than the requested
* one. Use split_free_pages() to free out of range pages.
*/
- if (!(gfp_mask & __GFP_COMP) || range_order < MAX_PAGE_ORDER) {
+ if (!(gfp_mask & __GFP_COMP)) {
split_free_pages(cc.freepages, gfp_mask);
/* Free head and tail (if any) */
@@ -6809,8 +6840,17 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
if (end != outer_end)
free_contig_range(end, outer_end - end);
- outer_start = start;
- outer_end = end;
+ } else if ((outer_start != start) || (end != outer_end)) {
+ struct page *page;
+ int i;
+
+ split_pages_to_order0(cc.freepages);
+
+ if (start != outer_start)
+ free_pfn_range(outer_start, start, gfp_mask);
+
+ if (end != outer_end)
+ free_pfn_range(end, outer_end, gfp_mask);
}
>
> Can you add a comment above
> “(is_power_of_2(end - start) && ilog2(end - start) < MAX_PAGE_ORDER)” to
> explain why __GFP_COMP needs split_free_pages()? Something like:
>
> With __GFP_COMP and the requested order < MAX_PAGE_ORDER, isolated free
> pages can have higher order than the requested one. Use split_free_pages()
> to free out of range pages.
>
>
>> if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
> is_power_of_2(end - start) is used twice. Can you add a variable for it
> if that sounds good to you?
>
> Thanks.
>
>> struct page *head = pfn_to_page(start);
>> int order = ilog2(end - start);
>>
>> check_new_pages(head, order);
>> prep_new_page(head, order, gfp_mask, 0);
>> set_page_refcounted(head);
>>
>> }
>>
>> Thanks.
>>
>>>>> /* Free head and tail (if any) */
>>>>> @@ -6536,7 +6537,15 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>>>>> free_contig_range(outer_start, start - outer_start);
>>>>> if (end != outer_end)
>>>>> free_contig_range(end, outer_end - end);
>>>>> - } else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>>> +
>>>>> + outer_start = start;
>>>>> + outer_end = end;
>>>>> +
>>>>> + if (!(gfp_mask & __GFP_COMP))
>>>>> + goto done;
>>>>> + }
>>>>> +
>>>>> + if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
>>>>> struct page *head = pfn_to_page(start);
>>>>> int order = ilog2(end - start);
>>>>>
>>> Best Regards,
>>> Yan, Zi
>
> Best Regards,
> Yan, Zi
next prev parent reply other threads:[~2025-05-12 6:38 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-12 8:47 Jinjiang Tu
2025-04-18 2:59 ` Andrew Morton
2025-04-18 21:32 ` Zi Yan
2025-04-19 0:54 ` Jinjiang Tu
2025-04-19 1:32 ` Zi Yan
2025-04-19 1:50 ` Jinjiang Tu
2025-05-12 6:38 ` Jinjiang Tu [this message]
2025-05-12 7:58 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=80039e40-a518-a85b-b955-96fb048a2dd0@huawei.com \
--to=tujinjiang@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=sunnanyong@huawei.com \
--cc=wangkefeng.wang@huawei.com \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox