From: Zi Yan <ziy@nvidia.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
wangkefeng.wang@huawei.com, linux-mm@kvack.org,
Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Thu, 11 Sep 2025 21:13:30 -0400 [thread overview]
Message-ID: <0D6DB0F3-A893-4B60-9939-FF4575FEE3CB@nvidia.com> (raw)
In-Reply-To: <20250912002854.plq5uhy432xq6puv@master>
On 11 Sep 2025, at 20:28, Wei Yang wrote:
> On Thu, Sep 11, 2025 at 12:28:24PM -0400, Zi Yan wrote:
>> On 11 Sep 2025, at 2:30, Wei Yang wrote:
>>
>>> On Thu, Sep 11, 2025 at 03:34:13AM +0000, Wei Yang wrote:
>>>> On Wed, Sep 10, 2025 at 09:50:09PM -0400, Zi Yan wrote:
>>>>> On 10 Sep 2025, at 21:38, Zi Yan wrote:
>>>>>
>>>>>> On 10 Sep 2025, at 21:35, Zi Yan wrote:
>>>>>>
>>>>>>> On 10 Sep 2025, at 21:25, Wei Yang wrote:
>>>>>>>
>>>>>>>> On Wed, Sep 10, 2025 at 09:22:40AM +0000, Wei Yang wrote:
>>>>>>>>> Commit 56ae0bb349b4 ("mm: compaction: convert to use a folio in
>>>>>>>>> isolate_migratepages_block()") converts api from page to folio. But the
>>>>>>>>> low_pfn advance for hugetlb page seems wrong when low_pfn doesn't point
>>>>>>>>> to head page.
>>>>>>>>>
>>>>>>>>> Originally, if page is a hugetlb tail page, compound_nr() return 1,
>>>>>>>>> which means low_pfn only advance one in next iteration. After the
>>>>>>>>> change, low_pfn would advance more than the hugetlb range, since
>>>>>>>>> folio_nr_pages() always return total number of the large page. This
>>>>>>>>> results in skipping some range to isolate and then to migrate.
>>>>>>>>>
>>>>>>>>> The worst case for alloc_contig is it does all the isolation and
>>>>>>>>> migration, but finally find some range is still not isolated. And then
>>>>>>>>> undo all the work and try a new range.
>>>>>>>>>
>>>>>>>>> Advance low_pfn to the end of hugetlb.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>>>>>> Fixes: 56ae0bb349b4 ("mm: compaction: convert to use a folio in isolate_migratepages_block()")
>>>>>
>>>>> This behavior seems to be introduced by commit 369fa227c219 ("mm: make
>>>>> alloc_contig_range handle free hugetlb pages”). The related change is:
>>>>>
>>>>> + if (PageHuge(page) && cc->alloc_contig) {
>>>>> + ret = isolate_or_dissolve_huge_page(page);
>>>>> +
>>>>> + /*
>>>>> + * Fail isolation in case isolate_or_dissolve_huge_page()
>>>>> + * reports an error. In case of -ENOMEM, abort right away.
>>>>> + */
>>>>> + if (ret < 0) {
>>>>> + /* Do not report -EBUSY down the chain */
>>>>> + if (ret == -EBUSY)
>>>>> + ret = 0;
>>>>> + low_pfn += (1UL << compound_order(page)) - 1;
>>>>
>>>> The compound_order(page) return 1 for a tail page.
>>>>
>>>> See below.
>>>>
>>>>> + goto isolate_fail;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * Ok, the hugepage was dissolved. Now these pages are
>>>>> + * Buddy and cannot be re-allocated because they are
>>>>> + * isolated. Fall-through as the check below handles
>>>>> + * Buddy pages.
>>>>> + */
>>>>> + }
>>>>> +
>>>>>
>>>>>>>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>
>>>>>>>> Forgot to cc stable.
>>>>>>>>
>>>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>>
>>>>>>> Is there any bug report to justify the backport? Since it is more likely
>>>>>>> to be a performance issue instead of a correctness issue.
>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> mm/compaction.c | 2 +-
>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
>>>>>>>>> --- a/mm/compaction.c
>>>>>>>>> +++ b/mm/compaction.c
>>>>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>>>>> * Hugepage was successfully isolated and placed
>>>>>>>>> * on the cc->migratepages list.
>>>>>>>>> */
>>>>>>>>> - low_pfn += folio_nr_pages(folio) - 1;
>>>>>>>>> + low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
>>>>>>>>
>>>>>>>> One question is why we advance compound_nr() in original version.
>>>>>>>>
>>>>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
>>>>>>>> on the same large page and do the same thing and advance 1 again.
>>>>>>>>
>>>>>>>> Not sure which part story I missed.
>>>>>>>
>>>>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
>>>>>>> How likely the code hit in the middle of a hugetlb?
>>>>>>>
>>>>>>
>>>>>> In addition, there are two other “low_pfn += (1UL << order) - 1”
>>>>>> in the if (PageHuge(page)), why not change them too if you think
>>>>>> page can point to the middle of a hugetlb?
>>>>>>
>>>>
>>>> The order here is get from compound_order(page), which is 1 for a tail page.
>>>>
>>>> So it looks right. Maybe I misunderstand it?
>>>
>>> Oops, compound_order(page) returns 0 for tail page.
>>>
>>> What I want to say is low_pfn advance by 1 for tail page. Sorry for the
>>> misleading.
>>
>> OK, that sounds inefficient and inconsistent with your fix.
>>
>> While at it, can you also change two “low_pfn += (1UL << order) - 1” to skip
>> the rest of hugetlb?
>>
>
> Sure, glad to.
>
> You prefer do the fix in one patch or have a separate one?
A separate one is better, since these are optimizations, whereas your current
patch is a fix.
--
Best Regards,
Yan, Zi
next prev parent reply other threads:[~2025-09-12 1:13 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 9:22 Wei Yang
2025-09-11 1:25 ` Wei Yang
2025-09-11 1:35 ` Zi Yan
2025-09-11 1:38 ` Zi Yan
2025-09-11 1:50 ` Zi Yan
2025-09-11 3:34 ` Wei Yang
2025-09-11 6:30 ` Wei Yang
2025-09-11 16:28 ` Zi Yan
2025-09-12 0:28 ` Wei Yang
2025-09-12 1:13 ` Zi Yan [this message]
2025-09-12 6:00 ` Baolin Wang
2025-09-13 0:22 ` Zi Yan
2025-09-11 3:27 ` Wei Yang
2025-09-11 16:19 ` Zi Yan
2025-09-11 17:27 ` Vishal Moola (Oracle)
2025-09-12 1:07 ` Wei Yang
2025-09-12 1:29 ` Zi Yan
2025-09-12 17:22 ` Vishal Moola (Oracle)
2025-09-13 0:11 ` Wei Yang
2025-09-13 0:10 ` Wei Yang
2025-09-23 2:35 ` Andrew Morton
2025-09-23 2:41 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0D6DB0F3-A893-4B60-9939-FF4575FEE3CB@nvidia.com \
--to=ziy@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
--cc=richard.weiyang@gmail.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox