From: Zi Yan <ziy@nvidia.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>,
akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
wangkefeng.wang@huawei.com, linux-mm@kvack.org,
Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Thu, 11 Sep 2025 21:29:39 -0400 [thread overview]
Message-ID: <ED7450E4-51F9-4749-A74E-C3AE8927E577@nvidia.com> (raw)
In-Reply-To: <20250912010721.zd67xbfdava2u26e@master>
On 11 Sep 2025, at 21:07, Wei Yang wrote:
> On Thu, Sep 11, 2025 at 10:27:08AM -0700, Vishal Moola (Oracle) wrote:
>> On Thu, Sep 11, 2025 at 12:19:34PM -0400, Zi Yan wrote:
>>> On 10 Sep 2025, at 23:27, Wei Yang wrote:
>>>
>>>> On Wed, Sep 10, 2025 at 09:35:53PM -0400, Zi Yan wrote:
>>>>> On 10 Sep 2025, at 21:25, Wei Yang wrote:
>>>>>
>>>>>> On Wed, Sep 10, 2025 at 09:22:40AM +0000, Wei Yang wrote:
>>>>>>> Commit 56ae0bb349b4 ("mm: compaction: convert to use a folio in
>>>>>>> isolate_migratepages_block()") converts api from page to folio. But the
>>>>>>> low_pfn advance for hugetlb page seems wrong when low_pfn doesn't point
>>>>>>> to head page.
>>>>>>>
>>>>>>> Originally, if page is a hugetlb tail page, compound_nr() return 1,
>>>>>>> which means low_pfn only advance one in next iteration. After the
>>>>>>> change, low_pfn would advance more than the hugetlb range, since
>>>>>>> folio_nr_pages() always return total number of the large page. This
>>>>>>> results in skipping some range to isolate and then to migrate.
>>>>>>>
>>>>>>> The worst case for alloc_contig is it does all the isolation and
>>>>>>> migration, but finally find some range is still not isolated. And then
>>>>>>> undo all the work and try a new range.
>>>>>>>
>>>>>>> Advance low_pfn to the end of hugetlb.
>>>>>>>
>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>>>> Fixes: 56ae0bb349b4 ("mm: compaction: convert to use a folio in isolate_migratepages_block()")
>>>>>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>
>>>>>> Forgot to cc stable.
>>>>>>
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>
>>>>> Is there any bug report to justify the backport? Since it is more likely
>>>>> to be a performance issue instead of a correctness issue.
>>>>>
>>>>
>>>> OK, I thought cc-stable is paired with fixes tag.
>>>>
>>>> If not, please drop it.
>>>>
>>>>>>
>>>>>>> ---
>>>>>>> mm/compaction.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
>>>>>>> --- a/mm/compaction.c
>>>>>>> +++ b/mm/compaction.c
>>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>>> * Hugepage was successfully isolated and placed
>>>>>>> * on the cc->migratepages list.
>>>>>>> */
>>>>>>> - low_pfn += folio_nr_pages(folio) - 1;
>>>>>>> + low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
>>>>>>
>>>>>> One question is why we advance compound_nr() in original version.
>>>>>>
>>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
>>>>>> on the same large page and do the same thing and advance 1 again.
>>>>>>
>>>>>> Not sure which part story I missed.
>>>>>
>>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
>>>>> How likely the code hit in the middle of a hugetlb?
>>>>>
>>>>
>>>> OK, this is a kind of optimization based on the knowledge it is not likely to
>>>> be a tail page?
>>>
>>> No, it might be that most of the time page is the head, or people assume so.
>>
>> For compound pages, we will always have tail pfn < head pfn, so we should
>> always find the head page first.
>>
>
> I think you want to say tail pfn > head pfn?
>
>> If you did find a case where we somehow encounter a tail page here, I'd
>> love to see it. And then you'd also want to make sure the other compaction
>> trackers are appropriately accounted for.
>
> I may not follow you here, below is the call flow for
> isolate_migratepages_block() invoked during __alloc_contig_pages().
>
> __alloc_contig_pages(nr_pages, ..);
> start = ALIGN(zone->zone_start_pfn, nr_pages);
> alloc_contig_range_noprof(start, ..);
> __alloc_contig_migrate_range(.., start, ..);
> pfn = start;
> isolate_migratepages_range(.., pfn, ..);
> isolate_migratepages_block(.., pfn, ..);
> page = pfn_to_page(pfn);
> start += nr_pages;
>
> In the loop of __alloc_contig_pages(), it iterate on each nr_pages range. And
> nr_pages seems could be any positive number, so it looks the first pfn checked
> by isolate_migratepages_block() could be not aligned with page order or less
> than MAX_PAGE_ORDER. This mean it could be a tail page per my understanding.
>
> Maybe I missed some point here?
You are right.
But nr_pages cannot be any positive number, since ALIGN only accepts
power of 2 as the alignment. So alloc_contig_pages() might need another
fix to handle the case nr_pages is not power of 2.
Oh, after I checked pfn_range_valid_contig(), I find your example does not
apply, since it returns false when any page in the range is PageHuge.
This means with your example, PageHuge branch will never be executed.
But alloc_contig_range_noprof() is exported and can be used directly,
the @start input can be any pfn, which can be in the middle of PageHuge.
So your fix is still needed for this case.
--
Best Regards,
Yan, Zi
next prev parent reply other threads:[~2025-09-12 1:29 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 9:22 Wei Yang
2025-09-11 1:25 ` Wei Yang
2025-09-11 1:35 ` Zi Yan
2025-09-11 1:38 ` Zi Yan
2025-09-11 1:50 ` Zi Yan
2025-09-11 3:34 ` Wei Yang
2025-09-11 6:30 ` Wei Yang
2025-09-11 16:28 ` Zi Yan
2025-09-12 0:28 ` Wei Yang
2025-09-12 1:13 ` Zi Yan
2025-09-12 6:00 ` Baolin Wang
2025-09-13 0:22 ` Zi Yan
2025-09-11 3:27 ` Wei Yang
2025-09-11 16:19 ` Zi Yan
2025-09-11 17:27 ` Vishal Moola (Oracle)
2025-09-12 1:07 ` Wei Yang
2025-09-12 1:29 ` Zi Yan [this message]
2025-09-12 17:22 ` Vishal Moola (Oracle)
2025-09-13 0:11 ` Wei Yang
2025-09-13 0:10 ` Wei Yang
2025-09-23 2:35 ` Andrew Morton
2025-09-23 2:41 ` Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ED7450E4-51F9-4749-A74E-C3AE8927E577@nvidia.com \
--to=ziy@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
--cc=richard.weiyang@gmail.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=vishal.moola@gmail.com \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox