Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Zi Yan <ziy@nvidia.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	wangkefeng.wang@huawei.com, linux-mm@kvack.org,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Fri, 12 Sep 2025 20:22:32 -0400	[thread overview]
Message-ID: <1E7C1A9D-F8DB-4CF4-A84D-09B7FF75C1AD@nvidia.com> (raw)
In-Reply-To: <2e2dd208-c3f2-4d44-835e-003af2f019bd@linux.alibaba.com>

On 12 Sep 2025, at 2:00, Baolin Wang wrote:

> On 2025/9/12 09:13, Zi Yan wrote:
>> On 11 Sep 2025, at 20:28, Wei Yang wrote:
>>
>>> On Thu, Sep 11, 2025 at 12:28:24PM -0400, Zi Yan wrote:
>>>> On 11 Sep 2025, at 2:30, Wei Yang wrote:
>>>>
>>>>> On Thu, Sep 11, 2025 at 03:34:13AM +0000, Wei Yang wrote:
>>>>>> On Wed, Sep 10, 2025 at 09:50:09PM -0400, Zi Yan wrote:
>>>>>>> On 10 Sep 2025, at 21:38, Zi Yan wrote:
>>>>>>>
>>>>>>>> On 10 Sep 2025, at 21:35, Zi Yan wrote:
>>>>>>>>
>>>>>>>>> On 10 Sep 2025, at 21:25, Wei Yang wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Sep 10, 2025 at 09:22:40AM +0000, Wei Yang wrote:
>>>>>>>>>>> Commit 56ae0bb349b4 ("mm: compaction: convert to use a folio in
>>>>>>>>>>> isolate_migratepages_block()") converts api from page to folio. But the
>>>>>>>>>>> low_pfn advance for hugetlb page seems wrong when low_pfn doesn't point
>>>>>>>>>>> to head page.
>>>>>>>>>>>
>>>>>>>>>>> Originally, if page is a hugetlb tail page, compound_nr() return 1,
>>>>>>>>>>> which means low_pfn only advance one in next iteration. After the
>>>>>>>>>>> change, low_pfn would advance more than the hugetlb range, since
>>>>>>>>>>> folio_nr_pages() always return total number of the large page. This
>>>>>>>>>>> results in skipping some range to isolate and then to migrate.
>>>>>>>>>>>
>>>>>>>>>>> The worst case for alloc_contig is it does all the isolation and
>>>>>>>>>>> migration, but finally find some range is still not isolated. And then
>>>>>>>>>>> undo all the work and try a new range.
>>>>>>>>>>>
>>>>>>>>>>> Advance low_pfn to the end of hugetlb.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>>>>>>>>> Fixes: 56ae0bb349b4 ("mm: compaction: convert to use a folio in isolate_migratepages_block()")
>>>>>>>
>>>>>>> This behavior seems to be introduced by commit 369fa227c219 ("mm: make
>>>>>>> alloc_contig_range handle free hugetlb pages”). The related change is:
>>>>>>>
>>>>>>> +               if (PageHuge(page) && cc->alloc_contig) {
>>>>>>> +                       ret = isolate_or_dissolve_huge_page(page);
>>>>>>> +
>>>>>>> +                       /*
>>>>>>> +                        * Fail isolation in case isolate_or_dissolve_huge_page()
>>>>>>> +                        * reports an error. In case of -ENOMEM, abort right away.
>>>>>>> +                        */
>>>>>>> +                       if (ret < 0) {
>>>>>>> +                                /* Do not report -EBUSY down the chain */
>>>>>>> +                               if (ret == -EBUSY)
>>>>>>> +                                       ret = 0;
>>>>>>> +                               low_pfn += (1UL << compound_order(page)) - 1;
>>>>>>
>>>>>> The compound_order(page) return 1 for a tail page.
>>>>>>
>>>>>> See below.
>>>>>>
>>>>>>> +                               goto isolate_fail;
>>>>>>> +                       }
>>>>>>> +
>>>>>>> +                       /*
>>>>>>> +                        * Ok, the hugepage was dissolved. Now these pages are
>>>>>>> +                        * Buddy and cannot be re-allocated because they are
>>>>>>> +                        * isolated. Fall-through as the check below handles
>>>>>>> +                        * Buddy pages.
>>>>>>> +                        */
>>>>>>> +               }
>>>>>>> +
>>>>>>>
>>>>>>>>>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>
>>>>>>>>>> Forgot to cc stable.
>>>>>>>>>>
>>>>>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>>>>
>>>>>>>>> Is there any bug report to justify the backport? Since it is more likely
>>>>>>>>> to be a performance issue instead of a correctness issue.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> mm/compaction.c | 2 +-
>>>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
>>>>>>>>>>> --- a/mm/compaction.c
>>>>>>>>>>> +++ b/mm/compaction.c
>>>>>>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>>>>>>> 				 * Hugepage was successfully isolated and placed
>>>>>>>>>>> 				 * on the cc->migratepages list.
>>>>>>>>>>> 				 */
>>>>>>>>>>> -				low_pfn += folio_nr_pages(folio) - 1;
>>>>>>>>>>> +				low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
>>>>>>>>>>
>>>>>>>>>> One question is why we advance compound_nr() in original version.
>>>>>>>>>>
>>>>>>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
>>>>>>>>>> on the same large page and do the same thing and advance 1 again.
>>>>>>>>>>
>>>>>>>>>> Not sure which part story I missed.
>>>>>>>>>
>>>>>>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
>>>>>>>>> How likely the code hit in the middle of a hugetlb?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In addition, there are two other “low_pfn += (1UL << order) - 1”
>>>>>>>> in the if (PageHuge(page)), why not change them too if you think
>>>>>>>> page can point to the middle of a hugetlb?
>>>>>>>>
>>>>>>
>>>>>> The order here is get from compound_order(page), which is 1 for a tail page.
>>>>>>
>>>>>> So it looks right. Maybe I misunderstand it?
>>>>>
>>>>> Oops, compound_order(page) returns 0 for tail page.
>>>>>
>>>>> What I want to say is low_pfn advance by 1 for tail page. Sorry for the
>>>>> misleading.
>>>>
>>>> OK, that sounds inefficient and inconsistent with your fix.
>>>>
>>>> While at it, can you also change two “low_pfn += (1UL << order) - 1” to skip
>>>> the rest of hugetlb?
>>>>
>>>
>>> Sure, glad to.
>>>
>>> You prefer do the fix in one patch or have a separate one?
>>
>> A separate one is better, since these are optimizations, whereas your current
>> patch is a fix.
>
> I'm afraid we should not do that. Because the order getting from compound_order(page) is not stable without lock protection for a hugetlb, This is why we add a sanity check 'if (order <= MAX_PAGE_ORDER)' before advancing low_pfn.

But if page is not the head, "low_pfn += (1UL << order) - 1" will skip
more than the hugetlb. But it is OK, since these two are isolate_fail
cases and make alloc_contig = true fail anyway.

OK. It seems that we should leave two "low_pfn += (1UL << order) - 1" unchanged.


--
Best Regards,
Yan, Zi

next prev parent reply	other threads:[~2025-09-13  0:22 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10  9:22 Wei Yang
2025-09-11  1:25 ` Wei Yang
2025-09-11  1:35   ` Zi Yan
2025-09-11  1:38     ` Zi Yan
2025-09-11  1:50       ` Zi Yan
2025-09-11  3:34         ` Wei Yang
2025-09-11  6:30           ` Wei Yang
2025-09-11 16:28             ` Zi Yan
2025-09-12  0:28               ` Wei Yang
2025-09-12  1:13                 ` Zi Yan
2025-09-12  6:00                   ` Baolin Wang
2025-09-13  0:22                     ` Zi Yan [this message]
2025-09-11  3:27     ` Wei Yang
2025-09-11 16:19       ` Zi Yan
2025-09-11 17:27         ` Vishal Moola (Oracle)
2025-09-12  1:07           ` Wei Yang
2025-09-12  1:29             ` Zi Yan
2025-09-12 17:22               ` Vishal Moola (Oracle)
2025-09-13  0:11                 ` Wei Yang
2025-09-13  0:10               ` Wei Yang
2025-09-23  2:35                 ` Andrew Morton
2025-09-23  2:41                   ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1E7C1A9D-F8DB-4CF4-A84D-09B7FF75C1AD@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=richard.weiyang@gmail.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox