linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	akpm@linux-foundation.org, vbabka@suse.cz, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	wangkefeng.wang@huawei.com, linux-mm@kvack.org,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] mm/compaction: fix low_pfn advance on isolating hugetlb
Date: Fri, 12 Sep 2025 10:22:01 -0700	[thread overview]
Message-ID: <aMRWuS-Ore7jgna3@fedora> (raw)
In-Reply-To: <ED7450E4-51F9-4749-A74E-C3AE8927E577@nvidia.com>

On Thu, Sep 11, 2025 at 09:29:39PM -0400, Zi Yan wrote:
> On 11 Sep 2025, at 21:07, Wei Yang wrote:
> 
> > On Thu, Sep 11, 2025 at 10:27:08AM -0700, Vishal Moola (Oracle) wrote:
> >> On Thu, Sep 11, 2025 at 12:19:34PM -0400, Zi Yan wrote:
> >>> On 10 Sep 2025, at 23:27, Wei Yang wrote:
> >>>
> >>>> On Wed, Sep 10, 2025 at 09:35:53PM -0400, Zi Yan wrote:
> >>>>> On 10 Sep 2025, at 21:25, Wei Yang wrote:
> >>>>>
> >>>>>> On Wed, Sep 10, 2025 at 09:22:40AM +0000, Wei Yang wrote:
> >>>>>>> Commit 56ae0bb349b4 ("mm: compaction: convert to use a folio in
> >>>>>>> isolate_migratepages_block()") converts api from page to folio. But the
> >>>>>>> low_pfn advance for hugetlb page seems wrong when low_pfn doesn't point
> >>>>>>> to head page.
> >>>>>>>
> >>>>>>> Originally, if page is a hugetlb tail page, compound_nr() return 1,
> >>>>>>> which means low_pfn only advance one in next iteration. After the
> >>>>>>> change, low_pfn would advance more than the hugetlb range, since
> >>>>>>> folio_nr_pages() always return total number of the large page. This
> >>>>>>> results in skipping some range to isolate and then to migrate.
> >>>>>>>
> >>>>>>> The worst case for alloc_contig is it does all the isolation and
> >>>>>>> migration, but finally find some range is still not isolated. And then
> >>>>>>> undo all the work and try a new range.
> >>>>>>>
> >>>>>>> Advance low_pfn to the end of hugetlb.
> >>>>>>>
> >>>>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >>>>>>> Fixes: 56ae0bb349b4 ("mm: compaction: convert to use a folio in isolate_migratepages_block()")
> >>>>>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
> >>>>>>
> >>>>>> Forgot to cc stable.
> >>>>>>
> >>>>>> Cc: <stable@vger.kernel.org>
> >>>>>
> >>>>> Is there any bug report to justify the backport? Since it is more likely
> >>>>> to be a performance issue instead of a correctness issue.
> >>>>>
> >>>>
> >>>> OK, I thought cc-stable is paired with fixes tag.
> >>>>
> >>>> If not, please drop it.
> >>>>
> >>>>>>
> >>>>>>> ---
> >>>>>>> mm/compaction.c | 2 +-
> >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>>>>>> index bf021b31c7ec..1e8f8eca318c 100644
> >>>>>>> --- a/mm/compaction.c
> >>>>>>> +++ b/mm/compaction.c
> >>>>>>> @@ -989,7 +989,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>>>>>> 				 * Hugepage was successfully isolated and placed
> >>>>>>> 				 * on the cc->migratepages list.
> >>>>>>> 				 */
> >>>>>>> -				low_pfn += folio_nr_pages(folio) - 1;
> >>>>>>> +				low_pfn += folio_nr_pages(folio) - folio_page_idx(folio, page) - 1;
> >>>>>>
> >>>>>> One question is why we advance compound_nr() in original version.
> >>>>>>
> >>>>>> Yes, there are several places advancing compound_nr(), but it seems to iterate
> >>>>>> on the same large page and do the same thing and advance 1 again.
> >>>>>>
> >>>>>> Not sure which part story I missed.
> >>>>>
> >>>>> isolate_migratepages_block() starts from the beginning of a pageblock.
> >>>>> How likely the code hit in the middle of a hugetlb?
> >>>>>
> >>>>
> >>>> OK, this is a kind of optimization based on the knowledge it is not likely to
> >>>> be a tail page?
> >>>
> >>> No, it might be that most of the time page is the head, or people assume so.
> >>
> >> For compound pages, we will always have tail pfn < head pfn, so we should
> >> always find the head page first.
> >>
> >
> > I think you want to say tail pfn > head pfn?
> >
> >> If you did find a case where we somehow encounter a tail page here, I'd
> >> love to see it. And then you'd also want to make sure the other compaction
> >> trackers are appropriately accounted for.
> >
> > I may not follow you here, below is the call flow for
> > isolate_migratepages_block() invoked during __alloc_contig_pages().
> >
> > __alloc_contig_pages(nr_pages, ..);
> >     start = ALIGN(zone->zone_start_pfn, nr_pages);
> >     alloc_contig_range_noprof(start, ..);
> >         __alloc_contig_migrate_range(.., start, ..);
> >             pfn = start;
> >             isolate_migratepages_range(.., pfn, ..);
> >                 isolate_migratepages_block(.., pfn, ..);
> >                     page = pfn_to_page(pfn);
> >     start += nr_pages;
> >
> > In the loop of __alloc_contig_pages(), it iterate on each nr_pages range. And
> > nr_pages seems could be any positive number, so it looks the first pfn checked
> > by isolate_migratepages_block() could be not aligned with page order or less
> > than MAX_PAGE_ORDER. This mean it could be a tail page per my understanding.
> >
> > Maybe I missed some point here?

That looks right, I missed the comment:
        /* Scan block by block. First and last block may be incomplete */

> You are right.
> 
> But nr_pages cannot be any positive number, since ALIGN only accepts
> power of 2 as the alignment. So alloc_contig_pages() might need another
> fix to handle the case nr_pages is not power of 2.
> 
> Oh, after I checked pfn_range_valid_contig(), I find your example does not
> apply, since it returns false when any page in the range is PageHuge.
> This means with your example, PageHuge branch will never be executed.
> But alloc_contig_range_noprof() is exported and can be used directly,
> the @start input can be any pfn, which can be in the middle of PageHuge.
> So your fix is still needed for this case.

Makes sense to me.


  reply	other threads:[~2025-09-12 17:22 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10  9:22 Wei Yang
2025-09-11  1:25 ` Wei Yang
2025-09-11  1:35   ` Zi Yan
2025-09-11  1:38     ` Zi Yan
2025-09-11  1:50       ` Zi Yan
2025-09-11  3:34         ` Wei Yang
2025-09-11  6:30           ` Wei Yang
2025-09-11 16:28             ` Zi Yan
2025-09-12  0:28               ` Wei Yang
2025-09-12  1:13                 ` Zi Yan
2025-09-12  6:00                   ` Baolin Wang
2025-09-13  0:22                     ` Zi Yan
2025-09-11  3:27     ` Wei Yang
2025-09-11 16:19       ` Zi Yan
2025-09-11 17:27         ` Vishal Moola (Oracle)
2025-09-12  1:07           ` Wei Yang
2025-09-12  1:29             ` Zi Yan
2025-09-12 17:22               ` Vishal Moola (Oracle) [this message]
2025-09-13  0:11                 ` Wei Yang
2025-09-13  0:10               ` Wei Yang
2025-09-23  2:35                 ` Andrew Morton
2025-09-23  2:41                   ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMRWuS-Ore7jgna3@fedora \
    --to=vishal.moola@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=richard.weiyang@gmail.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox