Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()

linux-mm.kvack.org archive mirror

From: David Hildenbrand <david@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Zi Yan <ziy@nvidia.com>, Vlastimil Babka <vbabka@suse.cz>,
	Yu Zhao <yuzhao@google.com>
Subject: Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
Date: Tue, 10 Dec 2024 22:40:15 +0100
Message-ID: <d6a79fa6-dcc1-4181-9946-940a91c0b1f2@redhat.com>
In-Reply-To: <20241210211613.GC2508492@cmpxchg.org>

On 10.12.24 22:16, Johannes Weiner wrote:
> On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
>> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>   	spin_unlock_irqrestore(&zone->lock, flags);
>>   }
>>   
>> -/* Split a multi-block free page into its individual pageblocks. */
>> -static void split_large_buddy(struct zone *zone, struct page *page,
>> -			      unsigned long pfn, int order, fpi_t fpi)
>> +static bool pfnblock_migratetype_equal(unsigned long pfn,
>> +		unsigned long end_pfn, int mt)
>>   {
>> -	unsigned long end = pfn + (1 << order);
>> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>>   
>> +	while (pfn != end_pfn) {
>> +		struct page *page = pfn_to_page(pfn);
>> +
>> +		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
>> +			return false;
>> +		pfn += pageblock_nr_pages;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>> +		unsigned long pfn, int order, fpi_t fpi_flags)
>> +{
>> +	const unsigned long end_pfn = pfn + (1 << order);
>> +	int mt = get_pfnblock_migratetype(page, pfn);
>> +
>> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>   	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>   	/* Caller removed page from freelist, buddy info cleared! */
>>   	VM_WARN_ON_ONCE(PageBuddy(page));
>>   
>> -	if (order > pageblock_order)
>> -		order = pageblock_order;
>> +	/*
>> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>> +	 * pages that cover pageblocks with different migratetypes; for example
>> +	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>> +	 * case, fallback to freeing individual pageblocks so they get put
>> +	 * onto the right lists.
>> +	 */
>> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>> +	    likely(order <= pageblock_order) ||
>> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
>> +		return;
>> +	}

Hi Johannes,

> 
> Ok, if memory isolation is disabled, we know the migratetypes are all
> matching up, and we can skip the check. However, if memory isolation
> is enabled, but this isn't move_freepages_block_isolate() calling, we
> still do the check unnecessarily and slow down the boot, no?

Yes, although on most machines (e.g., x86) that is just one additional
pageblock check; on some it is a few more (e.g., 3 on s390x).
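To put numbers on that: the quoted code reads the first pageblock's migratetype up front and starts the scan at pfn + pageblock_nr_pages, so freeing a page of order > pageblock_order costs 2^(order - pageblock_order) - 1 extra lookups. A small sketch, assuming MAX_PAGE_ORDER = 10 on both architectures and pageblock_order = 9 on x86-64 (2 MiB huge pages) versus 8 on s390x (1 MiB); extra_pageblock_checks() is a hypothetical helper, not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical helper (not kernel code): number of extra
 * get_pfnblock_migratetype() lookups pfnblock_migratetype_equal() would
 * perform when freeing a page of the given order. The first pageblock's
 * migratetype is already known, so only the remaining ones are checked.
 */
static unsigned long extra_pageblock_checks(int order, int pageblock_order)
{
	assert(order >= 0 && pageblock_order >= 0);

	/* The likely(order <= pageblock_order) fast path skips the scan. */
	if (order <= pageblock_order)
		return 0;

	/* Pageblocks covered by the page, minus the one already inspected. */
	return (1UL << (order - pageblock_order)) - 1;
}
```

With these assumptions, extra_pageblock_checks(10, 9) gives 1 (x86-64) and extra_pageblock_checks(10, 8) gives 3 (s390x), matching the figures above.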

As mentioned:

"
In the future, we might want to assume that all pageblocks are equal if
zone->nr_isolate_pageblock == 0; however, that will require some
zone->nr_isolate_pageblock accounting changes, such that we are
guaranteed to see zone->nr_isolate_pageblock != 0 when there is an
isolated pageblock.
"

With that, boot time wouldn't suffer in any significant way.
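A rough userspace model of that idea, assuming the accounting guarantee described above holds (the counter is never 0 while any pageblock in the zone is isolated) and that, without isolation, all pageblocks of a buddy share one migratetype. struct zone_model, its fields and both helpers are hypothetical stand-ins for struct zone and get_pfnblock_migratetype(); they only illustrate how the scan would be skipped.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_BLOCKS 8

enum mt_model { MT_MOVABLE, MT_ISOLATE };

/* Hypothetical model of a zone: one migratetype per pageblock. */
struct zone_model {
	enum mt_model block_mt[NR_BLOCKS];
	/* Assumed exact: never 0 while any block is MT_ISOLATE. */
	unsigned long nr_isolate_pageblock;
};

/* Compare pageblocks [first, last) against mt, like pfnblock_migratetype_equal(). */
static bool blocks_equal(const struct zone_model *z, int first, int last,
			 enum mt_model mt)
{
	assert(first >= 0 && last <= NR_BLOCKS);
	for (int i = first; i < last; i++)
		if (z->block_mt[i] != mt)
			return false;
	return true;
}

/*
 * Hypothetical fast path: with no isolated pageblocks in the zone, the
 * pageblocks of the freed page are assumed to match, so the scan is skipped.
 */
static bool can_free_whole(const struct zone_model *z, int first, int nr_blocks)
{
	if (z->nr_isolate_pageblock == 0)
		return true;
	return blocks_equal(z, first + 1, first + nr_blocks, z->block_mt[first]);
}
```

The common case (no isolation anywhere in the zone) then costs a single counter read, and the per-pageblock scan only runs while isolation is in progress.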

> 
> Having a function guess the caller is a bit of an anti-pattern. The
> resulting code is hard to follow, and it's very easy to
> unintentionally burden some cases with unnecessary stuff. It's better
> to unshare paths until you don't need conditionals like this.
>
> In addition to the fastpath, I think you're also punishing the
> move_freepages_block_isolate() case. We *know* we just changed the
> type of one of the buddy's blocks, and yet you're still checking the
> range again to decide whether to split.

Yes, that's not ideal, and it would be easy to unshare that case (call 
the "split" function instead of a "maybe_split" function).
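Roughly, unsharing would mean two entry points: callers that cannot know whether the range is mixed keep the checking variant, while move_freepages_block_isolate(), which just changed one block's type, calls the splitting variant directly. The sketch below is a userspace model under that assumption (pageblock_order = 9); free_maybe_split(), free_split() and free_chunk() are hypothetical stand-ins for __free_one_page_maybe_split(), a split function and __free_one_page(), and "freeing" only counts calls.

```c
#include <assert.h>

#define PAGEBLOCK_ORDER 9
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

/* Hypothetical per-pageblock migratetypes, indexed by pfn >> PAGEBLOCK_ORDER. */
static int block_mt[16];
/* Number of chunks handed to the (modeled) buddy freelists. */
static int nr_frees;

static int pfnblock_mt(unsigned long pfn)
{
	return block_mt[pfn >> PAGEBLOCK_ORDER];
}

/* Stand-in for __free_one_page(): record that one chunk was freed. */
static void free_chunk(unsigned long pfn, int order, int mt)
{
	(void)pfn;
	(void)order;
	(void)mt;
	nr_frees++;
}

/* Unconditional split: free every pageblock with its own migratetype. */
static void free_split(unsigned long pfn, int order)
{
	const unsigned long end = pfn + (1UL << order);

	assert(order > PAGEBLOCK_ORDER);
	for (; pfn != end; pfn += PAGEBLOCK_NR_PAGES)
		free_chunk(pfn, PAGEBLOCK_ORDER, pfnblock_mt(pfn));
}

/* Checking variant for callers that cannot know whether the range is mixed. */
static void free_maybe_split(unsigned long pfn, int order)
{
	const unsigned long end = pfn + (1UL << order);
	const int mt = pfnblock_mt(pfn);

	assert((pfn & ((1UL << order) - 1)) == 0); /* naturally aligned */
	if (order > PAGEBLOCK_ORDER) {
		for (unsigned long p = pfn + PAGEBLOCK_NR_PAGES; p != end;
		     p += PAGEBLOCK_NR_PAGES) {
			if (pfnblock_mt(p) != mt) {
				free_split(pfn, order);
				return;
			}
		}
	}
	free_chunk(pfn, order, mt);
}
```

In this model, a caller that already knows the range is mixed goes straight to free_split() and pays for no scan at all.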

I am not 100% sure, though, whether move_freepages_block_isolate() can
always decide "I really have a mixture"; that code is simply quite advanced :)

> 
> All of this to accommodate hugetlb, which might not even be compiled
> in? Grrrr.

Yep. But at the same time, it's frequently compiled in but never used
(or barely used; I mean, how often do people actually free 1 GiB hugetlb
pages compared to ordinary pages?).

> 
> Like you, I was quite surprised to see that GFP_COMP patch in the
> buddy hotpath splitting *everything* into blocks - on the off chance
> that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
> - what the hell. We shouldn't merge "I only care about my niche
> usecase at the expense of literally everybody else" patches like this.

After talking to Willy: the whole __GFP_COMP handling might get removed
again sooner or later, once alloc_contig_range() hands out pages with a
frozen refcount. That might take a while, though.

> 
> My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.

I won't fight for this patch *if* the GFP_COMP patch gets reverted. This
patch improves the situation, and there is room to improve it further.

But if it doesn't get reverted, we have to think about something else.

> 
> The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
> we support folios of a larger size sourced from other allocators, then
> it should be the folio layer discriminating. So if folio_put() detects
> this is a massive alloc_contig chunk, then it should take a different
> freeing path. Do the splitting in there, then pass valid chunks back
> to the buddy. That would keep the layering cleaner and the cornercase
> overhead out of the allocator fastpath.
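A minimal sketch of that layering, assuming MAX_PAGE_ORDER = 10 and a naturally aligned contiguous chunk: a folio-side path (free_huge_contig_chunk(), a hypothetical name, as is buddy_free()) breaks the chunk into MAX_PAGE_ORDER-sized pieces before anything reaches the buddy, so the allocator fast path never sees an order above MAX_PAGE_ORDER. It deliberately does not handle per-pageblock migratetypes.

```c
#include <assert.h>

#define MAX_PAGE_ORDER 10

/* Count of calls into the modeled buddy allocator. */
static unsigned long buddy_frees;

/* Hypothetical buddy entry point: only accepts orders <= MAX_PAGE_ORDER. */
static void buddy_free(unsigned long pfn, int order)
{
	assert(order <= MAX_PAGE_ORDER);
	assert((pfn & ((1UL << order) - 1)) == 0); /* naturally aligned */
	buddy_frees++;
}

/*
 * Hypothetical folio-layer path for chunks that may exceed MAX_PAGE_ORDER
 * (e.g. from alloc_contig_range()): split before handing to the buddy.
 */
static void free_huge_contig_chunk(unsigned long pfn, int order)
{
	const unsigned long end = pfn + (1UL << order);

	if (order <= MAX_PAGE_ORDER) {
		buddy_free(pfn, order);
		return;
	}
	for (; pfn != end; pfn += 1UL << MAX_PAGE_ORDER)
		buddy_free(pfn, MAX_PAGE_ORDER);
}
```

For example, a 1 GiB folio with 4 KiB base pages is order 18, so this path would issue 2^(18 - 10) = 256 buddy frees, while ordinary frees go straight through with a single order comparison.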

That might be better, although I assume it's not completely trivial.

How to cleanly handle the "MAX_PAGE_ORDER page is getting freed but one
pageblock is isolated" case is a bit of a head-scratcher, at least to me.
But I suspect we had it fully working before the GFP_COMP patch.

-- 
Cheers,

David / dhildenb


Thread overview: 6+ messages
2024-12-10 10:29 [PATCH v2 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing David Hildenbrand
2024-12-10 10:29 ` [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
2024-12-10 21:16   ` Johannes Weiner
2024-12-10 21:40     ` David Hildenbrand [this message]
2024-12-11 13:04       ` David Hildenbrand
2024-12-10 10:29 ` [PATCH v2 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
