From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>, Vlastimil Babka <vbabka@suse.cz>
Cc: linux-kernel@vger.kernel.org,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Yu Zhao <yuzhao@google.com>
Subject: Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
Date: Mon, 9 Dec 2024 22:35:25 +0100
Message-ID: <db1815b1-fd24-4b8f-ab64-32b0c4df6cd6@redhat.com>
In-Reply-To: <37B7A92E-B58F-442D-8501-B07A507F0451@nvidia.com>

On 09.12.24 20:23, Zi Yan wrote:
> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>
>> On 12/6/24 10:59, David Hildenbrand wrote:
>>> Let's special-case for the common scenarios that:
>>>
>>> (a) We are freeing pages <= pageblock_order
>>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>>> (especially, no mixture of isolated and non-isolated pageblocks)
>>
>> Well in many of those cases we could also just adjust the pageblocks... But
>> perhaps they indeed shouldn't differ in the first place, unless there's an
>> isolation attempt.
>>
>>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>>> alloc_contig_range(), and we can process MAX_PAGE_ORDER chunks.
>>>
>>> When we encounter a >pageblock_order <= MAX_PAGE_ORDER page,
>>> check whether all pageblocks match, and if so (common case), don't
>>> split them up just for the buddy to merge them back.
>>>
>>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>>> for example during system startups, memory onlining, or when isolating
>>> consecutive pageblocks via alloc_contig_range()/memory offlining, that
>>> we don't unnecessarily split up what we'll immediately merge again,
>>> because the migratetypes match.
>>>
>>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>>> clearer what's happening, and handle in it only natural buddy orders,
>>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>>> free_one_page() only.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> Hm but noticed something:
>>
>>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>>> + unsigned long pfn, int order, fpi_t fpi_flags)
>>> +{
>>> + const unsigned long end_pfn = pfn + (1 << order);
>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>> +
>>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>> /* Caller removed page from freelist, buddy info cleared! */
>>> VM_WARN_ON_ONCE(PageBuddy(page));
>>>
>>> - if (order > pageblock_order)
>>> - order = pageblock_order;
>>> -
>>> - while (pfn != end) {
>>> - int mt = get_pfnblock_migratetype(page, pfn);
>>> + /*
>>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>> + * pages that cover pageblocks with different migratetypes; for example
>>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>> + * case, fallback to freeing individual pageblocks so they get put
>>> + * onto the right lists.
>>> + */
>>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>> + likely(order <= pageblock_order) ||
>>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>> + return;
>>> + }
>>>
>>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>>> - pfn += 1 << order;
>>> + while (pfn != end_pfn) {
>>> + mt = get_pfnblock_migratetype(page, pfn);
>>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>> + pfn += pageblock_nr_pages;
>>> page = pfn_to_page(pfn);
>>
>> This predates your patch, but seems potentially dangerous to attempt
>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>> being just outside of the valid range? Should we change that?
>>
>> But seems this code was initially introduced as part of Johannes'
>> migratetype hygiene series.
>
> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
> alloc_contig_range work at pageblock granularity"), but harmless since
> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
> page_isolation: prepare for hygienic freelists") refactored it, which
> should be fine, since it is still used for the same purpose in page
> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> used it for gigantic hugetlb.
>
> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, vmemmap might not
> be. The code above using pfn in the loop might be fine. And since order
> is provided, unless the caller is providing a falsely large order, pfn
> should be valid. Or am I missing anything?

I think the question is, what happens when we call pfn_to_page() on a
PFN that falls into a memory section that is either offline, doesn't
have a memmap, or does not exist.

With CONFIG_SPARSEMEM, we do a

	struct mem_section *__sec = __pfn_to_section(__pfn);
	__section_mem_map_addr(__sec) + __pfn;

__pfn_to_section() can return NULL, in which case
__section_mem_map_addr() would dereference NULL.

I assume it could happen in corner cases, if we'd exceed
NR_SECTION_ROOTS. (IOW, large memory, and we free a page that is at the
very end of physical memory).

Likely, we should do the pfn_to_page() before the __free_one_page() call,
so that we never convert end_pfn itself.
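
To illustrate the ordering issue, here is a rough userspace sketch, not
kernel code. Everything prefixed toy_ and all the constants are made up
for illustration, and the memmap arithmetic is simplified compared to the
real __section_mem_map_addr(). A lookup that would dereference NULL in the
kernel is only counted here instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy geometry; these are not the kernel's real values. */
#define PAGES_PER_SECTION	8UL
#define NR_SECTIONS		4UL
#define PAGEBLOCK_NR_PAGES	4UL
#define MAX_PFN			(PAGES_PER_SECTION * NR_SECTIONS)

struct page { unsigned long pfn; };
struct mem_section { struct page *mem_map; };

static struct page memmap[NR_SECTIONS][PAGES_PER_SECTION];
static struct mem_section sections[NR_SECTIONS];
static unsigned long invalid_lookups;
static unsigned long freed_blocks;

/* Set up a memmap for every section that exists in the toy model. */
static void toy_init(void)
{
	for (unsigned long s = 0; s < NR_SECTIONS; s++) {
		sections[s].mem_map = memmap[s];
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
			memmap[s][i].pfn = s * PAGES_PER_SECTION + i;
	}
}

/* Stand-in for __pfn_to_section(): NULL past the last existing section. */
static struct mem_section *toy_pfn_to_section(unsigned long pfn)
{
	unsigned long nr = pfn / PAGES_PER_SECTION;

	return nr < NR_SECTIONS ? &sections[nr] : NULL;
}

/* Stand-in for pfn_to_page(); counts what would be a NULL dereference. */
static struct page *toy_pfn_to_page(unsigned long pfn)
{
	struct mem_section *sec = toy_pfn_to_section(pfn);

	if (!sec) {
		invalid_lookups++;
		return NULL;
	}
	return &sec->mem_map[pfn % PAGES_PER_SECTION];
}

/* Stand-in for __free_one_page() on a single pageblock. */
static void toy_free_block(struct page *page, unsigned long pfn)
{
	assert(page && page->pfn == pfn);
	freed_blocks++;
}

/* Ordering as in the quoted loop: convert after advancing pfn. */
static unsigned long split_lookup_after(unsigned long pfn, unsigned long end_pfn)
{
	struct page *page = toy_pfn_to_page(pfn);	/* passed in by the caller */

	invalid_lookups = 0;
	while (pfn != end_pfn) {
		toy_free_block(page, pfn);
		pfn += PAGEBLOCK_NR_PAGES;
		page = toy_pfn_to_page(pfn);	/* also runs for pfn == end_pfn */
	}
	return invalid_lookups;
}

/* Suggested ordering: convert first, so end_pfn is never looked up. */
static unsigned long split_lookup_before(unsigned long pfn, unsigned long end_pfn)
{
	invalid_lookups = 0;
	while (pfn != end_pfn) {
		struct page *page = toy_pfn_to_page(pfn);

		toy_free_block(page, pfn);
		pfn += PAGEBLOCK_NR_PAGES;
	}
	return invalid_lookups;
}
```

After toy_init(), freeing the last eight toy pages with
split_lookup_after(MAX_PFN - 8, MAX_PFN) reports one lookup past the last
section, while split_lookup_before() with the same arguments reports
none. Ranges that end before the last section behave the same either way.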
--
Cheers,
David / dhildenb