From: Zi Yan <ziy@nvidia.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Miaohe Lin <linmiaohe@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Mon, 16 Oct 2023 11:00:33 -0400 [thread overview]
Message-ID: <5B61B70F-E85D-4E6F-8856-17A90B899F98@nvidia.com> (raw)
In-Reply-To: <20231016143717.GH470544@cmpxchg.org>
On 16 Oct 2023, at 10:37, Johannes Weiner wrote:
> On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote:
>>> The attached patch has all the suggested changes, let me know how it
>>> looks to you. Thanks.
>>
>> The one I sent has free page accounting issues. The attached one fixes them.
>
> Do you still have the warnings? I wonder what went wrong.
No warnings, but there are problems with the code:

1. In your version, split_free_page() is called without changing any pageblock
migratetype first, so split_free_page() is effectively a no-op: the page is
deleted from the free list and then freed again at the split orders, and the
buddy allocator simply merges the pieces back together.

2. In my version, I set the pageblock migratetype to new_mt before calling
split_free_page(), but that causes free page accounting issues: in the head
case, the pages are still on the old_mt free list, yet the removal looks up
new_mt, so the accounting decrements the new_mt free page counter instead of
the old_mt one.

Basically, split_free_page() is awkward because it relies on the caller having
already changed the pageblock migratetypes, i.e. the types are changed without
first deleting the free pages from their lists. That is why I came up with the
new split_free_page() below.
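To make the accounting problem in case 2 concrete, here is a minimal sketch of
the head case (illustrative only, not a patch; zone, order, old_mt and new_mt
are assumed to be in scope as in the hunks quoted below):

	/*
	 * What goes wrong when the block is retyped before the buddy is
	 * taken off its freelist: the page still sits on the old_mt list,
	 * but the lookup below already returns new_mt, so the new_mt free
	 * page counter is decremented instead of old_mt's.
	 */
	set_pageblock_migratetype(free_page, new_mt);
	mt = get_pfnblock_migratetype(free_page, free_page_pfn);  /* new_mt */
	del_page_from_free_list(free_page, zone, order, mt);      /* wrong accounting */

	/*
	 * Safe ordering: remove (and account) the buddy under the type it
	 * is actually listed on, then retype the pageblock.
	 */
	mt = get_pfnblock_migratetype(free_page, free_page_pfn);  /* still old_mt */
	del_page_from_free_list(free_page, zone, order, mt);
	set_pageblock_migratetype(free_page, new_mt);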
>
>> @@ -883,6 +886,10 @@ int split_free_page(struct page *free_page,
>> mt = get_pfnblock_migratetype(free_page, free_page_pfn);
>> del_page_from_free_list(free_page, zone, order, mt);
>>
>> + set_pageblock_migratetype(free_page, mt1);
>> + set_pageblock_migratetype(pfn_to_page(free_page_pfn + split_pfn_offset),
>> + mt2);
>> +
>> for (pfn = free_page_pfn;
>> pfn < free_page_pfn + (1UL << order);) {
>> int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>
> I don't think this is quite right.
>
> With CONFIG_ARCH_FORCE_MAX_ORDER it's possible that we're dealing with
> a buddy that is more than two blocks:
>
> [pageblock 0][pageblock 1][pageblock 2][pageblock 3]
> [buddy ]
> [isolate range ..
>
> That for loop splits the buddy into 4 blocks. The code above would set
> pageblock 0 to old_mt, and pageblock 1 to new_mt. But it should only
> set pageblock 3 to new_mt.
OK. I think I need to fix split_free_page().

Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy span more than one
pageblock, and in turn make an in-use page span more than one pageblock, we
will have problems: in isolate_single_pageblock(), an in-use page can have
part of its range (one of its pageblocks) set to a different migratetype and
later be freed, leaving a free page whose pageblocks have mismatched
migratetypes. We might need to free such pages at pageblock_order when their
order is bigger than pageblock_order (rough sketch below).

Which arch with CONFIG_ARCH_FORCE_MAX_ORDER can have a buddy spanning more
than one pageblock? I would like to run some tests.
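Roughly what I mean by freeing at pageblock_order, as an untested sketch at
the point where free_one_page() calls __free_one_page() today (zone->lock
held, page/pfn/order/migratetype/fpi_flags already in scope; it also ignores
for now that the buddy allocator could merge the pieces right back):

	if (order > pageblock_order) {
		unsigned long block_pfn = pfn;
		unsigned long end_pfn = pfn + (1UL << order);

		/* free one pageblock at a time, each under its own migratetype */
		for (; block_pfn < end_pfn; block_pfn += pageblock_nr_pages) {
			int mt = get_pfnblock_migratetype(pfn_to_page(block_pfn),
							  block_pfn);

			__free_one_page(pfn_to_page(block_pfn), block_pfn, zone,
					pageblock_order, mt, fpi_flags);
		}
	} else {
		__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
	}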
>
> My proposal had the mt update in the caller:
>
>> @@ -139,6 +139,62 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
>> return NULL;
>> }
>>
>> +/*
>> + * additional steps for moving free pages during page isolation
>> + */
>> +static int move_freepages_for_isolation(struct zone *zone, unsigned long start_pfn,
>> + unsigned long end_pfn, int old_mt, int new_mt)
>> +{
>> + struct page *start_page = pfn_to_page(start_pfn);
>> + unsigned long pfn;
>> +
>> + VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1));
>> + VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn);
>> +
>> + /*
>> + * A free page may be comprised of 2^n blocks, which means our
>> + * block of interest could be head or tail in such a page.
>> + *
>> + * If we're a tail, update the type of our block, then split
>> + * the page into pageblocks. The splitting will do the leg
>> + * work of sorting the blocks into the right freelists.
>> + *
>> + * If we're a head, split the page into pageblocks first. This
>> + * ensures the migratetypes still match up during the freelist
>> + * removal. Then do the regular scan for buddies in the block
>> + * of interest, which will handle the rest.
>> + *
>> + * In theory, we could try to preserve 2^1 and larger blocks
>> + * that lie outside our range. In practice, MAX_ORDER is
>> + * usually one or two pageblocks anyway, so don't bother.
>> + *
>> + * Note that this only applies to page isolation, which calls
>> + * this on random blocks in the pfn range! When we move stuff
>> + * from inside the page allocator, the pages are coming off
>> + * the freelist (can't be tail) and multi-block pages are
>> + * handled directly in the stealing code (can't be a head).
>> + */
>> +
>> + /* We're a tail */
>> + pfn = find_straddling_buddy(start_pfn);
>> + if (pfn != start_pfn) {
>> + struct page *free_page = pfn_to_page(pfn);
>> +
>> + split_free_page(free_page, buddy_order(free_page),
>> + pageblock_nr_pages, old_mt, new_mt);
>> + return pageblock_nr_pages;
>> + }
>> +
>> + /* We're a head */
>> + if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) {
>> + split_free_page(start_page, buddy_order(start_page),
>> + pageblock_nr_pages, new_mt, old_mt);
>> + return pageblock_nr_pages;
>> + }
>
> i.e. here ^: set the mt of the block that's in isolation range, then
> split the block.
>
> I think I can guess the warning you were getting: in the head case, we
> need to change the type of the head pageblock that's on the
> freelist. If we do it before calling split, the
> del_page_from_freelist() in there warns about the wrong type.
>
> How about pulling the freelist removal out of split_free_page()?
>
> del_page_from_freelist(huge_buddy);
> set_pageblock_migratetype(start_page, MIGRATE_ISOLATE);
> split_free_page(huge_buddy, buddy_order(), pageblock_nr_pages);
> return pageblock_nr_pages;
Yes, this is better. Let me change to this implementation; a rough sketch of
the head case with that ordering is below.

But I would like to test it in an environment where a buddy spans more than
one pageblock first. I can probably bump MAX_ORDER on x86_64 locally to do
that. I will report back.
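For reference, the head case I plan to implement, following your suggestion of
pulling the freelist removal out of split_free_page(). This assumes
split_free_page() is reworked to take (page, order, split_pfn_offset) and no
longer touches the freelist itself, as in your sketch above (untested):

	/*
	 * We're a head: take the huge buddy off its freelist while its type
	 * still matches the list it is on, retype the block being isolated,
	 * then let the split sort the pieces onto the right freelists.
	 */
	if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) {
		int order = buddy_order(start_page);

		del_page_from_free_list(start_page, zone, order, old_mt);
		set_pageblock_migratetype(start_page, new_mt);
		split_free_page(start_page, order, pageblock_nr_pages);
		return pageblock_nr_pages;
	}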
--
Best Regards,
Yan, Zi