From: Vlastimil Babka <vbabka@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>,
"Huang, Ying" <ying.huang@intel.com>,
David Hildenbrand <david@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 06/10] mm: page_alloc: fix freelist movement during block conversion
Date: Tue, 26 Mar 2024 12:28:37 +0100
Message-ID: <a0879316-31de-4fec-ad1f-caabbfff2e48@suse.cz>
In-Reply-To: <20240320180429.678181-7-hannes@cmpxchg.org>
On 3/20/24 7:02 PM, Johannes Weiner wrote:
> Currently, page block type conversion during fallbacks, atomic
> reservations and isolation can strand various amounts of free pages on
> incorrect freelists.
>
> For example, fallback stealing moves free pages in the block to the
> new type's freelists, but then may not actually claim the block for
> that type if there aren't enough compatible pages already allocated.
>
> In all cases, free page moving might fail if the block straddles more
> than one zone, in which case no free pages are moved at all, but the
> block type is changed anyway.
>
> This is detrimental to type hygiene on the freelists. It encourages
> incompatible page mixing down the line (ask for one type, get another)
> and thus contributes to long-term fragmentation.
>
> Split the process into a proper transaction: check first if conversion
> will happen, then try to move the free pages, and only if that was
> successful convert the block to the new type.
>
> Tested-by: "Huang, Ying" <ying.huang@intel.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Nit below:
> @@ -1743,33 +1770,37 @@ static inline bool boost_watermark(struct zone *zone)
> }
>
> /*
> - * This function implements actual steal behaviour. If order is large enough,
> - * we can steal whole pageblock. If not, we first move freepages in this
> - * pageblock to our migratetype and determine how many already-allocated pages
> - * are there in the pageblock with a compatible migratetype. If at least half
> - * of pages are free or compatible, we can change migratetype of the pageblock
> - * itself, so pages freed in the future will be put on the correct free list.
> + * This function implements actual steal behaviour. If order is large enough, we
> + * can claim the whole pageblock for the requested migratetype. If not, we check
> + * the pageblock for constituent pages; if at least half of the pages are free
> + * or compatible, we can still claim the whole block, so pages freed in the
> + * future will be put on the correct free list. Otherwise, we isolate exactly
> + * the order we need from the fallback block and leave its migratetype alone.
> */
> -static void steal_suitable_fallback(struct zone *zone, struct page *page,
> - unsigned int alloc_flags, int start_type, bool whole_block)
> +static struct page *
> +steal_suitable_fallback(struct zone *zone, struct page *page,
> + int current_order, int order, int start_type,
> + unsigned int alloc_flags, bool whole_block)
> {
> - unsigned int current_order = buddy_order(page);
> int free_pages, movable_pages, alike_pages;
> - int old_block_type;
> + unsigned long start_pfn, end_pfn;
> + int block_type;
>
> - old_block_type = get_pageblock_migratetype(page);
> + block_type = get_pageblock_migratetype(page);
>
> /*
> * This can happen due to races and we want to prevent broken
> * highatomic accounting.
> */
> - if (is_migrate_highatomic(old_block_type))
> + if (is_migrate_highatomic(block_type))
> goto single_page;
>
> /* Take ownership for orders >= pageblock_order */
> if (current_order >= pageblock_order) {
> + del_page_from_free_list(page, zone, current_order);
> change_pageblock_range(page, current_order, start_type);
> - goto single_page;
> + expand(zone, page, order, current_order, start_type);
> + return page;
Is the exact order here important (AFAIK it shouldn't be), or could we just do
change_pageblock_range(); block_type = start_type; goto single_page?
> }
>
> /*
> @@ -1784,10 +1815,9 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
> if (!whole_block)
> goto single_page;
>
> - free_pages = move_freepages_block(zone, page, start_type,
> - &movable_pages);
> /* moving whole block can fail due to zone boundary conditions */
> - if (!free_pages)
> + if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn,
> + &free_pages, &movable_pages))
> goto single_page;
>
> /*
> @@ -1805,7 +1835,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
> * vice versa, be conservative since we can't distinguish the
> * exact migratetype of non-movable pages.
> */
> - if (old_block_type == MIGRATE_MOVABLE)
> + if (block_type == MIGRATE_MOVABLE)
> alike_pages = pageblock_nr_pages
> - (free_pages + movable_pages);
> else
> @@ -1816,13 +1846,16 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
> * compatible migratability as our allocation, claim the whole block.
> */
> if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
> - page_group_by_mobility_disabled)
> + page_group_by_mobility_disabled) {
> + move_freepages(zone, start_pfn, end_pfn, start_type);
> set_pageblock_migratetype(page, start_type);
> -
> - return;
> + return __rmqueue_smallest(zone, order, start_type);
> + }
>
> single_page:
> - move_to_free_list(page, zone, current_order, start_type);
> + del_page_from_free_list(page, zone, current_order);
> + expand(zone, page, order, current_order, block_type);
> + return page;
> }
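
For readers following along, the check / move / convert sequence from the
changelog and the "at least half free or compatible" rule in the last hunk can
be modeled in userspace roughly as below. This is only a toy sketch under
stated assumptions, not kernel code: every toy_* name is hypothetical, a
pageblock order of 9 (512 pages) is assumed, the
page_group_by_mobility_disabled shortcut is left out, and the branch for
movable requests is not visible in the quoted hunks, so the sketch assumes it
counts the block's movable pages as compatible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's migratetypes. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE };

/* Assumed pageblock geometry: order 9, i.e. 512 pages per block. */
#define TOY_PAGEBLOCK_ORDER    9
#define TOY_PAGEBLOCK_NR_PAGES (1 << TOY_PAGEBLOCK_ORDER)

struct toy_block {
	enum toy_migratetype type;          /* type recorded for the block */
	enum toy_migratetype freelist_type; /* freelist holding its free pages */
	int free_pages;                     /* free pages inside the block */
	int movable_pages;                  /* allocated pages counted as movable */
	bool straddles_zone;                /* block crosses a zone boundary */
};

/* Step 1: decide whether to claim the block, without any side effects. */
bool toy_should_claim(const struct toy_block *b, enum toy_migratetype start_type)
{
	int alike_pages;

	assert(b->free_pages >= 0 && b->movable_pages >= 0);
	assert(b->free_pages + b->movable_pages <= TOY_PAGEBLOCK_NR_PAGES);

	/* Models the prep step failing on zone boundary conditions. */
	if (b->straddles_zone)
		return false;

	if (start_type == TOY_MOVABLE)
		alike_pages = b->movable_pages; /* assumption, see lead-in */
	else if (b->type == TOY_MOVABLE)
		alike_pages = TOY_PAGEBLOCK_NR_PAGES -
			      (b->free_pages + b->movable_pages);
	else
		alike_pages = 0; /* conservative: cannot tell the types apart */

	return b->free_pages + alike_pages >= (1 << (TOY_PAGEBLOCK_ORDER - 1));
}

/*
 * Steps 2 and 3: move the free pages, then convert the block type.
 * If step 1 says no, neither the freelist nor the block type is touched.
 */
bool toy_try_claim(struct toy_block *b, enum toy_migratetype start_type)
{
	if (!toy_should_claim(b, start_type))
		return false;

	b->freelist_type = start_type; /* move free pages first */
	b->type = start_type;          /* only then convert the block */
	return true;
}
```

With these assumptions, an unmovable request against a movable block with 100
free and 300 movable pages gives 100 + 112 = 212, below the 256-page
threshold, so the block is left alone. With 200 free and 100 movable pages it
gives 200 + 212 = 412 and the block is claimed, with its free pages and its
type changing together.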