From: Zi Yan <ziy@nvidia.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Wed, 20 Sep 2023 13:23:18 -0400
Message-ID: <762CA634-053A-41DD-8ED7-895374640858@nvidia.com>
In-Reply-To: <20230920160400.GC124289@cmpxchg.org>

On 20 Sep 2023, at 12:04, Johannes Weiner wrote:

> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
>>> On 9/20/23 03:38, Zi Yan wrote:
>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>>>>
>>>>> On 09/19/23 16:57, Zi Yan wrote:
>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>>>>
>>>>>>> 	--- a/mm/page_alloc.c
>>>>>>> 	+++ b/mm/page_alloc.c
>>>>>>> 	@@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>>>>  		end = pageblock_end_pfn(pfn) - 1;
>>>>>>>
>>>>>>>  		/* Do not cross zone boundaries */
>>>>>>> 	+#if 0
>>>>>>>  		if (!zone_spans_pfn(zone, start))
>>>>>>> 			start = zone->zone_start_pfn;
>>>>>>> 	+#else
>>>>>>> 	+	if (!zone_spans_pfn(zone, start))
>>>>>>> 	+		start = pfn;
>>>>>>> 	+#endif
>>>>>>> 	 	if (!zone_spans_pfn(zone, end))
>>>>>>> 	 		return false;
>>>>>>>
>>>>>>> I can still trigger warnings.
>>>>>>
>>>>>> OK. One thing to note is that the page type in the warning changed from
>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>>>>
>>>>>
>>>>> Just to be really clear,
>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>>>>   path WITHOUT your change.
>>>>>
>>>>> I am guessing the difference here has more to do with the allocation path?
>>>>>
>>>>> I went back and reran focusing on the specific migrate type.
>>>>> Without your patch, and coming from the alloc_contig_range call path,
>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>>>>> With your patch I got one 'page type is 0, passed migratetype is 1'
>>>>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>>>>
>>>>> I could be wrong, but I do not think your patch changes things.
>>>>
>>>> Got it. Thanks for the clarification.
>>>>>
>>>>>>>
>>>>>>> One idea about recreating the issue is that it may have to do with size
>>>>>>> of my VM (16G) and the requested allocation sizes 4G.  However, I tried
>>>>>>> to really stress the allocations by increasing the number of hugetlb
>>>>>>> pages requested and that did not help.  I also noticed that I only seem
>>>>>>> to get two warnings and then they stop, even if I continue to run the
>>>>>>> script.
>>>>>>>
>>>>>>> Zi asked about my config, so it is attached.
>>>>>>
>>>>>> With your config, I still have no luck reproducing the issue. I will keep
>>>>>> trying. Thanks.
>>>>>>
>>>>>
>>>>> Perhaps try running both scripts in parallel?
>>>>
>>>> Yes. It seems to do the trick.
>>>>
>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>>>>
>>>> I am able to reproduce it with the script below:
>>>>
>>>> while true; do
>>>>  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>>>>  echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>>>>  wait
>>>>  echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>  echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>> done
>>>>
>>>> I will look into the issue.
>>
>> Nice!
>>
>> I managed to reproduce it ONCE, triggering it not even a second after
>> starting the script. But I can't seem to do it twice, even after
>> several reboots and letting it run for minutes.
>
> I managed to reproduce it reliably by cutting the nr_hugepages
> parameters respectively in half.
>
> The one that triggers for me is always MIGRATE_ISOLATE. With some
> printk-tracing, the scenario seems to be this:
>
> #0                                                   #1
> start_isolate_page_range()
>   isolate_single_pageblock()
>     set_migratetype_isolate(tail)
>       lock zone->lock
>       move_freepages_block(tail) // nop
>       set_pageblock_migratetype(tail)
>       unlock zone->lock
>                                                      del_page_from_freelist(head)
>                                                      expand(head, head_mt)
>                                                        WARN(head_mt != tail_mt)
>     start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
>     for (pfn = start_pfn, pfn < end_pfn)
>       if (PageBuddy())
>         split_free_page(head)
>
> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
> lock. The move_freepages_block() does nothing because the PageBuddy()
> is set on the pageblock to the left. Once we drop the lock, the buddy
> gets allocated and the expand() puts things on the wrong list. The
> splitting code that handles MAX_ORDER blocks runs *after* the tail
> type is set and the lock has been dropped, so it's too late.

Yes, I can confirm this is the issue as well. But isolating at pageblock
granularity is intentional: it allows allocating a contiguous range at
pageblock granularity instead of MAX_ORDER granularity. With your changes
below, that no longer works: if there is an unmovable page in
[ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
the allocation fails, whereas it would succeed in the current implementation.
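
To make the range above concrete, here is a minimal userspace sketch of
the two rounding schemes. It is not kernel code: the helper names are
hypothetical, and the constants assume pageblock_nr_pages == 512 and
MAX_ORDER_NR_PAGES == 1024, both of which depend on the kernel
configuration.

```c
#include <assert.h>

/* Assumed values for illustration only; both are config dependent. */
#define PAGEBLOCK_NR_PAGES  512UL   /* assumption: pageblock_order == 9 */
#define MAX_ORDER_NR_PAGES  1024UL  /* assumption: largest buddy is 1024 pages */

struct pfn_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/* Round pfn down to a power-of-two alignment. */
static unsigned long align_down(unsigned long pfn, unsigned long align)
{
	assert(align && (align & (align - 1)) == 0);
	return pfn & ~(align - 1);
}

/* Round pfn up to a power-of-two alignment. */
static unsigned long align_up(unsigned long pfn, unsigned long align)
{
	return align_down(pfn + align - 1, align);
}

/* Rounding as in the current code: pageblock granularity. */
static struct pfn_range isolate_range_pageblock(unsigned long start_pfn,
						unsigned long end_pfn)
{
	struct pfn_range r = {
		.start = align_down(start_pfn, PAGEBLOCK_NR_PAGES),
		.end = align_up(end_pfn, PAGEBLOCK_NR_PAGES),
	};
	return r;
}

/* Rounding as in the quoted patch: MAX_ORDER granularity. */
static struct pfn_range isolate_range_max_order(unsigned long start_pfn,
						unsigned long end_pfn)
{
	struct pfn_range r = {
		.start = align_down(start_pfn, MAX_ORDER_NR_PAGES),
		.end = align_up(end_pfn, MAX_ORDER_NR_PAGES),
	};
	return r;
}

/*
 * Number of pages in
 * [ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
 * i.e. pages that only the MAX_ORDER rounding pulls into the isolation range.
 */
static unsigned long extra_head_pages(unsigned long start_pfn)
{
	return align_down(start_pfn, PAGEBLOCK_NR_PAGES) -
	       align_down(start_pfn, MAX_ORDER_NR_PAGES);
}
```

Under these assumptions, a request for PFNs [1536, 2048) is isolated as
[1536, 2048) with pageblock rounding but as [1024, 2048) with MAX_ORDER
rounding, so extra_head_pages(1536) is 512. An unmovable page anywhere
in those extra 512 pages would fail the wider isolation even though the
requested range itself is free of unmovable pages.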

I think a proper fix would be to make move_freepages_block() split the
MAX_ORDER page and put the split pages in the right migratetype free lists.

I am working on that.

>
> I think this would work fine if we always set MIGRATE_ISOLATE in a
> linear fashion, with start and end aligned to MAX_ORDER. Then we also
> wouldn't have to split things.
>
> There are two reasons this doesn't happen today:
>
> 1. The isolation range is rounded to pageblocks, not MAX_ORDER. In
>    this test case they always seem aligned, but it's not
>    guaranteed. However,
>
> 2. start_isolate_page_range() explicitly breaks ordering by doing the
>    last block in the range before the center. It's that last block
>    that triggers the race with __rmqueue_smallest -> expand() for me.
>
> With the below patch I can no longer reproduce the issue:
>
> ---
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b5c7a9d21257..b7c8730bf0e2 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -538,8 +538,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	unsigned long pfn;
>  	struct page *page;
>  	/* isolation is done at page block granularity */
> -	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> -	unsigned long isolate_end = pageblock_align(end_pfn);
> +	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> +	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
>  	int ret;
>  	bool skip_isolation = false;
>
> @@ -549,17 +549,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	if (ret)
>  		return ret;
>
> -	if (isolate_start == isolate_end - pageblock_nr_pages)
> -		skip_isolation = true;
> -
> -	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> -	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> -			skip_isolation, migratetype);
> -	if (ret) {
> -		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
> -		return ret;
> -	}
> -
>  	/* skip isolated pageblocks at the beginning and end */
>  	for (pfn = isolate_start + pageblock_nr_pages;
>  	     pfn < isolate_end - pageblock_nr_pages;
> @@ -568,12 +557,21 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  		if (page && set_migratetype_isolate(page, migratetype, flags,
>  					start_pfn, end_pfn)) {
>  			undo_isolate_page_range(isolate_start, pfn, migratetype);
> -			unset_migratetype_isolate(
> -				pfn_to_page(isolate_end - pageblock_nr_pages),
> -				migratetype);
>  			return -EBUSY;
>  		}
>  	}
> +
> +	if (isolate_start == isolate_end - pageblock_nr_pages)
> +		skip_isolation = true;
> +
> +	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> +	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> +			skip_isolation, migratetype);
> +	if (ret) {
> +		undo_isolate_page_range(isolate_start, pfn, migratetype);
> +		return ret;
> +	}
> +
>  	return 0;
>  }
>
> @@ -591,8 +589,8 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  {
>  	unsigned long pfn;
>  	struct page *page;
> -	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> -	unsigned long isolate_end = pageblock_align(end_pfn);
> +	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> +	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
>
>  	for (pfn = isolate_start;
>  	     pfn < isolate_end;


--
Best Regards,
Yan, Zi
