linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v3] mm/compaction: fix set skip in fast_find_migrateblock
Date: Fri, 13 Jan 2023 18:13:59 +0100	[thread overview]
Message-ID: <c21ee810-4477-e4fc-630d-9ada9aec9060@suse.cz> (raw)
In-Reply-To: <20230111142144.2tfbyxnqgupbkzug@techsingularity.net>

On 1/11/23 15:21, Mel Gorman wrote:
> On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote:
>> On 7/19/22 10:28, Mel Gorman wrote:
>> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote:
>> >> You are right, we do need some machenism to ensure mark skipped after
>> >> scanned a !IS_ALIGNED block from fast_find. However, I think the following
>> >> code may not working. Because *skip_updated* has been reset:
>> >>     if (!skip_updated) {
>> >>         skip_updated = true;
>> >>         if (test_and_set_skip(cc, page, low_pfn))
>> >>             goto isolate_abort;
>> >>     }
>> >> Why not ignore skip_updated after scanned a block, just like this:
>> >> 
>> >> diff --git a/mm/compaction.c b/mm/compaction.c
>> >> index 962d05d1e187..1c388c45f127 100644
>> >> --- a/mm/compaction.c
>> >> +++ b/mm/compaction.c
>> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc,
>> >> unsigned long low_pfn,
>> >>          * rescanned twice in a row.
>> >>          */
>> >>         if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
>> >> -               if (valid_page && !skip_updated)
>> >> +               if (valid_page)
>> >>                         set_pageblock_skip(valid_page);
>> >>                 update_cached_migrate(cc, low_pfn);
>> >>         }
>> >> 
>> > 
>> > Because valid_page still needs to be set but ensuring valid_page is set
>> > is sufficient.
>> > 
>> > The pageblock will be marked for skip on the first locking of a lruvec
>> > 
>> >                         /* Try get exclusive access under lock */
>> >                         if (!skip_updated) {
>> >                                 skip_updated = true;
>> >                                 if (test_and_set_skip(cc, page, low_pfn))
>> >                                         goto isolate_abort;
>> >                         }
>> > 
>> > That will happen regardless of alignment.
>> 
>> I'm not sure about that as test_and_set_skip() has "if
>> (!pageblock_aligned(pfn)) return false". So I think it only takes for the
>> first pfn in pageblock to not be PageLRU and we don't try to get the
>> exclusive access for it, and the following pfn's will not be aligned, and
>> thus we never set_pageblock_skip() through this path. If we have nr_isolated
>> > 0 and not cc->rescan, we might not set_pageblock_skip() through the hunk
>> above neither.
>> 
> 
> True.
> 
>> I've been checking this area because we have some reports [1] for upstream
>> 6.1 causing some long loops in khugepaged pegging 100% CPU in compaction.
>> Tracepoint data suggests we keep (successfully) isolating migratepages over
>> and over through fast_find_migrateblock() (the pfn ranges are not linearly
>> increasing but there's a cycle probably as we rotate the freelist), but are
>> failing to migrate them (for some reason). Between 6.0 and 6.1 there's this
>> patch as commit 7efc3b726103 so I strongly suspect it's the root cause and
>> will be providing a kernel with revert to in the bug to confirm.
>> Consider this an early heads up :)
>> 
>> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848
> 
> The trace shows that there is a lot of rescanning between a small subset
> of pageblocks because the conditions for marking the block skip are not
> met. The scan is not reaching the end of the pageblock because enough
> pages were isolated, but none were migrated successfully. Eventually it
> circles back to the same block.
> 
> Pageblock skipping is tricky because we are trying to minimise both
> latency and excessive scanning. However, tracking exactly when a block
> is fully scanned requires an excessive amount of state.
> 
> One potential mitigating factor is to forcibly scan a pageblock when
> migrations fail even though it could be for transient reasons such as
> page writeback or page dirty. This will sometimes migrate too many pages
> (worst case 500) but pageblocks will be marked skip.
> 
> This is not tested at all
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca1603524bbe..9242fcaeb09c 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			cc->rescan = true;
>  		}
>  
> +retry_rescan:
>  		switch (isolate_migratepages(cc)) {
>  		case ISOLATE_ABORT:
>  			ret = COMPACT_CONTENDED;
> @@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				goto out;
>  			}
>  			/*
> -			 * We failed to migrate at least one page in the current
> -			 * order-aligned block, so skip the rest of it.
> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
> +			 * within the current order-aligned block, rescan the
> +			 * entire block. Once the whole block is scanned, it'll
> +			 * be marked "skip" to avoid rescanning in the near
> +			 * future. This will isolate more pages than necessary
> +			 * for the request but avoid loops due to
> +			 * fast_find_migrateblock revisiting blocks that were
> +			 * recently partially scanned.
>  			 */
> -			if (cc->direct_compaction &&
> -						(cc->mode == MIGRATE_ASYNC)) {
> -				cc->migrate_pfn = block_end_pfn(
> -						cc->migrate_pfn - 1, cc->order);
> -				/* Draining pcplists is useless in this case */
> -				last_migrated_pfn = 0;
> +			if (cc->direct_compaction && !cc->rescan &&
> +						(cc->mode < MIGRATE_SYNC)) {
> +				cc->rescan = true;
> +
> +				/*
> +				 * Draining pcplists does not help THP if
> +				 * any page failed to migrate. Even after
> +				 * drain, the pageblock will not be free.
> +				 */
> +				if (cc->order == HUGETLB_PAGE_ORDER)
> +					last_migrated_pfn = 0;
> +
> +				goto retry_rescan;

I think it won't work as expected, because we will not be rescanning the
same pageblock, fast_find_migrateblock() will AFAICS pick a different one,
there's nothing to prevent it.
But the possible compaction state is rather complex for me to be confident
about a quick fix. I think it will be safer to revert the patch for 6.2 and
6.1 stable, and redo it more carefully.

>  			}
>  		}
>  



      reply	other threads:[~2023-01-13 17:14 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-13  6:20 Chuyi Zhou
2022-07-13 15:28 ` Andrew Morton
2022-07-14 11:50   ` Mel Gorman
2022-07-15 15:26     ` Chuyi Zhou
2022-07-19  8:28       ` Mel Gorman
2022-08-15  1:24         ` Andrew Morton
     [not found]           ` <70a434b2-7f1c-7fad-a7b7-cb038a13fd2c@bytedance.com>
2022-08-15  3:22             ` Chuyi Zhou
     [not found]         ` <7d2bbb38-a96c-8212-8f89-915cd2c8668f@bytedance.com>
2022-08-17  3:10           ` Chuyi Zhou
2023-01-09 19:43         ` Vlastimil Babka
2023-01-11 14:21           ` Mel Gorman
2023-01-13 17:13             ` Vlastimil Babka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c21ee810-4477-e4fc-630d-9ada9aec9060@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox