From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH v3] mm/compaction: fix set skip in fast_find_migrateblock
Date: Wed, 11 Jan 2023 14:21:44 +0000	[thread overview]
Message-ID: <20230111142144.2tfbyxnqgupbkzug@techsingularity.net> (raw)
In-Reply-To: <a032e3bf-8470-095a-2262-791f5678e590@suse.cz>

On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote:
> On 7/19/22 10:28, Mel Gorman wrote:
> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote:
> >> You are right, we do need some mechanism to ensure the block is marked
> >> skipped after scanning a !IS_ALIGNED block from fast_find. However, I
> >> think the following code may not work, because *skip_updated* has been reset:
> >>     if (!skip_updated) {
> >>         skip_updated = true;
> >>         if (test_and_set_skip(cc, page, low_pfn))
> >>             goto isolate_abort;
> >>     }
> >> Why not ignore skip_updated after scanning a block, just like this:
> >> 
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 962d05d1e187..1c388c45f127 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>          * rescanned twice in a row.
> >>          */
> >>         if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
> >> -               if (valid_page && !skip_updated)
> >> +               if (valid_page)
> >>                         set_pageblock_skip(valid_page);
> >>                 update_cached_migrate(cc, low_pfn);
> >>         }
> >> 
> > 
> > Because valid_page still needs to be set but ensuring valid_page is set
> > is sufficient.
> > 
> > The pageblock will be marked for skip on the first locking of a lruvec
> > 
> >                         /* Try get exclusive access under lock */
> >                         if (!skip_updated) {
> >                                 skip_updated = true;
> >                                 if (test_and_set_skip(cc, page, low_pfn))
> >                                         goto isolate_abort;
> >                         }
> > 
> > That will happen regardless of alignment.
> 
> I'm not sure about that, as test_and_set_skip() has "if
> (!pageblock_aligned(pfn)) return false". So I think it only takes the
> first pfn in the pageblock not being PageLRU for us not to try to get
> exclusive access for it; the following pfns will not be aligned, and
> thus we never set_pageblock_skip() through this path. If we have
> nr_isolated > 0 and not cc->rescan, we might not set_pageblock_skip()
> through the hunk above either.
> 

True.
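
For reference, here is a minimal sketch of the path Vlastimil describes,
i.e. test_and_set_skip() roughly as it looks in 6.1 (reproduced from
memory, so exact details may differ):

	static bool test_and_set_skip(struct compact_control *cc, struct page *page,
							unsigned long pfn)
	{
		bool skip;

		/* Do not update if the skip hint is being ignored */
		if (cc->ignore_skip_hint)
			return false;

		/* Only the first pfn of a pageblock may set the skip bit */
		if (!pageblock_aligned(pfn))
			return false;

		skip = get_pageblock_skip(page);
		if (!skip && !cc->no_set_skip_hint)
			set_pageblock_skip(page);

		return skip;
	}

So if exclusive access is first taken at a pfn that is not pageblock
aligned, the early return means this path never sets the skip bit for
the block.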

> I've been checking this area because we have some reports [1] for upstream
> 6.1 causing some long loops in khugepaged pegging 100% CPU in compaction.
> Tracepoint data suggests we keep (successfully) isolating migratepages over
> and over through fast_find_migrateblock() (the pfn ranges are not linearly
> increasing but there's a cycle probably as we rotate the freelist), but are
> failing to migrate them (for some reason). Between 6.0 and 6.1 this patch
> landed as commit 7efc3b726103, so I strongly suspect it's the root cause
> and will be providing a kernel with the revert in the bug to confirm.
> Consider this an early heads up :)
> 
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848

The trace shows a lot of rescanning of a small subset of pageblocks
because the conditions for marking a block skip are not met. The scan does
not reach the end of the pageblock because enough pages were isolated, but
none were migrated successfully. Eventually it circles back to the same
block.
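
The "enough pages were isolated" part comes from the scanner breaking out
of the block early once it has a full batch of migration candidates. As a
rough sketch (from the 6.1-era isolate_migratepages_block() scan loop,
reproduced from memory, so details may differ):

	/*
	 * Avoid isolating too much unless this block is being
	 * rescanned (e.g. dirty/writeback pages, parallel allocation)
	 * or a lock is contended. For contention, isolate quickly to
	 * potentially remove one source of contention.
	 */
	if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
	    !cc->rescan && !cc->contended) {
		++low_pfn;
		break;
	}

When that break fires mid-block, low_pfn != end_pfn, so the end-of-block
hunk quoted earlier never marks the block skip and fast_find_migrateblock()
is free to hand out the same block again.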

Pageblock skipping is tricky because we are trying to minimise both
latency and excessive scanning. However, tracking exactly when a block
is fully scanned requires an excessive amount of state.

One potential mitigation is to forcibly rescan a pageblock to completion
when migrations fail, even though the failures may be for transient
reasons such as pages under writeback or dirty pages. This will sometimes
migrate more pages than necessary (worst case 500) but the pageblock will
be marked for skip.

This is not tested at all:

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..9242fcaeb09c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			cc->rescan = true;
 		}
 
+retry_rescan:
 		switch (isolate_migratepages(cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
@@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				goto out;
 			}
 			/*
-			 * We failed to migrate at least one page in the current
-			 * order-aligned block, so skip the rest of it.
+			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
+			 * within the current order-aligned block, rescan the
+			 * entire block. Once the whole block is scanned, it'll
+			 * be marked "skip" to avoid rescanning in the near
+			 * future. This will isolate more pages than necessary
+			 * for the request but avoid loops due to
+			 * fast_find_migrateblock revisiting blocks that were
+			 * recently partially scanned.
 			 */
-			if (cc->direct_compaction &&
-						(cc->mode == MIGRATE_ASYNC)) {
-				cc->migrate_pfn = block_end_pfn(
-						cc->migrate_pfn - 1, cc->order);
-				/* Draining pcplists is useless in this case */
-				last_migrated_pfn = 0;
+			if (cc->direct_compaction && !cc->rescan &&
+						(cc->mode < MIGRATE_SYNC)) {
+				cc->rescan = true;
+
+				/*
+				 * Draining pcplists does not help THP if
+				 * any page failed to migrate. Even after
+				 * drain, the pageblock will not be free.
+				 */
+				if (cc->order == HUGETLB_PAGE_ORDER)
+					last_migrated_pfn = 0;
+
+				goto retry_rescan;
 			}
 		}
 

