From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 11 Jan 2023 14:21:44 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka
Cc: Chuyi Zhou, Andrew Morton, linux-mm@kvack.org
Subject: Re: [PATCH v3] mm/compaction: fix set skip in fast_find_migrateblock
Message-ID: <20230111142144.2tfbyxnqgupbkzug@techsingularity.net>
References: <20220713062009.597255-1-zhouchuyi@bytedance.com>
 <20220713082814.bed234e00d7f5ceb3858352a@linux-foundation.org>
 <20220714115020.GA3563@techsingularity.net>
 <3529e159-8c22-5d67-5e5d-c912784df9a0@bytedance.com>
 <20220719082854.GB3563@techsingularity.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline

On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote:
> On 7/19/22 10:28, Mel Gorman wrote:
> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote:
> >> You are right, we do need some mechanism to ensure the block is marked
> >> skipped after scanning a !IS_ALIGNED block from fast_find. However, I
> >> think the following code may not work.
> >> Because *skip_updated* has been reset:
> >>
> >> 	if (!skip_updated) {
> >> 		skip_updated = true;
> >> 		if (test_and_set_skip(cc, page, low_pfn))
> >> 			goto isolate_abort;
> >> 	}
> >>
> >> Why not ignore skip_updated after scanning a block, just like this:
> >>
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 962d05d1e187..1c388c45f127 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>  	 * rescanned twice in a row.
> >>  	 */
> >>  	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
> >> -		if (valid_page && !skip_updated)
> >> +		if (valid_page)
> >>  			set_pageblock_skip(valid_page);
> >>  		update_cached_migrate(cc, low_pfn);
> >>  	}
> >>
> >
> > Because valid_page still needs to be set, and ensuring valid_page is set
> > is sufficient.
> >
> > The pageblock will be marked for skip on the first locking of a lruvec:
> >
> > 	/* Try get exclusive access under lock */
> > 	if (!skip_updated) {
> > 		skip_updated = true;
> > 		if (test_and_set_skip(cc, page, low_pfn))
> > 			goto isolate_abort;
> > 	}
> >
> > That will happen regardless of alignment.
> 
> I'm not sure about that, as test_and_set_skip() has "if
> (!pageblock_aligned(pfn)) return false". So it only takes the first pfn in
> the pageblock not being PageLRU for us not to try to get exclusive access
> there, and the following pfns will not be aligned, so we never
> set_pageblock_skip() through this path. If we have nr_isolated > 0 and not
> cc->rescan, we might not set_pageblock_skip() through the hunk above
> either.
> 

True.

> I've been checking this area because we have some reports [1] for upstream
> 6.1 causing some long loops in khugepaged pegging 100% CPU in compaction.
> Tracepoint data suggests we keep (successfully) isolating migratepages over
> and over through fast_find_migrateblock() (the pfn ranges are not linearly
> increasing but there's a cycle, probably as we rotate the freelist), but are
> failing to migrate them (for some reason). Between 6.0 and 6.1 there's this
> patch as commit 7efc3b726103, so I strongly suspect it's the root cause and
> will be providing a kernel with a revert in the bug to confirm.
> Consider this an early heads up :)
> 
> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848

The trace shows that there is a lot of rescanning between a small subset of
pageblocks because the conditions for marking a block skip are not met. The
scan is not reaching the end of the pageblock because enough pages were
isolated, but none were migrated successfully. Eventually it circles back to
the same block.

Pageblock skipping is tricky because we are trying to minimise both latency
and excessive scanning. However, tracking exactly when a block is fully
scanned requires an excessive amount of state.

One potential mitigating factor is to forcibly rescan a pageblock when
migrations fail, even though the failures could be for transient reasons
such as page writeback or page dirty. This will sometimes migrate too many
pages (worst case 500) but pageblocks will be marked skip. This is not
tested at all:

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..9242fcaeb09c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			cc->rescan = true;
 		}
 
+retry_rescan:
 		switch (isolate_migratepages(cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
@@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				goto out;
 			}
 
 			/*
-			 * We failed to migrate at least one page in the current
-			 * order-aligned block, so skip the rest of it.
+			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
+			 * within the current order-aligned block, rescan the
+			 * entire block. Once the whole block is scanned, it'll
+			 * be marked "skip" to avoid rescanning in the near
+			 * future. This will isolate more pages than necessary
+			 * for the request but avoid loops due to
+			 * fast_find_migrateblock revisiting blocks that were
+			 * recently partially scanned.
 			 */
-			if (cc->direct_compaction &&
-			    (cc->mode == MIGRATE_ASYNC)) {
-				cc->migrate_pfn = block_end_pfn(
-						cc->migrate_pfn - 1, cc->order);
-				/* Draining pcplists is useless in this case */
-				last_migrated_pfn = 0;
+			if (cc->direct_compaction && !cc->rescan &&
+			    (cc->mode < MIGRATE_SYNC)) {
+				cc->rescan = true;
+
+				/*
+				 * Draining pcplists does not help THP if
+				 * any page failed to migrate. Even after
+				 * drain, the pageblock will not be free.
+				 */
+				if (cc->order == HUGETLB_PAGE_ORDER)
+					last_migrated_pfn = 0;
+
+				goto retry_rescan;
 			}
 		}