From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14D07C54EBD for ; Fri, 13 Jan 2023 17:14:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A769B940009; Fri, 13 Jan 2023 12:14:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A26A28E0001; Fri, 13 Jan 2023 12:14:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8C7A6940009; Fri, 13 Jan 2023 12:14:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 7D0828E0001 for ; Fri, 13 Jan 2023 12:14:04 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 6048DA075A for ; Fri, 13 Jan 2023 17:14:04 +0000 (UTC) X-FDA: 80350423608.18.485537E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf16.hostedemail.com (Postfix) with ESMTP id 180B3180002 for ; Fri, 13 Jan 2023 17:14:00 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=OSDJzr8s; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=oVaPscis; dmarc=none; spf=pass (imf16.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=vbabka@suse.cz ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673630041; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Dype0CNPKvLC4Fug4nEUlp7tW2azAcSbZV9/hwDOFLk=; b=zGV2HomPrJg3w5OV1/V3l69lvflTdPZ2Vw45YlJrUnapG5FYymQJtn8jpTxD3fY12hwlwb T30Tvv1PC8AxOUuCxqQrriu3x/I84s6w3ekw0FngFwXQ46WYI+YtJFmPdRRtSLvmSqQj0R knlAJ22ae6iLwfQi/4OuaKKKThOEbvc= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=OSDJzr8s; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=oVaPscis; dmarc=none; spf=pass (imf16.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=vbabka@suse.cz ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673630041; a=rsa-sha256; cv=none; b=W2hP61z6AhuRXakE5t31/fOWjDA+AoZtAFxOhQ1E3N1A9FxWZ9Wxqmh+A9zMp6dRCyOxot 9msyHlSdTmTjiE/i3FImAoo4uvAerOBOl6n8QZwKUWAv8KB2KIpU9l9LwEgnPFQoS2t17r Rr+OJHYdxwWZEXwMmoF3CDEjm3Fv1Qo= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9506A22DFF; Fri, 13 Jan 2023 17:13:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1673630039; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dype0CNPKvLC4Fug4nEUlp7tW2azAcSbZV9/hwDOFLk=; b=OSDJzr8sr8C+edSLf6guruvouaCi+AP2rhGnYg5IZ9stdV9mEDf25eGqnWSw9tMs/BZmfu nYCHPTop+SoSfNBRDPwsutBpoM+JkksUMYh8i2HXI9hNgCjoOezc1Q+xK6j725nxpmyUik fC9PBEFz2K6JIIKBLTTPCLEId/ofK0Y= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1673630039; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dype0CNPKvLC4Fug4nEUlp7tW2azAcSbZV9/hwDOFLk=; b=oVaPscisxV0CKB/27b9Bk1kiUjRRSSUnwjve0TpvJiK+aXr/hK2r5TmplH1yMi2CPWMiTM BLEUOux9TuTsSBCw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6C39B13913; Fri, 13 Jan 2023 17:13:59 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Z7y3GVeRwWOIbAAAMHmgww (envelope-from ); Fri, 13 Jan 2023 17:13:59 +0000 Message-ID: Date: Fri, 13 Jan 2023 18:13:59 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1 Subject: Re: [PATCH v3] mm/compaction: fix set skip in fast_find_migrateblock To: Mel Gorman Cc: Chuyi Zhou , Andrew Morton , linux-mm@kvack.org, Michal Hocko References: <20220713062009.597255-1-zhouchuyi@bytedance.com> <20220713082814.bed234e00d7f5ceb3858352a@linux-foundation.org> <20220714115020.GA3563@techsingularity.net> <3529e159-8c22-5d67-5e5d-c912784df9a0@bytedance.com> <20220719082854.GB3563@techsingularity.net> <20230111142144.2tfbyxnqgupbkzug@techsingularity.net> Content-Language: en-US From: Vlastimil Babka In-Reply-To: <20230111142144.2tfbyxnqgupbkzug@techsingularity.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 180B3180002 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: 6rwck3wret8hjsk31at75yfn48b3j93i X-HE-Tag: 1673630040-291514 X-HE-Meta: U2FsdGVkX19f+6aVfuJSZIA1rB4PUE8Wkbyy0AGOP9F3IXpbKmTfPd3uH/Jf/vVnAzG0BcVNTBR+YHfwSFJdt48pHLpfJbRy0xzahRamW69G6+Baz27oSeRH4G+QVWd7oyg0yAzEg9PtvAXresrEuiOgFGZSjFNCsMULOyo6D+id94YLp1WlrgSqaAYKSEI+pyoqjTGvWmRH1eV8gbz1Lw3skylzUKmkqdVS7A1ktSX5ofC9um6hh/6Yhl7MxKJLHZdFo99SWbZjfUYKCSYc2N48AuSorU3yPeHvjagI8FpSc45DcidXZV02amtoj08/pT35xOvG2Jy4v0gUrv1Qt+466Qlo60QJ3iviJumE7w8o7o/gGLGhkX7QZMnSMnO/5Twh7kJ2w5GQ3MelH4QRiZco08sZ62M4TUilJnoOtuhZfBAC68J9izq0ER1t2V7df22N3FVl4t+84VJDpjqCo++5IjUGPjVpP6QciEW4I00I5o1AQ+KQ2tB+y038qFvnrbMfZHfGA6yUAaOB5hlC+lWVuTcD/6CC+BkB3Nzp7sf5XoMxf+beagdOX7X5P7/HHD7r6twpxT1i58x63uDkWfTcRz5wFRIVfRiGJfxppwJ1RwmT+vqbEEGkJI4fyE4GsYvms6ZldDNzh+HkZxGCe45UdW2Rg38mliXzPt6oKAX2s6D7ihOqoAqFpUhJt+SC6Zl/3pzMWwnjVasJEhL23af8KtiU7OEDFv1hf/aNkkKKV+tGrmTbIWRHGVXkcDJqupwrIpHwf6cBL0jFOMiJPZxjTN5N7WlNm9k46gxWg7sWg1f38wJaPiJ60sQTTo+Euxnf+040RBwzNHyJhHlQ552qAL630TrdW3rctZ1Cczrs06FcFZ9EOK6p1n0Az5Ginoa8eh6KgIRhPIXCsJBt8FNcBV3qu5y5NjOoGAkcqVndbsyIGH6oyP5rOC7fyPIVl8KYwlQF1BI++nxOB7P t4Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 1/11/23 15:21, Mel Gorman wrote: > On Mon, Jan 09, 2023 at 08:43:40PM +0100, Vlastimil Babka wrote: >> On 7/19/22 10:28, Mel Gorman wrote: >> > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote: >> >> You are right, we do need some machenism to ensure mark skipped after >> >> scanned a !IS_ALIGNED block from fast_find. However, I think the following >> >> code may not working. Because *skip_updated* has been reset: >> >> if (!skip_updated) { >> >> skip_updated = true; >> >> if (test_and_set_skip(cc, page, low_pfn)) >> >> goto isolate_abort; >> >> } >> >> Why not ignore skip_updated after scanned a block, just like this: >> >> >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> >> index 962d05d1e187..1c388c45f127 100644 >> >> --- a/mm/compaction.c >> >> +++ b/mm/compaction.c >> >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, >> >> unsigned long low_pfn, >> >> * rescanned twice in a row. >> >> */ >> >> if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) { >> >> - if (valid_page && !skip_updated) >> >> + if (valid_page) >> >> set_pageblock_skip(valid_page); >> >> update_cached_migrate(cc, low_pfn); >> >> } >> >> >> > >> > Because valid_page still needs to be set but ensuring valid_page is set >> > is sufficient. >> > >> > The pageblock will be marked for skip on the first locking of a lruvec >> > >> > /* Try get exclusive access under lock */ >> > if (!skip_updated) { >> > skip_updated = true; >> > if (test_and_set_skip(cc, page, low_pfn)) >> > goto isolate_abort; >> > } >> > >> > That will happen regardless of alignment. >> >> I'm not sure about that as test_and_set_skip() has "if >> (!pageblock_aligned(pfn)) return false". So I think it only takes for the >> first pfn in pageblock to not be PageLRU and we don't try to get the >> exclusive access for it, and the following pfn's will not be aligned, and >> thus we never set_pageblock_skip() through this path. If we have nr_isolated >> > 0 and not cc->rescan, we might not set_pageblock_skip() through the hunk >> above neither. >> > > True. > >> I've been checking this area because we have some reports [1] for upstream >> 6.1 causing some long loops in khugepaged pegging 100% CPU in compaction. >> Tracepoint data suggests we keep (successfully) isolating migratepages over >> and over through fast_find_migrateblock() (the pfn ranges are not linearly >> increasing but there's a cycle probably as we rotate the freelist), but are >> failing to migrate them (for some reason). Between 6.0 and 6.1 there's this >> patch as commit 7efc3b726103 so I strongly suspect it's the root cause and >> will be providing a kernel with revert to in the bug to confirm. >> Consider this an early heads up :) >> >> [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848 > > The trace shows that there is a lot of rescanning between a small subset > of pageblocks because the conditions for marking the block skip are not > met. The scan is not reaching the end of the pageblock because enough > pages were isolated, but none were migrated successfully. Eventually it > circles back to the same block. > > Pageblock skipping is tricky because we are trying to minimise both > latency and excessive scanning. However, tracking exactly when a block > is fully scanned requires an excessive amount of state. > > One potential mitigating factor is to forcibly scan a pageblock when > migrations fail even though it could be for transient reasons such as > page writeback or page dirty. This will sometimes migrate too many pages > (worst case 500) but pageblocks will be marked skip. > > This is not tested at all > > diff --git a/mm/compaction.c b/mm/compaction.c > index ca1603524bbe..9242fcaeb09c 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -2387,6 +2387,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) > cc->rescan = true; > } > > +retry_rescan: > switch (isolate_migratepages(cc)) { > case ISOLATE_ABORT: > ret = COMPACT_CONTENDED; > @@ -2429,15 +2430,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) > goto out; > } > /* > - * We failed to migrate at least one page in the current > - * order-aligned block, so skip the rest of it. > + * If an ASYNC or SYNC_LIGHT fails to migrate a page > + * within the current order-aligned block, rescan the > + * entire block. Once the whole block is scanned, it'll > + * be marked "skip" to avoid rescanning in the near > + * future. This will isolate more pages than necessary > + * for the request but avoid loops due to > + * fast_find_migrateblock revisiting blocks that were > + * recently partially scanned. > */ > - if (cc->direct_compaction && > - (cc->mode == MIGRATE_ASYNC)) { > - cc->migrate_pfn = block_end_pfn( > - cc->migrate_pfn - 1, cc->order); > - /* Draining pcplists is useless in this case */ > - last_migrated_pfn = 0; > + if (cc->direct_compaction && !cc->rescan && > + (cc->mode < MIGRATE_SYNC)) { > + cc->rescan = true; > + > + /* > + * Draining pcplists does not help THP if > + * any page failed to migrate. Even after > + * drain, the pageblock will not be free. > + */ > + if (cc->order == HUGETLB_PAGE_ORDER) > + last_migrated_pfn = 0; > + > + goto retry_rescan; I think it won't work as expected, because we will not be rescanning the same pageblock, fast_find_migrateblock() will AFAICS pick a different one, there's nothing to prevent it. But the possible compaction state is rather complex for me to be confident about a quick fix. I think it will be safer to revert the patch for 6.2 and 6.1 stable, and redo it more carefully. > } > } >