From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD6DBC54EBD for ; Mon, 9 Jan 2023 19:43:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ED8328E0002; Mon, 9 Jan 2023 14:43:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E87C48E0001; Mon, 9 Jan 2023 14:43:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D27EB8E0002; Mon, 9 Jan 2023 14:43:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id C13CD8E0001 for ; Mon, 9 Jan 2023 14:43:46 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 804001A02A0 for ; Mon, 9 Jan 2023 19:43:46 +0000 (UTC) X-FDA: 80336285652.25.CA09A4B Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf08.hostedemail.com (Postfix) with ESMTP id 6C2CE16001A for ; Mon, 9 Jan 2023 19:43:43 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=CuaTAaJR; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=MXQF6WXt; spf=pass (imf08.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673293423; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=c0EotYF5gjPvZEtfmaffUoRFiKdZT3UOGSeubNgEU9M=; b=H4ngqGJFINsUmB1faIbeHx2CHdCaMAOIvfe1f139G9opK1jUBLeSN0EeFyRfR7yCNHnEFi Rkoim6YxWHgSPdlvmHK5AsX3AMrinhn/bCc0nz12MZoweGWOv5Axzl/BYcnPWAuU8r+mbm 6bw7VCdgdXb39NJa2vP/T9kObt61kvs= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=suse.cz header.s=susede2_rsa header.b=CuaTAaJR; dkim=pass header.d=suse.cz header.s=susede2_ed25519 header.b=MXQF6WXt; spf=pass (imf08.hostedemail.com: domain of vbabka@suse.cz designates 195.135.220.28 as permitted sender) smtp.mailfrom=vbabka@suse.cz; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673293423; a=rsa-sha256; cv=none; b=7wn9ATHlloWnWUUP0tv+Le6ippmTfU6N8aOSrdDTQA9SuSmoUslOwHjL3EPGot6N1sVFhs HgrkB0SUR2PYFdJXlFAWTntHQlyozbxTPXlPflfyEhMhbxpUx6FKCZod97keFibg4wh2oq 4l4bMS3lr9LpGGcw76CVKxaV8Eqaajc= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E8FF23F7DA; Mon, 9 Jan 2023 19:43:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1673293420; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c0EotYF5gjPvZEtfmaffUoRFiKdZT3UOGSeubNgEU9M=; b=CuaTAaJRB0IHM9gPzeCREccIHUnAq2XLrwVWzYEBCg/QkxF+I62/OZJ3w6NnHL4TGNho95 NqbOIe2QQiK6cFJmCi94jJKvpYjC1C8trE6dCU1camcFrXkCaqC5mTYk5Fg8yFW4mG4nxJ 428vxkaJY+I/5CU22oJT2RSIWFO7rPk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1673293420; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c0EotYF5gjPvZEtfmaffUoRFiKdZT3UOGSeubNgEU9M=; b=MXQF6WXt3nJedfa2U+Bwx0E8YMpmm8BGTgoITBq3c1PXcwtIKiohaiR8ZIP4bTU2P0kpwM Rd326vrmkvknoDCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D0E72134AD; Mon, 9 Jan 2023 19:43:40 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gWIbMmxuvGN6VQAAMHmgww (envelope-from ); Mon, 09 Jan 2023 19:43:40 +0000 Message-ID: Date: Mon, 9 Jan 2023 20:43:40 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1 Subject: Re: [PATCH v3] mm/compaction: fix set skip in fast_find_migrateblock To: Mel Gorman , Chuyi Zhou Cc: Andrew Morton , linux-mm@kvack.org References: <20220713062009.597255-1-zhouchuyi@bytedance.com> <20220713082814.bed234e00d7f5ceb3858352a@linux-foundation.org> <20220714115020.GA3563@techsingularity.net> <3529e159-8c22-5d67-5e5d-c912784df9a0@bytedance.com> <20220719082854.GB3563@techsingularity.net> Content-Language: en-US From: Vlastimil Babka In-Reply-To: <20220719082854.GB3563@techsingularity.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 6C2CE16001A X-Stat-Signature: 5cqjtimp61z33bgxpxupughcm16ycrng X-HE-Tag: 1673293423-960225 X-HE-Meta: U2FsdGVkX19yuqKTv0+25li8MtGKTmOurWyDXbt6QqpqppvOF/v4K5nlE7mlYUQe+q65Qj1On3rzA5n+yddC2W3gckg2gCpJ6mxZ0IfKf7zQbvwCylj1PCIWKTr7RvqeJ/n8my/cCIys+piBNz3qTZmaXZG7SfCwSIv5nBjBn6MNpEJVM0oJZydq5v4Oauku0NvptlXgqgkyMTNq2BStS87dqxYd/rS22w/relz936iIoqC4fz9JQCdnfjULHeiwxnkAdImPoF/tKb7Hal8Ci60uP2YwsTm8YhAhWx+ooVavOg3NcNPwTh0XXiDKBV2C1sRs8Y90jW2yuo5RGuk/XY/05W+iiBh3g7NxAQ83DCjcqeLKWKFW75uG58xAhHTVlCa/mhBwN3FnRKOitt9t7QnKlSAqg44d7aIQhlRJs4mDRUec8SaW9vI0vQlFi+vBUvEj508pPYoTI7WMeo/hQQjBQKB55oHGb2Aol8jywiG3TJ4cmfXFaMlPCU9sWmwomoowIeOCqSNmlHG1q1KWREwmw0wFFaTI/3Q6U6YJWI2JAC09grOcTr2xeYNlXkHs3vh4nnN/+rVEq/eLQzYnh6ACdIMkzw5oPn6jiq1YY4YRH2etc/4nsCEoXYOzQ9YwV+rThjykD57h4Zz4b717UK9FxHJQBpkzadpWQj4c2nedRLBjjwZQoHtk/n4f6ofDPsF8IR8NY65cFhnqtEhtzQuYlOz5l/nAb04nAC/N7Dl81hzkbwJFGY9o19jxPAS9GWG0KKDK0DdxUzD2OSnTEdy7X3feLlt4mzLvNHtk1o6VHPazBTn2Yv1tVtOL/REp9xfBAPiFPPLXZfHPx3O0ft39CmHB5qbXNXxGrnYQvC8MTIBaSoNTudMtxrXWrwxqNO2ZSjHIIIonZZOBtPHsnz8s+Msze3BjdHpcwaH3oDRRi4T1wFQyNznpzF9r/iyogW76OdgE3TEfPhXp2Rf 64w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 7/19/22 10:28, Mel Gorman wrote: > On Fri, Jul 15, 2022 at 11:26:01PM +0800, Chuyi Zhou wrote: >> >> >> ??? 2022/7/14 ??????7:50, Mel Gorman ??????: >> > On Wed, Jul 13, 2022 at 08:28:14AM -0700, Andrew Morton wrote: >> > > (cc Mel) >> > > >> > > On Wed, 13 Jul 2022 14:20:09 +0800 Chuyi Zhou wrote: >> > > >> > > > From: zhouchuyi >> > > > >> > > > When we successfully find a pageblock in fast_find_migrateblock(), >> > > > the block will be set skip-flag through set_pageblock_skip(). However, >> > > > when entering isolate_migratepages_block(), the whole pageblock will >> > > > be skipped due to the branch >> > > > 'if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages))'. >> > > > Eventually we will goto isolate_abort and isolate nothing. That cause >> > > > fast_find_migrateblock useless. >> > > > >> > >> > It's not very clear *why* this failed from the changelog because >> > superficially !valid_page will be true for the first pageblock and there >> > is a reasonable expectation it will be aligned. Is the following accurate >> > based on your analysis? >> > >> > However, when entering isolate_migratepages_block(), the first >> > pageblock will be skipped in the branch 'if (!valid_page && >> > IS_ALIGNED(low_pfn, pageblock_nr_pages))' as isolation_suitable >> > returns true due to the skip bit set by fast_find_migrateblock(). >> > >> > If so, please update the changelog as a reviewer handling backports may >> > wonder what exactly is wrong with that branch. >> > >> Hi Mel, thanks for your review. >> >> If fast scanning failed, the return block may not be aligned, because we get >> pfn from *free_pfn*. When fast-find success, the return value *pfn* is get >> from pageblock_start_pfn, and it will be passed to >> isolate_migratepages_block as low_pfn. I think normally the value get from >> pageblock_start_pfn should be aligned with pageblock_nr_pages. I have used >> printk test it. Maby I miss something important? >> pfn = pageblock_start_pfn(free_pfn); >> ... >> found_block = true; >> set_pageblock_skip(freepage); >> break; > > The return block indeed may not be aligned. It could simply be a > restart. The changelog could still do with a little clarification but > your patch is still fine. > >> > Second, what guarantees a block returned by fast_find that is not >> > aligned gets marked skipped after it is scanned? The set_pageblock_skip >> > is only called when there is a valid page and it may not be set if >> > !IS_ALIGNED(low_pfn). Is something like this untested hunk also necessary? >> > >> You are right, we do need some machenism to ensure mark skipped after >> scanned a !IS_ALIGNED block from fast_find. However, I think the following >> code may not working. Because *skip_updated* has been reset: >> if (!skip_updated) { >> skip_updated = true; >> if (test_and_set_skip(cc, page, low_pfn)) >> goto isolate_abort; >> } >> Why not ignore skip_updated after scanned a block, just like this: >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index 962d05d1e187..1c388c45f127 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -1180,7 +1180,7 @@ isolate_migratepages_block(struct compact_control *cc, >> unsigned long low_pfn, >> * rescanned twice in a row. >> */ >> if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) { >> - if (valid_page && !skip_updated) >> + if (valid_page) >> set_pageblock_skip(valid_page); >> update_cached_migrate(cc, low_pfn); >> } >> > > Because valid_page still needs to be set but ensuring valid_page is set > is sufficient. > > The pageblock will be marked for skip on the first locking of a lruvec > > /* Try get exclusive access under lock */ > if (!skip_updated) { > skip_updated = true; > if (test_and_set_skip(cc, page, low_pfn)) > goto isolate_abort; > } > > That will happen regardless of alignment. I'm not sure about that as test_and_set_skip() has "if (!pageblock_aligned(pfn)) return false". So I think it only takes for the first pfn in pageblock to not be PageLRU and we don't try to get the exclusive access for it, and the following pfn's will not be aligned, and thus we never set_pageblock_skip() through this path. If we have nr_isolated > 0 and not cc->rescan, we might not set_pageblock_skip() through the hunk above neither. I've been checking this area because we have some reports [1] for upstream 6.1 causing some long loops in khugepaged pegging 100% CPU in compaction. Tracepoint data suggests we keep (successfully) isolating migratepages over and over through fast_find_migrateblock() (the pfn ranges are not linearly increasing but there's a cycle probably as we rotate the freelist), but are failing to migrate them (for some reason). Between 6.0 and 6.1 there's this patch as commit 7efc3b726103 so I strongly suspect it's the root cause and will be providing a kernel with revert to in the bug to confirm. Consider this an early heads up :) [1] https://bugzilla.suse.com/show_bug.cgi?id=1206848 > The intent here was to avoid > multiple scanners trying to isolate the same pageblock. The second > check for update happens at the end of the block > > if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) { > if (valid_page && !skip_updated) > set_pageblock_skip(valid_page); > update_cached_migrate(cc, low_pfn); > } > > And it's the second one that requires a valid_page. If valid_page is > available then the pageblock will get marked skip in most cases that > matter. If anything, it can get set prematurely due to "Try get > exclusive access" if the pageblock is not fully scanned but checking if > pageblock should be *cleared* if there was a partial scan would be a > different patch and not even clear that it worth the complexity. >