From: Baolin Wang
Date: Mon, 12 Jun 2023 17:36:43 +0800
Subject: Re: [PATCH] mm: compaction: skip memory hole rapidly when isolating migratable pages
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, vbabka@suse.cz, david@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <8cc668b77c8eb2fa78058b3d81386ebed9c5a9cd.1686294549.git.baolin.wang@linux.alibaba.com> <87sfax6v7c.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87sfax6v7c.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 6/12/2023 2:39 PM, Huang, Ying wrote:
> Baolin Wang writes:
>
>> On some machines, the normal zone can have a large memory hole like
>> below memory layout, and we can see the range from 0x100000000 to
>> 0x1800000000 is a hole. So when isolating some migratable pages, the
>> scanner can meet the hole and it will take more time to skip the large
>> hole. From my measurement, I can see the isolation scanner will take
>> 80us ~ 100us to skip the large hole [0x100000000 - 0x1800000000].
>>
>> So adding a new helper to fast search next online memory section
>> to skip the large hole can help to find next suitable pageblock
>> efficiently.
>> With this patch, I can see the large hole scanning only
>> takes < 1us.
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000040000000-0x0000000fffffffff]
>> [    0.000000]   node   0: [mem 0x0000001800000000-0x0000001fa3c7ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
>> [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
>> [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
>> [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7ffffff]
>>
>> Signed-off-by: Baolin Wang
>> ---
>>  include/linux/mmzone.h | 10 ++++++++++
>>  mm/compaction.c        | 23 ++++++++++++++++++++++-
>>  2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 5a7ada0413da..87e6c535d895 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -2000,6 +2000,16 @@ static inline unsigned long next_present_section_nr(unsigned long section_nr)
>>          return -1;
>>  }
>>
>> +static inline unsigned long next_online_section_nr(unsigned long section_nr)
>> +{
>> +        while (++section_nr <= __highest_present_section_nr) {
>> +                if (online_section_nr(section_nr))
>> +                        return section_nr;
>> +        }
>> +
>> +        return -1UL;
>> +}
>> +
>>  /*
>>   * These are _only_ used during initialisation, therefore they
>>   * can use __initdata ...
>>   They could have names to indicate
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 3398ef3a55fe..3a55fdd20c49 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -229,6 +229,21 @@ static void reset_cached_positions(struct zone *zone)
>>                                  pageblock_start_pfn(zone_end_pfn(zone) - 1);
>>  }
>>
>> +static unsigned long skip_hole_pageblock(unsigned long start_pfn)
>> +{
>> +        unsigned long next_online_nr;
>> +        unsigned long start_nr = pfn_to_section_nr(start_pfn);
>> +
>> +        if (online_section_nr(start_nr))
>> +                return -1UL;
>
> Define a macro for the magic "-1UL"? It is used multiple times
> in the patch.

I am struggling to find a readable macro name for these '-1UL' values,
since the '-1UL' in next_online_section_nr() indicates that no online
section was found, whereas the '-1UL' in skip_hole_pageblock() indicates
that no online pfn was found.

So after more thought, I will change next_online_section_nr() to return
'NR_MEM_SECTIONS' if it cannot find the next online section, and change
skip_hole_pageblock() to return 0 if it cannot find the next online pfn.
What do you think?

static unsigned long skip_hole_pageblock(unsigned long start_pfn)
{
        unsigned long next_online_nr;
        unsigned long start_nr = pfn_to_section_nr(start_pfn);

        if (online_section_nr(start_nr))
                return 0;

        next_online_nr = next_online_section_nr(start_nr);
        if (next_online_nr < NR_MEM_SECTIONS)
                return section_nr_to_pfn(next_online_nr);

        return 0;
}

>> +
>> +        next_online_nr = next_online_section_nr(start_nr);
>> +        if (next_online_nr != -1UL)
>> +                return section_nr_to_pfn(next_online_nr);
>> +
>> +        return -1UL;
>> +}
>> +
>>  /*
>>   * Compound pages of >= pageblock_order should consistently be skipped until
>>   * released.
>>   It is always pointless to compact pages of such order (if they are
>> @@ -1991,8 +2006,14 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>
>>                  page = pageblock_pfn_to_page(block_start_pfn,
>>                                                  block_end_pfn, cc->zone);
>> -                if (!page)
>> +                if (!page) {
>> +                        unsigned long next_pfn;
>> +
>> +                        next_pfn = skip_hole_pageblock(block_start_pfn);
>> +                        if (next_pfn != -1UL)
>> +                                block_end_pfn = next_pfn;
>>                          continue;
>> +                }
>>
>>                  /*
>>                   * If isolation recently failed, do not retry. Only check the
>
> Do we need to do similar change in isolate_freepages()?

Yes, it's on my todo list, together with some measurement data.

Thanks for your comments.
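
For illustration only, here is a minimal userspace sketch of the return convention proposed above (NR_MEM_SECTIONS for "no online section", 0 for "no online pfn"). It is not kernel code: the section table, the toy values of NR_MEM_SECTIONS and PFNS_PER_SECTION, and the advance_block_end() helper are assumptions made up for the example, and the helpers only mimic the names used in the patch.

```c
#include <stdbool.h>

/* Hypothetical toy geometry, not the kernel's real values. */
#define NR_MEM_SECTIONS   16UL
#define PFNS_PER_SECTION  8UL

/* Toy layout: sections 0-3 and 12-13 are online, everything else is a hole. */
static const bool section_online[NR_MEM_SECTIONS] = {
    [0] = true, [1] = true, [2] = true, [3] = true,
    [12] = true, [13] = true,
};
static const unsigned long highest_present_section_nr = 15;

static bool online_section_nr(unsigned long nr)
{
    return nr < NR_MEM_SECTIONS && section_online[nr];
}

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
    return pfn / PFNS_PER_SECTION;
}

static unsigned long section_nr_to_pfn(unsigned long nr)
{
    return nr * PFNS_PER_SECTION;
}

/* Returns NR_MEM_SECTIONS when there is no online section after section_nr. */
static unsigned long next_online_section_nr(unsigned long section_nr)
{
    while (++section_nr <= highest_present_section_nr) {
        if (online_section_nr(section_nr))
            return section_nr;
    }
    return NR_MEM_SECTIONS;
}

/*
 * Returns the first pfn of the next online section, or 0 when start_pfn is
 * already in an online section or no later online section exists.
 * A found section number is always >= 1, so 0 cannot be a real result.
 */
static unsigned long skip_hole_pageblock(unsigned long start_pfn)
{
    unsigned long start_nr = pfn_to_section_nr(start_pfn);
    unsigned long next_online_nr;

    if (online_section_nr(start_nr))
        return 0;

    next_online_nr = next_online_section_nr(start_nr);
    if (next_online_nr < NR_MEM_SECTIONS)
        return section_nr_to_pfn(next_online_nr);

    return 0;
}

/*
 * Hypothetical stand-in for the !page branch in isolate_migratepages():
 * move block_end_pfn forward only when a skip target was found.
 */
static unsigned long advance_block_end(unsigned long block_start_pfn,
                                       unsigned long block_end_pfn)
{
    unsigned long next_pfn = skip_hole_pageblock(block_start_pfn);

    return next_pfn ? next_pfn : block_end_pfn;
}
```

With this convention the caller's check would presumably become a plain "if (next_pfn)" instead of a comparison against -1UL. On the toy layout, advance_block_end(40, 48) jumps to pfn 96 (the start of section 12), while advance_block_end(8, 16) leaves the end at 16 because section 1 is online.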