Message-ID: <4945e569-fb9c-3cb5-0c2f-da503632819d@linux.alibaba.com>
Date: Wed, 14 Jun 2023 20:22:38 +0800
Subject: Re: [PATCH v2] mm: compaction: skip memory hole rapidly when isolating migratable pages
To: Mel Gorman
Cc: akpm@linux-foundation.org, vbabka@suse.cz, david@redhat.com, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <770f9f61472b24b6bc89adbd71a77d9cf62bb54f.1686646361.git.baolin.wang@linux.alibaba.com> <20230614095501.m4porztaibchrgwx@techsingularity.net>
From: Baolin Wang
In-Reply-To: <20230614095501.m4porztaibchrgwx@techsingularity.net>

On 6/14/2023 5:55 PM, Mel Gorman wrote:
> On Tue, Jun 13, 2023 at 04:55:04PM +0800, Baolin Wang wrote:
>> On some machines, the normal zone can have a large memory hole like
>> below memory layout, and we can see the range from 0x100000000 to
>> 0x1800000000 is a hole. So when isolating some migratable pages, the
>> scanner can meet the hole and it will take more time to skip the large
>> hole. From my measurement, I can see the isolation scanner will take
>> 80us ~ 100us to skip the large hole [0x100000000 - 0x1800000000].
>>
>> So adding a new helper to fast search next online memory section
>> to skip the large hole can help to find next suitable pageblock
>> efficiently. With this patch, I can see the large hole scanning only
>> takes < 1us.
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000001fa7ffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000040000000-0x0000000fffffffff]
>> [    0.000000]   node   0: [mem 0x0000001800000000-0x0000001fa3c7ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
>> [    0.000000]   node   0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa4030000-0x0000001fa40effff]
>> [    0.000000]   node   0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
>> [    0.000000]   node   0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
>> [    0.000000]   node   0: [mem 0x0000001fa7590000-0x0000001fa7ffffff]
>>
>> Signed-off-by: Baolin Wang
>
> This may only be necessary for non-contiguous zones so a check for
> zone_contiguous could be made but I suspect the saving, if any, would be
> marginal.

Right. But pageblock_pfn_to_page() has already considered the contiguous
case, and will not return a NULL page for a contiguous zone.

> However, it's subtle that block_end_pfn can end up in an arbitrary location
> past the end of the zone or past cc->free_pfn. As the "continue" will update
> cc->migrate_pfn, that might lead to errors in the future. It would be a

Ah, yes, thanks for pointing this out; I missed it before.

> lot safer to pass in cc->free_pfn and do two things with the value. First,
> there is no point scanning for a valid online section past cc->free_pfn so
> terminating after cc->free_pfn may save some cycles.
> Second, cc->migrate_pfn

The skipping function introduced in this patch only looks for the first
online section, so it cannot terminate the scan early by checking whether
it has gone past cc->free_pfn. It can only compare the first online
section it finds with cc->free_pfn.

> does not end up with an arbitrary value which is a more defensive approach
> to any future programming errors.

Right. So I think I should make sure cc->migrate_pfn is not larger than
cc->free_pfn, with the change below:

@@ -1965,7 +1965,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 			next_pfn = skip_offline_sections(block_start_pfn);
 			if (next_pfn)
-				block_end_pfn = next_pfn;
+				block_end_pfn = min(next_pfn, cc->free_pfn);
 			continue;
 		}
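
For illustration only, here is a minimal userspace sketch of the idea
discussed above: find the next online section after a hole, then clamp the
new block end so it never passes the free scanner. This is not the kernel
code. The model_* names, struct model_mem and the section size are
assumptions made up for this example; only skip_offline_sections(),
block_start_pfn, block_end_pfn, next_pfn and cc->free_pfn come from the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for the example: 128 MiB sections with 4 KiB pages. */
#define MODEL_PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for the kernel's section bookkeeping. */
struct model_mem {
	const bool *section_online;	/* one flag per memory section */
	unsigned long nr_sections;	/* number of entries in section_online */
};

/*
 * Return the first PFN of the next online section after the one that
 * contains start_pfn. Return 0 if the starting section is itself online
 * or if no later section is online.
 */
static unsigned long model_skip_offline_sections(const struct model_mem *mem,
						 unsigned long start_pfn)
{
	unsigned long nr = start_pfn / MODEL_PAGES_PER_SECTION;

	assert(mem != NULL && mem->section_online != NULL);

	if (nr < mem->nr_sections && mem->section_online[nr])
		return 0;

	while (++nr < mem->nr_sections) {
		if (mem->section_online[nr])
			return nr * MODEL_PAGES_PER_SECTION;
	}

	return 0;
}

/*
 * Model of the proposed hunk: if a later online section exists, move the
 * block end there, but never beyond free_pfn. Otherwise keep the old end.
 */
static unsigned long model_next_block_end(const struct model_mem *mem,
					  unsigned long block_start_pfn,
					  unsigned long block_end_pfn,
					  unsigned long free_pfn)
{
	unsigned long next_pfn = model_skip_offline_sections(mem, block_start_pfn);

	if (next_pfn)
		block_end_pfn = next_pfn < free_pfn ? next_pfn : free_pfn;

	return block_end_pfn;
}
```

With sections {online, offline, offline, online} and a start PFN inside
section 1, the helper returns the first PFN of section 3. If free_pfn lies
inside the hole, the clamped block end is free_pfn instead, which is the
defensive behaviour asked for in the review.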