Date: Tue, 16 Apr 2024 12:58:00 +0800
From: Kefeng Wang
To: Barry Song <21cnbao@gmail.com>
CC: David Hildenbrand, Andrew Morton, Huang Ying, Mel Gorman, Ryan Roberts, Barry Song, Vlastimil Babka, Zi Yan, "Matthew Wilcox (Oracle)", Jonathan Corbet, Yang Shi, Yu Zhao
Subject: Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists
On 2024/4/16 12:50, Kefeng Wang wrote:
>
> On 2024/4/16 8:21, Barry Song wrote:
>> On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang wrote:
>>>
>>> On 2024/4/15 18:52, David Hildenbrand wrote:
>>>> On 15.04.24 10:59, Kefeng Wang wrote:
>>>>>
>>>>> On 2024/4/15 16:18, Barry Song wrote:
>>>>>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang wrote:
>>>>>>>
>>>>>>> Both file pages and anonymous pages support large folios, so
>>>>>>> high-order pages other than PMD_ORDER are also allocated
>>>>>>> frequently, which can increase zone lock contention. Allowing
>>>>>>> high-order pages on PCP lists could reduce that contention, but as
>>>>>>> commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
>>>>>>> stored on the per-cpu lists") pointed out, it may not win in all
>>>>>>> scenarios. Add a new sysfs control to enable or disable storing
>>>>>>> specified high-order pages on PCP lists; orders in the range
>>>>>>> (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) are not stored on PCP lists
>>>>>>> by default.
>>>>>>
>>>>>> This is precisely something Baolin and I have discussed and
>>>>>> intended to implement[1], but unfortunately, we haven't had the
>>>>>> time to do so.
>>>>>
>>>>> Indeed, same thing. We have recently been working on
>>>>> unixbench/lmbench optimization. I tested multi-size THP for
>>>>> anonymous memory by hard-coding PAGE_ALLOC_COSTLY_ORDER from 3 to
>>>>> 4[1]. It showed some improvement, but not in all cases and not very
>>>>> stably, so I re-implemented it to be configured according to user
>>>>> requirements and enabled dynamically.
>>>>
>>>> I'm wondering, though, if this is really a suitable candidate for a
>>>> sysctl toggle. Can anybody really come up with an educated guess for
>>>> these values?
>>>
>>> Not sure whether this is suitable as a sysctl, but anonymous mTHP is
>>> likewise enabled through sysfs. We could trace __alloc_pages() and
>>> collect per-order statistics to decide which high orders to enable on
>>> PCP lists.
>>>
>>>>
>>>> Especially reading "Benchmarks Score shows a little improvement (0.28%)"
>>>> and "it may not win in all the scenarios", to me it mostly sounds like
>>>> "minimal impact" -- so who cares?
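The following minimal userspace sketch in C makes the order ranges in this discussion concrete. It assumes 4 KiB base pages, so PMD_ORDER is 9, and a PAGE_ALLOC_COSTLY_ORDER of 3.

The default check roughly mirrors what the cover letter describes: orders up to PAGE_ALLOC_COSTLY_ORDER, plus the PMD order, are cached on PCP lists. The other pieces are hypothetical illustrations, not code from this patch set:

- The opt_in bitmask stands in for the RFC's per-order pcp_enabled switch.
- pick_pcp_orders() applies a threshold policy over a per-order allocation histogram, such as one gathered by tracing __alloc_pages().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for 4 KiB base pages (e.g. arm64 with 4K pages). */
#define PAGE_SHIFT              12
#define PAGE_ALLOC_COSTLY_ORDER 3
#define PMD_ORDER               9   /* 2 MiB / 4 KiB = 512 = 2^9 */

/* Size of an order-n block in KiB. */
static unsigned long order_size_kib(unsigned int order)
{
	return (1UL << (PAGE_SHIFT + order)) >> 10;
}

/*
 * Default: orders <= PAGE_ALLOC_COSTLY_ORDER and PMD_ORDER may be cached
 * on PCP lists. Bit n of the hypothetical opt_in mask stands in for a
 * per-order pcp_enabled switch for the orders in between.
 */
static bool pcp_order_allowed(unsigned int order, unsigned int opt_in)
{
	if (order <= PAGE_ALLOC_COSTLY_ORDER || order == PMD_ORDER)
		return true;
	if (order < PMD_ORDER)
		return (opt_in >> order) & 1u;
	return false;
}

/* Number of orders up to PMD_ORDER that are NOT on PCP lists by default. */
static unsigned int count_optional_orders(void)
{
	unsigned int n = 0;

	for (unsigned int o = 0; o <= PMD_ORDER; o++)
		if (!pcp_order_allowed(o, 0))
			n++;
	return n;
}

/*
 * Hypothetical policy: hist[o] counts order-o allocations (e.g. gathered
 * by tracing __alloc_pages()). Opt in every order strictly between
 * PAGE_ALLOC_COSTLY_ORDER and PMD_ORDER whose share of all allocations
 * is at least min_pct percent.
 */
static unsigned int pick_pcp_orders(const unsigned long hist[PMD_ORDER + 1],
				    unsigned int min_pct)
{
	unsigned long total = 0;
	unsigned int mask = 0;

	assert(min_pct <= 100);
	for (unsigned int o = 0; o <= PMD_ORDER; o++)
		total += hist[o];
	if (total == 0)
		return 0;

	for (unsigned int o = PAGE_ALLOC_COSTLY_ORDER + 1; o < PMD_ORDER; o++)
		if (hist[o] * 100 >= (unsigned long)min_pct * total)
			mask |= 1u << o;
	return mask;
}
```

With these constants, count_optional_orders() returns 5 and order_size_kib(4) returns 64. That matches the five extra orders (4 to 8) for arm64 with 4K pages mentioned in this thread, and the 64K mTHP size used in the benchmarks. A threshold is only one possible policy; it does not answer David's question of whether anyone can pick good values.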
>>>
>>> Even though the lock contention is eliminated, the performance
>>> improvement is very limited (possibly within noise), so it is not a
>>> good testcase for showing improvement; it only shows the zone lock
>>> issue. We need to find a better testcase, maybe some test on Android
>>> (which heavily uses 64K and no PMD THP), or maybe LKP could help?
>>>
>>> I will try to find another testcase to show the benefit.
>>
>> Hi Kefeng,
>>
>> I wonder if you will see some major improvements on 64KiB mTHP using
>> the microbenchmark below, which I wrote just now, for example in perf
>> and in the time to finish the program:
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> #define DATA_SIZE (2UL * 1024 * 1024)
>>
>> int main(int argc, char **argv)
>> {
>>         /* 5 forks -> 2^5 = 32 processes doing concurrent mTHP alloc/free */
>>         fork(); fork(); fork(); fork(); fork();
>>
>>         for (int i = 0; i < 100000; i++) {
>>                 void *addr = mmap(NULL, DATA_SIZE, PROT_READ | PROT_WRITE,
>>                                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>                 if (addr == MAP_FAILED) {
>>                         perror("mmap");
>>                         return -1;
>>                 }
>>                 memset(addr, 0x11, DATA_SIZE);
>>                 munmap(addr, DATA_SIZE);
>>         }
>>
>>         return 0;
>> }
>>

Rebased on next-20240415, with:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

and compared PCP disabled vs. enabled for 64kB:

echo 0 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
echo 1 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled

>
> 1) PCP disabled (seconds)
>         run 1    run 2    run 3    run 4    run 5    average
> real    200.41   202.18   203.16   201.54   200.91   201.64
> user    6.49     6.21     6.25     6.31     6.35     6.322
> sys     193.3    195.39   196.3    194.65   194.01   194.73
>
> 2) PCP enabled (seconds)
>         run 1    run 2    run 3    run 4    run 5    average   vs. 1)
> real    198.25   199.26   195.51   199.28   189.12   196.284   -2.66%
> user    6.21     6.02     6.02     6.28     6.21     6.148     -2.75%
> sys     191.46   192.64   188.96   192.47   182.39   189.584   -2.64%
>
> For the above test, the time is reduced by about 2.6-2.8%.
>
>
> I also re-tested page_fault1 (anon) from will-it-scale:
>
> 1) PCP enabled
> tasks  processes  processes_idle  threads  threads_idle  linear
> 0      0          100             0        100           0
> 1      1416915    98.95           1418128  98.95         1418128
> 20     5327312    79.22           3821312  94.36         28362560
> 40     9437184    58.58           4463657  94.55         56725120
> 60     8120003    38.16           4736716  94.61         85087680
> 80     7356508    18.29           4847824  94.46         113450240
> 100    7256185    1.48            4870096  94.61         141812800
>
> 2) PCP disabled
> tasks  processes  processes_idle  threads  threads_idle  linear
> 0      0          100             0        100           0
> 1      1365398    98.95           1354502  98.95         1365398
> 20     5174918    79.22           3722368  94.65         27307960
> 40     9094265    58.58           4427267  94.82         54615920
> 60     8021606    38.18           4572896  94.93         81923880
> 80     7497318    18.2            4637062  94.76         109231840
> 100    6819897    1.47            4654521  94.63         136539800
>
> ------------------------------------
> 1) vs 2): PCP enabled improves by 3.86%
>
> 3) PCP re-enabled
> tasks  processes  processes_idle  threads  threads_idle  linear
> 0      0          100             0        100           0
> 1      1419036    98.96           1428403  98.95         1428403
> 20     5356092    79.23           3851849  94.41         28568060
> 40     9437184    58.58           4512918  94.63         57136120
> 60     8252342    38.16           4659552  94.68         85704180
> 80     7414899    18.26           4790576  94.77         114272240
> 100    7062902    1.46            4759030  94.64         142840300
>
> 4) PCP re-disabled
> tasks  processes  processes_idle  threads  threads_idle  linear
> 0      0          100             0        100           0
> 1      1352649    98.95           1354806  98.95         1354806
> 20     5172924    79.22           3719292  94.64         27096120
> 40     9174505    58.59           4310649  94.93         54192240
> 60     8021606    38.17           4552960  94.81         81288360
> 80     7497318    18.18           4671638  94.81         108384480
> 100    6823926    1.47            4725955  94.64         135480600
>
> ------------------------------------
> 3) vs 4): PCP enabled improves by 5.43%
>
> Average: 4.645%
>
>>>
>>>>
>>>> How much is the cost vs. benefit of just having one sane system
>>>> configuration?
>>>>
>>>
>>> For arm64 with 4K pages, that means five more high orders (4-8) and
>>> five more PCP lists. For high orders, we assume most of them are
>>> movable, but they may not be, so enabling this by default may cause
>>> more fragmentation; see 5d0a661d808f ("mm/page_alloc: use only one
>>> PCP list for THP-sized allocations").
>>>
>
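The percentages in the timing and will-it-scale results in this message can be re-derived with a few lines of C. This minimal sketch computes the five-run averages and the relative changes. The helper names mean() and pct_change() are illustrative, and the sample arrays copy the "real" rows of the timing table.

The will-it-scale figures of 3.86% and 5.43% appear to correspond to the single-task (tasks=1) row, which is the base of the "linear" column: 1418128 vs 1365398, and 1428403 vs 1354806. That reading is inferred from the numbers, not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change from base to val, in percent (negative = reduction). */
static double pct_change(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}

/* "real" times in seconds from the timing table: PCP disabled / enabled. */
static const double real_off[5] = { 200.41, 202.18, 203.16, 201.54, 200.91 };
static const double real_on[5]  = { 198.25, 199.26, 195.51, 199.28, 189.12 };
```

For example, mean(real_off, 5) and mean(real_on, 5) give 201.64 and 196.284, and pct_change() of those two averages is about -2.66. Averaging the two will-it-scale figures, (3.86 + 5.43) / 2, gives the reported 4.645%.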