From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefeng Wang
Date: Tue, 16 Apr 2024 16:06:33 +0800
Subject: Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists
To: David Hildenbrand, Barry Song <21cnbao@gmail.com>
CC: Andrew Morton, Huang Ying, Mel Gorman, Ryan Roberts, Barry Song,
 Vlastimil Babka, Zi Yan, "Matthew Wilcox (Oracle)", Jonathan Corbet,
 Yang Shi, Yu Zhao
References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>
 <54623c8c-a94f-4f88-bf53-5f92c634f78a@huawei.com>
 <3b931621-7cd1-4df8-9070-535ecaee970e@redhat.com>
 <90501d59-e3f2-4ac4-9e42-4eca3bb0a91b@huawei.com>
Sender: owner-linux-mm@kvack.org

On 2024/4/16 15:03, David Hildenbrand wrote:
> On 16.04.24 07:26, Barry Song wrote:
>> On Tue, Apr 16, 2024 at 4:58 PM Kefeng Wang wrote:
>>>
>>>
>>>
>>> On 2024/4/16 12:50, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/4/16 8:21, Barry Song wrote:
>>>>> On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/4/15 18:52, David Hildenbrand wrote:
>>>>>>> On 15.04.24 10:59, Kefeng Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/4/15 16:18, Barry Song wrote:
>>>>>>>>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang wrote:
>>>>>>>>>>
>>>>>>>>>> Both the file pages and anonymous pages support large folio, high-order
>>>>>>>>>> pages except PMD_ORDER will also be allocated frequently which could
>>>>>>>>>> increase the zone lock contention, allow high-order pages on pcp lists
>>>>>>>>>> could reduce the big zone lock contention, but as commit 44042b449872
>>>>>>>>>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
>>>>>>>>>> pointed, it may not win in all the scenes, add a new control sysfs to
>>>>>>>>>> enable or disable specified high-order pages stored on PCP lists, the order
>>>>>>>>>> (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) won't be stored on PCP list by
>>>>>>>>>> default.
>>>>>>>>>
>>>>>>>>> This is precisely something Baolin and I have discussed and intended
>>>>>>>>> to implement[1],
>>>>>>>>> but unfortunately, we haven't had the time to do so.
>>>>>>>>
>>>>>>>> Indeed, same thing. Recently, we are working on unixbench/lmbench
>>>>>>>> optimization, I tested Multi-size THP for anonymous memory by hard-cord
>>>>>>>> PAGE_ALLOC_COSTLY_ORDER from 3 to 4[1], it shows some improvement but
>>>>>>>> not for all cases and not very stable, so re-implemented it by according
>>>>>>>> to the user requirement and enable it dynamically.
>>>>>>>
>>>>>>> I'm wondering, though, if this is really a suitable candidate for a
>>>>>>> sysctl toggle. Can anybody really come up with an educated guess for
>>>>>>> these values?
>>>>>>
>>>>>> Not sure this is suitable in sysctl, but mTHP anon is enabled in sysctl,
>>>>>> we could trace __alloc_pages() and do order statistic to decide to
>>>>>> choose the high-order to be enabled on PCP.
>>>>>>
>>>>>>>
>>>>>>> Especially reading "Benchmarks Score shows a little improvoment(0.28%)"
>>>>>>> and "it may not win in all the scenes", to me it mostly sounds like
>>>>>>> "minimal impact" -- so who cares?
>>>>>>
>>>>>> Even though lock conflicts are eliminated, there is very limited
>>>>>> performance improvement(even maybe fluctuation), it is not a good
>>>>>> testcase to show improvement, just show the zone-lock issue, we need to
>>>>>> find other better testcase, maybe some test on Andriod(heavy use 64K, no
>>>>>> PMD THP), or LKP maybe give some help?
>>>>>>
>>>>>> I will try to find other testcase to show the benefit.
>>>>>
>>>>> Hi Kefeng,
>>>>>
>>>>> I wonder if you will see some major improvements on mTHP 64KiB using
>>>>> the below microbench I wrote just now, for example perf and time to
>>>>> finish the program
>>>>>
>>>>> #define DATA_SIZE (2UL * 1024 * 1024)
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>           /* make 32 concurrent alloc and free of mTHP */
>>>>>           fork(); fork(); fork(); fork(); fork();
>>>>>
>>>>>           for (int i = 0; i < 100000; i++) {
>>>>>                   void *addr = mmap(NULL, DATA_SIZE, PROT_READ | PROT_WRITE,
>>>>>                                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>                   if (addr == MAP_FAILED) {
>>>>>                           perror("fail to malloc");
>>>>>                           return -1;
>>>>>                   }
>>>>>                   memset(addr, 0x11, DATA_SIZE);
>>>>>                   munmap(addr, DATA_SIZE);
>>>>>           }
>>>>>
>>>>>           return 0;
>>>>> }
>>>>>
>>>
>>> Rebased on next-20240415,
>>>
>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>>>
>>> Compare with
>>>     echo 0 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>>     echo 1 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>>
>>>>
>>>> 1) PCP disabled
>>>>         1         2         3         4         5         average
>>>> real    200.41    202.18    203.16    201.54    200.91    201.64
>>>> user    6.49      6.21      6.25      6.31      6.35      6.322
>>>> sys     193.3     195.39    196.3     194.65    194.01    194.73
>>>>
>>>> 2) PCP enabled
>>>> real    198.25    199.26    195.51    199.28    189.12    196.284    -2.66%
>>>> user    6.21      6.02      6.02      6.28      6.21      6.148      -2.75%
>>>> sys     191.46    192.64    188.96    192.47    182.39    189.584    -2.64%
>>>>
>>>> for above test, time reduce 2.x%
>>
>> This is an improvement from 0.28%, but it's still below my expectations.
>
> Yes, it's noise. Maybe we need a system with more Cores/Sockets? But it
> does feel a bit like we're trying to come up with the problem after we
> have a solution; I'd have thought some existing benchmark could
> highlight if that is worth it.

On a 96-core machine with 129 threads, I ran a quick test using
pcp_enabled to control hugepages-2048kB. There is no big improvement
for 2M either:

PCP enabled
        1         2         3         average
real    221.8     225.6     221.5     222.9666667
user    14.91     14.91     17.05     15.62333333
sys     141.91    159.25    156.23    152.4633333

PCP disabled
        1         2         3         average
real    230.76    231.39    228.39    230.18
user    15.47     15.88     17.5      16.28333333
sys     159.07    162.32    159.09    160.16

The numbers in commit 44042b449872 ("mm/page_alloc: allow high-order
pages to be stored on the per-cpu lists") also show only a limited
improvement:

netperf-udp
                                 5.13.0-rc2             5.13.0-rc2
                           mm-pcpburst-v3r4   mm-pcphighorder-v1r7
Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
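
To put the 2M averages above into the same percentage form as the 64K
results, the relative change can be worked out with a small helper. This
is only a rough sketch: mean() and rel_change_pct() are names made up
here for illustration and are not part of the patch set, and the inputs
are simply the real/user/sys times listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of n samples (n must be > 0). */
double mean(const double *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/*
 * Hypothetical helper: change of 'val' relative to 'base', in percent.
 * A negative result means 'val' is smaller than 'base', i.e. the run
 * with PCP enabled took less time than the run with PCP disabled.
 */
double rel_change_pct(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}
```

With PCP disabled as the baseline, rel_change_pct(230.18, 222.9666667)
gives roughly -3.1% for real time, and the user and sys averages come
out at roughly -4.1% and -4.8%. That is a bit larger than the 2.x% seen
with 64K, but with only three runs per configuration it is hard to
separate from run-to-run variation.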