From: Barry Song <21cnbao@gmail.com>
Date: Tue, 16 Apr 2024 17:26:44 +1200
Subject: Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists
To: Kefeng Wang
Cc: David Hildenbrand, Andrew Morton, Huang Ying, Mel Gorman, Ryan Roberts,
    Barry Song, Vlastimil Babka, Zi Yan, "Matthew Wilcox (Oracle)",
    Jonathan Corbet, Yang Shi, Yu Zhao, linux-mm@kvack.org
References: <20240415081220.3246839-1-wangkefeng.wang@huawei.com>
    <54623c8c-a94f-4f88-bf53-5f92c634f78a@huawei.com>
    <3b931621-7cd1-4df8-9070-535ecaee970e@redhat.com>
    <90501d59-e3f2-4ac4-9e42-4eca3bb0a91b@huawei.com>

On Tue, Apr 16, 2024 at 4:58 PM Kefeng Wang wrote:
>
> On 2024/4/16 12:50, Kefeng Wang wrote:
> >
> > On 2024/4/16 8:21, Barry Song wrote:
> >> On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang wrote:
> >>>
> >>> On 2024/4/15 18:52, David Hildenbrand wrote:
> >>>> On 15.04.24 10:59, Kefeng Wang wrote:
> >>>>>
> >>>>> On 2024/4/15 16:18, Barry Song wrote:
> >>>>>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang wrote:
> >>>>>>>
> >>>>>>> Both the file pages and anonymous pages support large folio,
> >>>>>>> high-order pages except PMD_ORDER will also be allocated
> >>>>>>> frequently which could increase the zone lock contention, allow
> >>>>>>> high-order pages on pcp lists could reduce the big zone lock
> >>>>>>> contention, but as commit 44042b449872 ("mm/page_alloc: allow
> >>>>>>> high-order pages to be stored on the per-cpu lists") pointed,
> >>>>>>> it may not win in all the scenes, add a new control sysfs to
> >>>>>>> enable or disable specified high-order pages stored on PCP
> >>>>>>> lists, the order (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) won't be
> >>>>>>> stored on PCP list by default.
> >>>>>>
> >>>>>> This is precisely something Baolin and I have discussed and
> >>>>>> intended to implement[1],
> >>>>>> but unfortunately, we haven't had the time to do so.
> >>>>>
> >>>>> Indeed, same thing. Recently, we are working on unixbench/lmbench
> >>>>> optimization, I tested Multi-size THP for anonymous memory by
> >>>>> hard-cord PAGE_ALLOC_COSTLY_ORDER from 3 to 4[1], it shows some
> >>>>> improvement but not for all cases and not very stable, so
> >>>>> re-implemented it by according to the user requirement and enable
> >>>>> it dynamically.
> >>>>
> >>>> I'm wondering, though, if this is really a suitable candidate for a
> >>>> sysctl toggle. Can anybody really come up with an educated guess for
> >>>> these values?
> >>>
> >>> Not sure this is suitable in sysctl, but mTHP anon is enabled in
> >>> sysctl, we could trace __alloc_pages() and do order statistic to
> >>> decide to choose the high-order to be enabled on PCP.
> >>>
> >>>>
> >>>> Especially reading "Benchmarks Score shows a little
> >>>> improvoment(0.28%)" and "it may not win in all the scenes", to me
> >>>> it mostly sounds like "minimal impact" -- so who cares?
> >>>
> >>> Even though lock conflicts are eliminated, there is very limited
> >>> performance improvement(even maybe fluctuation), it is not a good
> >>> testcase to show improvement, just show the zone-lock issue, we need
> >>> to find other better testcase, maybe some test on Andriod(heavy use
> >>> 64K, no PMD THP), or LKP maybe give some help?
> >>>
> >>> I will try to find other testcase to show the benefit.
> >>
> >> Hi Kefeng,
> >>
> >> I wonder if you will see some major improvements on mTHP 64KiB using
> >> the below microbench I wrote just now, for example perf and time to
> >> finish the program
> >>
> >> #define DATA_SIZE (2UL * 1024 * 1024)
> >>
> >> int main(int argc, char **argv)
> >> {
> >>         /* make 32 concurrent alloc and free of mTHP */
> >>         fork(); fork(); fork(); fork(); fork();
> >>
> >>         for (int i = 0; i < 100000; i++) {
> >>                 void *addr = mmap(NULL, DATA_SIZE, PROT_READ | PROT_WRITE,
> >>                                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> >>                 if (addr == MAP_FAILED) {
> >>                         perror("fail to malloc");
> >>                         return -1;
> >>                 }
> >>                 memset(addr, 0x11, DATA_SIZE);
> >>                 munmap(addr, DATA_SIZE);
> >>         }
> >>
> >>         return 0;
> >> }
> >>
>
> Rebased on next-20240415,
>
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>
> Compare with
> echo 0 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
> echo 1 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>
> >
> > 1) PCP disabled
> >         1       2       3       4       5       average
> > real    200.41  202.18  203.16  201.54  200.91  201.64
> > user    6.49    6.21    6.25    6.31    6.35    6.322
> > sys     193.3   195.39  196.3   194.65  194.01  194.73
> >
> > 2) PCP enabled
> > real    198.25  199.26  195.51  199.28  189.12  196.284  -2.66%
> > user    6.21    6.02    6.02    6.28    6.21    6.148    -2.75%
> > sys     191.46  192.64  188.96  192.47  182.39  189.584  -2.64%
> >
> > for above test, time reduce 2.x%

This is an improvement over the earlier 0.28%, but it's still below my
expectations.
I suspect this is because mTHP reduces the frequency of allocations and
frees. Running the same test on order-0 might yield much better results.

I suppose that as the order increases, PCP yields smaller improvements,
since both allocation and free activity decrease.

On the other hand, we also use PCP for THP (2MB). Do we have any existing
data demonstrating that such large allocations benefit from PCP?

> >
> >
> > And re-test page_fault1(anon) from will-it-scale
> >
> > 1) PCP enabled
> > tasks  processes  processes_idle  threads  threads_idle  linear
> > 0      0          100             0        100           0
> > 1      1416915    98.95           1418128  98.95         1418128
> > 20     5327312    79.22           3821312  94.36         28362560
> > 40     9437184    58.58           4463657  94.55         56725120
> > 60     8120003    38.16           4736716  94.61         85087680
> > 80     7356508    18.29           4847824  94.46         113450240
> > 100    7256185    1.48            4870096  94.61         141812800
> >
> > 2) PCP disabled
> > tasks  processes  processes_idle  threads  threads_idle  linear
> > 0      0          100             0        100           0
> > 1      1365398    98.95           1354502  98.95         1365398
> > 20     5174918    79.22           3722368  94.65         27307960
> > 40     9094265    58.58           4427267  94.82         54615920
> > 60     8021606    38.18           4572896  94.93         81923880
> > 80     7497318    18.2            4637062  94.76         109231840
> > 100    6819897    1.47            4654521  94.63         136539800
> >
> > ------------------------------------
> > 1) vs 2) pcp enabled improve 3.86%
> >
> > 3) PCP re-enabled
> > tasks  processes  processes_idle  threads  threads_idle  linear
> > 0      0          100             0        100           0
> > 1      1419036    98.96           1428403  98.95         1428403
> > 20     5356092    79.23           3851849  94.41         28568060
> > 40     9437184    58.58           4512918  94.63         57136120
> > 60     8252342    38.16           4659552  94.68         85704180
> > 80     7414899    18.26           4790576  94.77         114272240
> > 100    7062902    1.46            4759030  94.64         142840300
> >
> > 4) PCP re-disabled
> > tasks  processes  processes_idle  threads  threads_idle  linear
> > 0      0          100             0        100           0
> > 1      1352649    98.95           1354806  98.95         1354806
> > 20     5172924    79.22           3719292  94.64         27096120
> > 40     9174505    58.59           4310649  94.93         54192240
> > 60     8021606    38.17           4552960  94.81         81288360
> > 80     7497318    18.18           4671638  94.81         108384480
> > 100    6823926    1.47            4725955  94.64         135480600
> >
> > ------------------------------------
> > 3) vs 4) pcp enabled improve 5.43%
> >
> > Average: 4.645%
> >
> >
> >>>
> >>>>
> >>>> How much is the cost vs. benefit of just having one sane system
> >>>> configuration?
> >>>>
> >>>
> >>> For arm64 with 4k, five more high-orders(4~8), five more pcplists,
> >>> and for high-orders, we assumes most of them are moveable, but maybe
> >>> not, so enable it by default maybe more fragmentization, see
> >>> 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized
> >>> allocations").
> >>>

Thanks
Barry
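
The point above about allocation frequency dropping as the order grows can be made concrete with a little arithmetic. The following is only a minimal sketch: it assumes 4 KiB base pages (the "arm64 with 4k" case quoted above) and that every fault in the 2 MiB test region is served by a folio of the configured order. `BASE_PAGE_SIZE` and `allocs_per_region()` are hypothetical names used for illustration, not kernel APIs.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages, as in the "arm64 with 4k" case. */
#define BASE_PAGE_SIZE 4096UL

/* Same 2 MiB region that the quoted microbenchmark maps and touches. */
#define DATA_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: number of folio allocations needed to populate
 * region_bytes when every fault is served by a folio of the given order.
 * Rounds up, so a region smaller than one folio still costs one allocation.
 */
static unsigned long allocs_per_region(unsigned long region_bytes,
                                       unsigned int order)
{
        /* Keep the shift well inside the range of unsigned long. */
        assert(order < 20);

        unsigned long folio_bytes = BASE_PAGE_SIZE << order;

        return (region_bytes + folio_bytes - 1) / folio_bytes;
}
```

Under these assumptions, `allocs_per_region(DATA_SIZE, 0)` is 512, `allocs_per_region(DATA_SIZE, 4)` (64 KiB) is 32, and `allocs_per_region(DATA_SIZE, 9)` (2 MiB) is 1. So for the same amount of memory touched, the 64 KiB case enters the page allocator 16 times less often than order-0, which is consistent with the zone lock being a smaller share of the total time there.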