From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: David Hildenbrand <david@redhat.com>, Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Huang Ying <ying.huang@intel.com>,
Mel Gorman <mgorman@techsingularity.net>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <v-songbaohua@oppo.com>,
Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Jonathan Corbet <corbet@lwn.net>, Yang Shi <shy828301@gmail.com>,
Yu Zhao <yuzhao@google.com>, <linux-mm@kvack.org>
Subject: Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists
Date: Tue, 16 Apr 2024 16:06:33 +0800
Message-ID: <e89b6f99-5af4-4112-98ca-10134181b783@huawei.com>
In-Reply-To: <e31cae86-fc32-443a-864c-993b8dfcfc02@redhat.com>
On 2024/4/16 15:03, David Hildenbrand wrote:
> On 16.04.24 07:26, Barry Song wrote:
>> On Tue, Apr 16, 2024 at 4:58 PM Kefeng Wang
>> <wangkefeng.wang@huawei.com> wrote:
>>>
>>>
>>>
>>> On 2024/4/16 12:50, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/4/16 8:21, Barry Song wrote:
>>>>> On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang
>>>>> <wangkefeng.wang@huawei.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/4/15 18:52, David Hildenbrand wrote:
>>>>>>> On 15.04.24 10:59, Kefeng Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/4/15 16:18, Barry Song wrote:
>>>>>>>>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang
>>>>>>>>> <wangkefeng.wang@huawei.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Both file pages and anonymous pages support large folios, so
>>>>>>>>>> high-order pages other than PMD_ORDER will also be allocated
>>>>>>>>>> frequently, which could increase zone lock contention. Allowing
>>>>>>>>>> high-order pages on PCP lists could reduce the big zone lock
>>>>>>>>>> contention, but as commit 44042b449872 ("mm/page_alloc: allow
>>>>>>>>>> high-order pages to be stored on the per-cpu lists") pointed
>>>>>>>>>> out, it may not win in all scenarios. Add a new sysfs control
>>>>>>>>>> to enable or disable storing specified high-order pages on PCP
>>>>>>>>>> lists; orders in (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) won't be
>>>>>>>>>> stored on the PCP lists by default.
>>>>>>>>>
>>>>>>>>> This is precisely something Baolin and I have discussed and
>>>>>>>>> intended
>>>>>>>>> to implement[1],
>>>>>>>>> but unfortunately, we haven't had the time to do so.
>>>>>>>>
>>>>>>>> Indeed, same thing. Recently we have been working on
>>>>>>>> unixbench/lmbench optimization. I tested multi-size THP for
>>>>>>>> anonymous memory by hard-coding PAGE_ALLOC_COSTLY_ORDER from 3
>>>>>>>> to 4[1]; it showed some improvement, but not for all cases and
>>>>>>>> not very stably, so I re-implemented it according to the user
>>>>>>>> requirement so it can be enabled dynamically.
>>>>>>>
>>>>>>> I'm wondering, though, if this is really a suitable candidate for a
>>>>>>> sysctl toggle. Can anybody really come up with an educated guess for
>>>>>>> these values?
>>>>>>
>>>>>> Not sure this is suitable as a sysctl, but mTHP anon is already
>>>>>> enabled via sysfs; we could trace __alloc_pages() and collect
>>>>>> order statistics to decide which high orders to enable on the
>>>>>> PCP lists.
>>>>>>
>>>>>>>
>>>>>>> Especially reading "Benchmarks Score shows a little
>>>>>>> improvement (0.28%)"
>>>>>>> and "it may not win in all the scenes", to me it mostly sounds like
>>>>>>> "minimal impact" -- so who cares?
>>>>>>
>>>>>> Even though the lock conflicts are eliminated, the performance
>>>>>> improvement is very limited (maybe even within fluctuation). This
>>>>>> is not a good testcase to show the improvement, it only shows the
>>>>>> zone-lock issue; we need to find a better testcase, maybe some
>>>>>> test on Android (heavy use of 64K, no PMD THP), or maybe LKP
>>>>>> could give some help?
>>>>>>
>>>>>> I will try to find another testcase to show the benefit.
>>>>>
>>>>> Hi Kefeng,
>>>>>
>>>>> I wonder if you will see some major improvement on 64KiB mTHP with
>>>>> the microbenchmark below, which I just wrote, measuring for example
>>>>> perf counters and the time to finish the program.
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <sys/mman.h>
>>>>> #include <unistd.h>
>>>>>
>>>>> #define DATA_SIZE (2UL * 1024 * 1024)
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>         /* five forks make 2^5 = 32 processes doing concurrent
>>>>>          * alloc and free of mTHP */
>>>>>         fork(); fork(); fork(); fork(); fork();
>>>>>
>>>>>         for (int i = 0; i < 100000; i++) {
>>>>>                 void *addr = mmap(NULL, DATA_SIZE,
>>>>>                                   PROT_READ | PROT_WRITE,
>>>>>                                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>                 if (addr == MAP_FAILED) {
>>>>>                         perror("fail to mmap");
>>>>>                         return -1;
>>>>>                 }
>>>>>                 memset(addr, 0x11, DATA_SIZE);
>>>>>                 munmap(addr, DATA_SIZE);
>>>>>         }
>>>>>
>>>>>         return 0;
>>>>> }
>>>>>
>>>
>>> Rebased on next-20240415,
>>>
>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>>>
>>> Compare
>>> echo 0 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>> against
>>> echo 1 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>>
>>>>
>>>> 1) PCP disabled
>>>>           1       2       3       4       5      average
>>>> real    200.41  202.18  203.16  201.54  200.91  201.64
>>>> user      6.49    6.21    6.25    6.31    6.35    6.322
>>>> sys     193.30  195.39  196.30  194.65  194.01  194.73
>>>>
>>>> 2) PCP enabled
>>>> real    198.25  199.26  195.51  199.28  189.12  196.284  (-2.66%)
>>>> user      6.21    6.02    6.02    6.28    6.21    6.148   (-2.75%)
>>>> sys     191.46  192.64  188.96  192.47  182.39  189.584  (-2.64%)
>>>>
>>>> For the test above, time is reduced by about 2.6%.
>>
>> This is an improvement from 0.28%, but it's still below my expectations.
>
> Yes, it's noise. Maybe we need a system with more Cores/Sockets? But it
> does feel a bit like we're trying to come up with the problem after we
> have a solution; I'd have thought some existing benchmark could
> highlight if that is worth it.
96 cores, with 129 threads. A quick test using pcp_enabled to control
hugepages-2048kB shows no big improvement for 2M:

PCP enabled
          1       2       3      average
real    221.80  225.60  221.50  222.97
user     14.91   14.91   17.05   15.62
sys     141.91  159.25  156.23  152.46

PCP disabled
real    230.76  231.39  228.39  230.18
user     15.47   15.88   17.50   16.28
sys     159.07  162.32  159.09  160.16

Judging from commit 44042b449872 ("mm/page_alloc: allow high-order pages
to be stored on the per-cpu lists"), the improvement there also seemed
limited:
netperf-udp
5.13.0-rc2 5.13.0-rc2
mm-pcpburst-v3r4 mm-pcphighorder-v1r7
Hmean send-64 261.46 ( 0.00%) 266.30 * 1.85%*
Hmean send-128 516.35 ( 0.00%) 536.78 * 3.96%*
Hmean send-256 1014.13 ( 0.00%) 1034.63 * 2.02%*
Hmean send-1024 3907.65 ( 0.00%) 4046.11 * 3.54%*
Hmean send-2048 7492.93 ( 0.00%) 7754.85 * 3.50%*
Hmean send-3312 11410.04 ( 0.00%) 11772.32 * 3.18%*
Hmean send-4096 13521.95 ( 0.00%) 13912.34 * 2.89%*
Hmean send-8192 21660.50 ( 0.00%) 22730.72 * 4.94%*
Hmean send-16384 31902.32 ( 0.00%) 32637.50 * 2.30%*
Thread overview: 17+ messages
2024-04-15 8:12 Kefeng Wang
2024-04-15 8:12 ` [PATCH rfc 1/3] mm: prepare more high-order pages to be stored on the per-cpu lists Kefeng Wang
2024-04-15 11:41 ` Baolin Wang
2024-04-15 12:25 ` Kefeng Wang
2024-04-15 8:12 ` [PATCH rfc 2/3] mm: add control to allow specified high-order pages stored on PCP list Kefeng Wang
2024-04-15 8:12 ` [PATCH rfc 3/3] mm: pcp: show per-order pages count Kefeng Wang
2024-04-15 8:18 ` [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists Barry Song
2024-04-15 8:59 ` Kefeng Wang
2024-04-15 10:52 ` David Hildenbrand
2024-04-15 11:14 ` Barry Song
2024-04-15 12:17 ` Kefeng Wang
2024-04-16 0:21 ` Barry Song
2024-04-16 4:50 ` Kefeng Wang
2024-04-16 4:58 ` Kefeng Wang
2024-04-16 5:26 ` Barry Song
2024-04-16 7:03 ` David Hildenbrand
2024-04-16 8:06 ` Kefeng Wang [this message]