linux-mm.kvack.org archive mirror
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: David Hildenbrand <david@redhat.com>, Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Barry Song <v-songbaohua@oppo.com>,
	Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>, Yang Shi <shy828301@gmail.com>,
	Yu Zhao <yuzhao@google.com>, <linux-mm@kvack.org>
Subject: Re: [PATCH rfc 0/3] mm: allow more high-order pages stored on PCP lists
Date: Tue, 16 Apr 2024 16:06:33 +0800	[thread overview]
Message-ID: <e89b6f99-5af4-4112-98ca-10134181b783@huawei.com> (raw)
In-Reply-To: <e31cae86-fc32-443a-864c-993b8dfcfc02@redhat.com>



On 2024/4/16 15:03, David Hildenbrand wrote:
> On 16.04.24 07:26, Barry Song wrote:
>> On Tue, Apr 16, 2024 at 4:58 PM Kefeng Wang 
>> <wangkefeng.wang@huawei.com> wrote:
>>>
>>>
>>>
>>> On 2024/4/16 12:50, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/4/16 8:21, Barry Song wrote:
>>>>> On Tue, Apr 16, 2024 at 12:18 AM Kefeng Wang
>>>>> <wangkefeng.wang@huawei.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/4/15 18:52, David Hildenbrand wrote:
>>>>>>> On 15.04.24 10:59, Kefeng Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/4/15 16:18, Barry Song wrote:
>>>>>>>>> On Mon, Apr 15, 2024 at 8:12 PM Kefeng Wang
>>>>>>>>> <wangkefeng.wang@huawei.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Both file pages and anonymous pages support large folios, so
>>>>>>>>>> high-order pages below PMD_ORDER will also be allocated
>>>>>>>>>> frequently, which could increase zone lock contention.
>>>>>>>>>> Allowing high-order pages on the PCP lists could reduce that
>>>>>>>>>> contention, but as commit 44042b449872 ("mm/page_alloc: allow
>>>>>>>>>> high-order pages to be stored on the per-cpu lists") pointed
>>>>>>>>>> out, it may not win in all scenarios. So add a new sysfs
>>>>>>>>>> control to enable or disable storing specified high-order
>>>>>>>>>> pages on the PCP lists; orders in
>>>>>>>>>> (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) are not stored on the
>>>>>>>>>> PCP lists by default.
>>>>>>>>>
>>>>>>>>> This is precisely something Baolin and I have discussed and 
>>>>>>>>> intended
>>>>>>>>> to implement[1],
>>>>>>>>> but unfortunately, we haven't had the time to do so.
>>>>>>>>
>>>>>>>> Indeed, the same thing. Recently we have been working on
>>>>>>>> unixbench/lmbench optimization. I tested multi-size THP for
>>>>>>>> anonymous memory by hard-coding PAGE_ALLOC_COSTLY_ORDER from 3
>>>>>>>> to 4[1]; it shows some improvement, but not for all cases and
>>>>>>>> not very stably, so I re-implemented it according to the user's
>>>>>>>> requirements so it can be enabled dynamically.
>>>>>>>
>>>>>>> I'm wondering, though, if this is really a suitable candidate for a
>>>>>>> sysctl toggle. Can anybody really come up with an educated guess for
>>>>>>> these values?
>>>>>>
>>>>>> Not sure whether sysctl is suitable, but mTHP anon is already
>>>>>> controlled via sysfs; we could trace __alloc_pages() and gather
>>>>>> per-order statistics to decide which high orders to enable on
>>>>>> the PCP lists.
>>>>>>
>>>>>>>
>>>>>>> Especially reading "Benchmarks Score shows a little
>>>>>>> improvement (0.28%)" and "it may not win in all the scenes", to
>>>>>>> me it mostly sounds like "minimal impact" -- so who cares?
>>>>>>
>>>>>> Even though the lock contention is eliminated, the performance
>>>>>> improvement is very limited (there may even just be fluctuation);
>>>>>> this is not a good testcase to show the improvement, it only
>>>>>> demonstrates the zone-lock issue. We need to find a better
>>>>>> testcase, maybe some test on Android (heavy use of 64K, no PMD
>>>>>> THP), or maybe LKP could help?
>>>>>>
>>>>>> I will try to find another testcase to show the benefit.
>>>>>
>>>>> Hi Kefeng,
>>>>>
>>>>> I wonder if you will see some major improvement on 64KiB mTHP
>>>>> with the microbench below, which I wrote just now -- for example
>>>>> in perf data and the time to finish the program.
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <unistd.h>
>>>>> #include <sys/mman.h>
>>>>>
>>>>> #define DATA_SIZE (2UL * 1024 * 1024)
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>         /* make 32 concurrent allocs and frees of mTHP */
>>>>>         fork(); fork(); fork(); fork(); fork();
>>>>>
>>>>>         for (int i = 0; i < 100000; i++) {
>>>>>                 void *addr = mmap(NULL, DATA_SIZE, PROT_READ | PROT_WRITE,
>>>>>                                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>                 if (addr == MAP_FAILED) {
>>>>>                         perror("fail to mmap");
>>>>>                         return -1;
>>>>>                 }
>>>>>                 memset(addr, 0x11, DATA_SIZE);
>>>>>                 munmap(addr, DATA_SIZE);
>>>>>         }
>>>>>
>>>>>         return 0;
>>>>> }
>>>>>
>>>
>>> Rebased on next-20240415,
>>>
>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>>>
>>> Compare with
>>>     echo 0 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>>     echo 1 > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/pcp_enabled
>>>
>>>>
>>>> 1) PCP disabled
>>>>         1       2       3       4       5       average
>>>> real    200.41  202.18  203.16  201.54  200.91  201.64
>>>> user    6.49    6.21    6.25    6.31    6.35    6.322
>>>> sys     193.3   195.39  196.3   194.65  194.01  194.73
>>>>
>>>> 2) PCP enabled
>>>> real    198.25  199.26  195.51  199.28  189.12  196.284  -2.66%
>>>> user    6.21    6.02    6.02    6.28    6.21    6.148    -2.75%
>>>> sys     191.46  192.64  188.96  192.47  182.39  189.584  -2.64%
>>>>
>>>> For the above test, the times are reduced by about 2.x%.
>>
>> This is an improvement over the earlier 0.28%, but it's still below
>> my expectations.
> 
> Yes, that's close to noise. Maybe we need a system with more
> cores/sockets? But it does feel a bit like we're trying to come up
> with the problem after we already have a solution; I'd have thought
> some existing benchmark could highlight whether this is worth it.


On a 96-core machine running with 129 threads, a quick test using
pcp_enabled to control hugepages-2048kB shows no big improvement for
2M pages:

PCP enabled
        1       2       3       average
real    221.8   225.6   221.5   222.9666667
user    14.91   14.91   17.05   15.62333333
sys     141.91  159.25  156.23  152.4633333

PCP disabled
real    230.76  231.39  228.39  230.18
user    15.47   15.88   17.5    16.28333333
sys     159.07  162.32  159.09  160.16


From commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
stored on the per-cpu lists"), the improvement also looks limited:

  netperf-udp
                                   5.13.0-rc2             5.13.0-rc2
                             mm-pcpburst-v3r4   mm-pcphighorder-v1r7
  Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
  Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
  Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
  Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
  Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
  Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
  Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
  Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
  Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
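As for choosing which orders are worth enabling, per-order allocation statistics can be gathered from the existing kmem:mm_page_alloc tracepoint, for example with bpftrace (a sketch; the 10 s interval and the map name are arbitrary choices):

```shell
# Count page allocations per order for 10 seconds; orders in
# (PAGE_ALLOC_COSTLY_ORDER, PMD_ORDER) with high counts are candidates
# for being stored on the PCP lists. Run as root.
bpftrace -e '
        tracepoint:kmem:mm_page_alloc { @alloc_orders[args->order] = count(); }
        interval:s:10 { exit(); }'
```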




