From: Tim Chen <tim.c.chen@linux.intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Honglei Wang <honglei.wang@oracle.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@intel.com>
Subject: memcgroup lruvec_lru_size scaling issue
Date: Mon, 14 Oct 2019 10:17:45 -0700 [thread overview]
Message-ID: <a64eecf1-81d4-371f-ff6d-1cb057bd091c@linux.intel.com> (raw)
We were running a database benchmark in a mem cgroup and found that lruvec_lru_size
takes up a huge chunk of CPU cycles (about 25% of our kernel time) on the 5.3 kernel.
The main issue is the loop in lruvec_page_state_local called by lruvec_lru_size
in the mem cgroup path:
	for_each_possible_cpu(cpu)
		x += per_cpu(pn->lruvec_stat_local->count[idx], cpu);
Looping through every possible CPU just to get the LRU size is costly.
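For context, the surrounding 5.3 helper looks roughly like this (reconstructed
from memory, so treat it as a sketch of include/linux/memcontrol.h rather than
an exact quote):

	/*
	 * Sketch of the 5.3-era helper: every call walks all possible
	 * CPUs to sum the per-cpu counters for one stat item.
	 */
	static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
							    enum node_stat_item idx)
	{
		struct mem_cgroup_per_node *pn;
		long x = 0;
		int cpu;

		if (mem_cgroup_disabled())
			return node_page_state(lruvec_pgdat(lruvec), idx);

		pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);

		/* O(nr_possible_cpus) on every lruvec_lru_size() call */
		for_each_possible_cpu(cpu)
			x += per_cpu(pn->lruvec_stat_local->count[idx], cpu);

		if (x < 0)
			x = 0;
		return x;
	}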
Doing this on our workload with 96 CPU threads and 500 mem cgroups makes things much
worse: reclaim can end up touching 96 cpus * 500 cgroups * 2 (main) LRUs worth of
per-cpu counters (on the order of 96,000 of them), which is a lot of data structures
to be running through all the time.
Honglei's patch
(https://lore.kernel.org/linux-mm/991b4719-a2a0-9efe-de02-56a928752fe3@oracle.com/)
restores the previous method for computing lru_size and is much more efficient.
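Roughly, the restored method derives the size from the per-zone lru_zone_size
counters that memcg already keeps up to date at LRU add/remove time, so each call
is O(MAX_NR_ZONES) instead of O(nr_possible_cpus). The following is a sketch of
that approach based on the pre-5.3 code, not a copy of Honglei's patch:

	/*
	 * Sketch: sum the per-zone LRU counters the memcg already maintains
	 * (mz->lru_zone_size[zid][lru], kept in sync by
	 * mem_cgroup_update_lru_size()) instead of folding per-cpu stat
	 * deltas on every call.
	 */
	unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
				      int zone_idx)
	{
		unsigned long size = 0;
		int zid;

		for (zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES; zid++) {
			struct zone *zone = &lruvec_pgdat(lruvec)->node_zones[zid];

			if (!managed_zone(zone))
				continue;

			if (!mem_cgroup_disabled())
				size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
			else
				size += zone_page_state(zone, NR_ZONE_LRU_BASE + lru);
		}

		return size;
	}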
With Honglei's patch we got a 20% throughput improvement in our database benchmark, and
lruvec_lru_size's CPU overhead completely disappeared from the CPU profile.
We would like to see Honglei's patch merged.
The problem can also be reproduced by running a simple multi-threaded pmbench benchmark
with a fast Optane SSD as swap (see profile below).
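For reference, a call-graph profile of this shape can be collected with standard
perf; the options below are shown only as an example, not necessarily what was
used here:

	perf record -a -g -- sleep 30
	perf report --stdio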
6.15% 3.08% pmbench [kernel.vmlinux] [k] lruvec_lru_size
|
|--3.07%--lruvec_lru_size
| |
| |--2.11%--cpumask_next
| | |
| | --1.66%--find_next_bit
| |
| --0.57%--call_function_interrupt
| |
| --0.55%--smp_call_function_interrupt
|
|--1.59%--0x441f0fc3d009
| _ops_rdtsc_init_base_freq
| access_histogram
| page_fault
| __do_page_fault
| handle_mm_fault
| __handle_mm_fault
| |
| --1.54%--do_swap_page
| swapin_readahead
| swap_cluster_readahead
| |
| --1.53%--read_swap_cache_async
| __read_swap_cache_async
| alloc_pages_vma
| __alloc_pages_nodemask
| __alloc_pages_slowpath
| try_to_free_pages
| do_try_to_free_pages
| shrink_node
| shrink_node_memcg
| |
| |--0.77%--lruvec_lru_size
| |
| --0.76%--inactive_list_is_low
| |
| --0.76%--lruvec_lru_size
|
--1.50%--measure_read
page_fault
__do_page_fault
handle_mm_fault
__handle_mm_fault
do_swap_page
swapin_readahead
swap_cluster_readahead
|
--1.48%--read_swap_cache_async
__read_swap_cache_async
alloc_pages_vma
__alloc_pages_nodemask
__alloc_pages_slowpath
try_to_free_pages
do_try_to_free_pages
shrink_node
shrink_node_memcg
|
|--0.75%--inactive_list_is_low
| |
| --0.75%--lruvec_lru_size
|
--0.73%--lruvec_lru_size
Thanks.
Tim