From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: tkhai@ya.ru, hannes@cmpxchg.org, shakeelb@google.com,
mhocko@kernel.org, roman.gushchin@linux.dev,
muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com,
sultan@kerneltoast.com, dave@stgolabs.net,
penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/8] make slab shrink lockless
Date: Mon, 27 Feb 2023 21:31:51 +0800
Message-ID: <b7e8929c-8fd5-a248-f8a8-d9177fc01b4b@bytedance.com>
In-Reply-To: <20230226115100.7e12bda7931dd65dbabcebe3@linux-foundation.org>
On 2023/2/27 03:51, Andrew Morton wrote:
> On Sun, 26 Feb 2023 22:46:47 +0800 Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
>> Hi all,
>>
>> This patch series aims to make slab shrink lockless.
>
> What an awesome changelog.
>
>> 2. Survey
>> =========
>
> Especially this part.
>
> Looking through all the prior efforts and at this patchset I am not
> immediately seeing any statements about the overall effect upon
> real-world workloads. For a good example, does this patchset
> measurably improve throughput or energy consumption on your servers?
Hi Andrew,
I re-tested with the following physical machines:
Architecture: x86_64
CPU(s): 96
On-line CPU(s) list: 0-95
Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
I found that the explanation I gave in the cover letter for the hotspot
is wrong. The down_read_trylock() hotspot is not caused by trylock
failures, but simply by the atomic operation (cmpxchg) itself, which
leads to a significant reduction in IPC (insn per cycle).
To verify this, I did the following tests:
1. Run the following script to create down_read_trylock() hotspots:
```
#!/bin/bash

DIR="/root/shrinker/memcg/mnt"

# Create the parent memcg (limited to 4G), a perf_event cgroup, and
# child memcgs 0..$1, each with its own mount point directory.
do_create()
{
        mkdir -p /sys/fs/cgroup/memory/test
        mkdir -p /sys/fs/cgroup/perf_event/test
        echo 4G > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
        for i in $(seq 0 "$1"); do
                mkdir -p "/sys/fs/cgroup/memory/test/$i"
                echo $$ > "/sys/fs/cgroup/memory/test/$i/cgroup.procs"
                echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs
                mkdir -p "$DIR/$i"
        done
}

# Mount one tmpfs instance on each of the directories $1..$2.
do_mount()
{
        for i in $(seq "$1" "$2"); do
                mount -t tmpfs "$i" "$DIR/$i"
        done
}

# For each of the memcgs $1..$2, move this shell into it and write
# a 1M file to its tmpfs in the background.
do_touch()
{
        for i in $(seq "$1" "$2"); do
                echo $$ > "/sys/fs/cgroup/memory/test/$i/cgroup.procs"
                echo $$ > /sys/fs/cgroup/perf_event/test/cgroup.procs
                dd if=/dev/zero of="$DIR/$i/file$i" bs=1M count=1 &
        done
}

case "$1" in
touch)
        do_touch "$2" "$3"
        ;;
test)
        do_create 4000
        do_mount 0 4000
        do_touch 0 3000
        ;;
*)
        exit 1
        ;;
esac
```
Save the above script, then run its test and touch commands.
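The script writes to /sys/fs/cgroup/memory and /sys/fs/cgroup/perf_event, so it assumes that separate cgroup v1 hierarchies for those two controllers are mounted there (memory.limit_in_bytes is a v1 interface file). A minimal pre-flight check could look like the sketch below; check_cgroup_v1 and its optional root argument are hypothetical names introduced only for illustration and are not part of the original script.

```bash
#!/bin/bash
# Pre-flight sketch: succeed only if the per-controller cgroup v1
# directories that the reproducer writes to are present.
# check_cgroup_v1 is a hypothetical helper name.
check_cgroup_v1()
{
        # Optional override of the cgroup root, mainly useful for testing.
        local root="${1:-/sys/fs/cgroup}"
        local ctrl

        for ctrl in memory perf_event; do
                if [ ! -d "$root/$ctrl" ]; then
                        echo "missing cgroup v1 hierarchy: $root/$ctrl" >&2
                        return 1
                fi
        done
        return 0
}
```

Calling check_cgroup_v1 before the test command would fail early on a system that only has the unified (v2) hierarchy, instead of failing part way through do_create.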
Then we can use the following perf command to view hotspots:
perf top -U -F 999
1) Before applying this patchset:
32.31% [kernel] [k] down_read_trylock
19.40% [kernel] [k] pv_native_safe_halt
16.24% [kernel] [k] up_read
15.70% [kernel] [k] shrink_slab
4.69% [kernel] [k] _find_next_bit
2.62% [kernel] [k] shrink_node
1.78% [kernel] [k] shrink_lruvec
0.76% [kernel] [k] do_shrink_slab
2) After applying this patchset:
27.83% [kernel] [k] _find_next_bit
16.97% [kernel] [k] shrink_slab
15.82% [kernel] [k] pv_native_safe_halt
9.58% [kernel] [k] shrink_node
8.31% [kernel] [k] shrink_lruvec
5.64% [kernel] [k] do_shrink_slab
3.88% [kernel] [k] mem_cgroup_iter
2. At the same time, we use the following perf command to capture IPC
information:
perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10
1) Before applying this patchset:
Performance counter stats for 'system wide' (5 runs):
    454187219766  cycles        test                           ( +- 1.84% )
     78896433101  instructions  test  #  0.17  insn per cycle  ( +- 0.44% )
    10.0020430 +- 0.0000366 seconds time elapsed  ( +- 0.00% )
2) After applying this patchset:
Performance counter stats for 'system wide' (5 runs):
    841954709443  cycles        test                           ( +- 15.80% )  (98.69%)
    527258677936  instructions  test  #  0.63  insn per cycle  ( +- 15.11% )  (98.68%)
    10.01064 +- 0.00831 seconds time elapsed  ( +- 0.08% )
We can see that IPC drops severely (0.17 insn per cycle) when
down_read_trylock() is called at high frequency. After switching to
SRCU, IPC returns to a normal level (0.63 insn per cycle).
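The IPC figures reported by perf stat are simply instructions divided by cycles, so they can be re-derived from the raw counters quoted above. The following is a small sketch of that arithmetic; the helper name ipc is hypothetical, and awk is assumed to be available.

```bash
#!/bin/bash
# ipc: print instructions / cycles, rounded to two decimal places.
# $1 = instruction count, $2 = cycle count (both from perf stat).
ipc()
{
        awk -v insn="$1" -v cycles="$2" 'BEGIN { printf "%.2f\n", insn / cycles }'
}

ipc 78896433101 454187219766    # before the patchset, prints 0.17
ipc 527258677936 841954709443   # after the patchset, prints 0.63
```

With these counters, IPC after the patchset is roughly 3.6 times the IPC before it.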
Thanks,
Qi