From: Kairui Song <ryncsn@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm: memcontrol: make cgroup_memory_noswap a static key
Date: Tue, 30 Aug 2022 16:50:38 +0800
Message-ID: <CAMgjq7CM_SX3jLj9yp5hzAr6c3hBtS5nd4Nh4z8bTY8yWx-3KQ@mail.gmail.com>
In-Reply-To: <Yw21uOyEz9lLkI3p@dhcp22.suse.cz>
On Tue, Aug 30, 2022 at 15:01, Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 30-08-22 13:59:49, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > cgroup_memory_noswap is used in many hot paths, so make it a static key
> > to lower the kernel overhead.
> >
> > Using 8G of ZRAM as swap, benchmark using `perf stat -d -d -d --repeat 100`
> > with the following code snippet in a non-root cgroup:
> >
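> > #include <stdio.h>
> > #include <string.h>
> > #include <linux/mman.h>
> > #include <sys/mman.h>
> >
> > #define MB (1024UL * 1024UL)
> >
> > int main(int argc, char **argv)
> > {
> >     /* Map 8000 MB of private anonymous memory. */
> >     void *p = mmap(NULL, 8000 * MB, PROT_READ | PROT_WRITE,
> >                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >     if (p == MAP_FAILED) {
> >         perror("mmap");
> >         return 1;
> >     }
> >     /* Touch every page so it is populated. */
> >     memset(p, 0xff, 8000 * MB);
> >     /* Ask the kernel to reclaim (swap out) the whole range. */
> >     madvise(p, 8000 * MB, MADV_PAGEOUT);
> >     /* Touch every page again, faulting it back in from swap. */
> >     memset(p, 0xff, 8000 * MB);
> >     return 0;
> > }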
> >
> > Before:
> > 7,021.43 msec task-clock # 0.967 CPUs utilized ( +- 0.03% )
> > 4,010 context-switches # 573.853 /sec ( +- 0.01% )
> > 0 cpu-migrations # 0.000 /sec
> > 2,052,057 page-faults # 293.661 K/sec ( +- 0.00% )
> > 12,616,546,027 cycles # 1.805 GHz ( +- 0.06% ) (39.92%)
> > 156,823,666 stalled-cycles-frontend # 1.25% frontend cycles idle ( +- 0.10% ) (40.25%)
> > 310,130,812 stalled-cycles-backend # 2.47% backend cycles idle ( +- 4.39% ) (40.73%)
> > 18,692,516,591 instructions # 1.49 insn per cycle
> > # 0.01 stalled cycles per insn ( +- 0.04% ) (40.75%)
> > 4,907,447,976 branches # 702.283 M/sec ( +- 0.05% ) (40.30%)
> > 13,002,578 branch-misses # 0.26% of all branches ( +- 0.08% ) (40.48%)
> > 7,069,786,296 L1-dcache-loads # 1.012 G/sec ( +- 0.03% ) (40.32%)
> > 649,385,847 L1-dcache-load-misses # 9.13% of all L1-dcache accesses ( +- 0.07% ) (40.10%)
> > 1,485,448,688 L1-icache-loads # 212.576 M/sec ( +- 0.15% ) (39.49%)
> > 31,628,457 L1-icache-load-misses # 2.13% of all L1-icache accesses ( +- 0.40% ) (39.57%)
> > 6,667,311 dTLB-loads # 954.129 K/sec ( +- 0.21% ) (39.50%)
> > 5,668,555 dTLB-load-misses # 86.40% of all dTLB cache accesses ( +- 0.12% ) (39.03%)
> > 765 iTLB-loads # 109.476 /sec ( +- 21.81% ) (39.44%)
> > 4,370,351 iTLB-load-misses # 214320.09% of all iTLB cache accesses ( +- 1.44% ) (39.86%)
> > 149,207,254 L1-dcache-prefetches # 21.352 M/sec ( +- 0.13% ) (40.27%)
> >
> > 7.25869 +- 0.00203 seconds time elapsed ( +- 0.03% )
> >
> > After:
> > 6,576.16 msec task-clock # 0.953 CPUs utilized ( +- 0.10% )
> > 4,020 context-switches # 605.595 /sec ( +- 0.01% )
> > 0 cpu-migrations # 0.000 /sec
> > 2,052,056 page-faults # 309.133 K/sec ( +- 0.00% )
> > 11,967,619,180 cycles # 1.803 GHz ( +- 0.36% ) (38.76%)
> > 161,259,240 stalled-cycles-frontend # 1.38% frontend cycles idle ( +- 0.27% ) (36.58%)
> > 253,605,302 stalled-cycles-backend # 2.16% backend cycles idle ( +- 4.45% ) (34.78%)
> > 19,328,171,892 instructions # 1.65 insn per cycle
> > # 0.01 stalled cycles per insn ( +- 0.10% ) (31.46%)
> > 5,213,967,902 branches # 785.461 M/sec ( +- 0.18% ) (30.68%)
> > 12,385,170 branch-misses # 0.24% of all branches ( +- 0.26% ) (34.13%)
> > 7,271,687,822 L1-dcache-loads # 1.095 G/sec ( +- 0.12% ) (35.29%)
> > 649,873,045 L1-dcache-load-misses # 8.93% of all L1-dcache accesses ( +- 0.11% ) (41.41%)
> > 1,950,037,608 L1-icache-loads # 293.764 M/sec ( +- 0.33% ) (43.11%)
> > 31,365,566 L1-icache-load-misses # 1.62% of all L1-icache accesses ( +- 0.39% ) (45.89%)
> > 6,767,809 dTLB-loads # 1.020 M/sec ( +- 0.47% ) (48.42%)
> > 6,339,590 dTLB-load-misses # 95.43% of all dTLB cache accesses ( +- 0.50% ) (46.60%)
> > 736 iTLB-loads # 110.875 /sec ( +- 1.79% ) (48.60%)
> > 4,314,836 iTLB-load-misses # 518653.73% of all iTLB cache accesses ( +- 0.63% ) (42.91%)
> > 144,950,156 L1-dcache-prefetches # 21.836 M/sec ( +- 0.37% ) (41.39%)
> >
> > 6.89935 +- 0.00703 seconds time elapsed ( +- 0.10% )
>
> Do you happen to have a perf profile before and after to see which of
> the paths really benefits from this?
No, I don't have clear profile data showing which path benefits the most.
The benchmark result can be reproduced reliably, but perf record,
report, and diff don't seem very helpful, as I can't see a significant
change in any single symbol.
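To put the before/after numbers quoted above into relative terms, here is a minimal sketch of the arithmetic. The helper name pct_change is only an illustration and is not part of the patch.

```c
#include <stdio.h>

/*
 * Relative change from `before` to `after`, in percent.
 * A negative result means the value went down.
 * pct_change is a hypothetical helper name used for illustration only.
 */
static inline double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one counter with its relative change. */
static inline void print_change(const char *name, double before, double after)
{
    printf("%-14s %+.2f%%\n", name, pct_change(before, after));
}
```

With the figures from the perf stat output, pct_change(7021.43, 6576.16) is about -6.3 for task-clock, and pct_change(7.25869, 6.89935) is about -5.0 for elapsed time.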
There are quite a few callers of memcg_swap_enabled and
do_memsw_account (which also calls memcg_swap_enabled), so it seems to
me that many small optimizations add up to the overall improvement.
Lower pressure on the branch predictor may also help in general.
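For readers unfamiliar with the pattern, the following is a rough userspace sketch of how such a check could look once it is backed by a static key. It is not the patch itself. The macros are stand-ins that only imitate the interface of the kernel's <linux/jump_label.h> (DEFINE_STATIC_KEY_FALSE, static_branch_likely, static_branch_enable); the key name, the enable hook, and the simplified do_memsw_account parameter are assumptions made for illustration.

```c
#include <stdbool.h>

/*
 * Userspace stand-ins for the kernel static key interface.
 * In the kernel, the branch instruction itself is patched at runtime,
 * so the hot path does no memory load and no conditional jump on a
 * variable. Here a plain bool plus __builtin_expect only imitates the
 * API shape, not the code patching.
 */
struct static_key_false { bool enabled; };
#define DEFINE_STATIC_KEY_FALSE(name) struct static_key_false name = { false }
#define static_branch_likely(key)     __builtin_expect((key)->enabled, 1)
#define static_branch_enable(key)     ((key)->enabled = true)

/* Hypothetical key name; the real patch may use a different one. */
DEFINE_STATIC_KEY_FALSE(memcg_swap_enabled_key);

/* Hot-path check: with a real static key this compiles to a patched jump. */
static inline bool memcg_swap_enabled(void)
{
    return static_branch_likely(&memcg_swap_enabled_key);
}

/*
 * Simplified stand-in for do_memsw_account(). The on_dfl parameter is an
 * assumption replacing the kernel's check for the default hierarchy.
 */
static inline bool do_memsw_account(bool on_dfl)
{
    return !on_dfl && memcg_swap_enabled();
}

/* Hypothetical init hook: flip the key once when swap accounting is on. */
static inline void memcg_enable_swap_accounting(void)
{
    static_branch_enable(&memcg_swap_enabled_key);
}
```

Because every caller goes through the same inline helper, each call site would get its own patched branch in the kernel, which fits the observation that the gain is spread across many symbols rather than concentrated in one.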
Any other suggestions on how to collect such data?