Re: [PATCH 2/2] mm: memcontrol: make cgroup_memory_noswap a static key

From: Kairui Song <ryncsn@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm: memcontrol: make cgroup_memory_noswap a static key
Date: Tue, 30 Aug 2022 16:50:38 +0800
Message-ID: <CAMgjq7CM_SX3jLj9yp5hzAr6c3hBtS5nd4Nh4z8bTY8yWx-3KQ@mail.gmail.com>
In-Reply-To: <Yw21uOyEz9lLkI3p@dhcp22.suse.cz>

On Tue, Aug 30, 2022 at 15:01, Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 30-08-22 13:59:49, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > cgroup_memory_noswap is used in many hot paths, so make it a static key
> > to lower the kernel overhead.
> >
> > Using 8G of ZRAM as swap, benchmarked with `perf stat -d -d -d --repeat 100`
> > running the following code snippet in a non-root cgroup:
> >
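> >    #include <stdio.h>
> >    #include <string.h>
> >    #include <linux/mman.h>
> >    #include <sys/mman.h>
> >    #define MB (1024UL * 1024UL)
> >    int main(int argc, char **argv){
> >       /* Map 8000 MB of anonymous memory. */
> >       void *p = mmap(NULL, 8000 * MB, PROT_READ | PROT_WRITE,
> >                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >       if (p == MAP_FAILED) {
> >          perror("mmap");
> >          return 1;
> >       }
> >       memset(p, 0xff, 8000 * MB);           /* fault every page in */
> >       madvise(p, 8000 * MB, MADV_PAGEOUT);  /* push the pages out to swap */
> >       memset(p, 0xff, 8000 * MB);           /* fault them back in from swap */
> >       return 0;
> >    }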
> >
> > Before:
> >           7,021.43 msec task-clock                #    0.967 CPUs utilized            ( +-  0.03% )
> >              4,010      context-switches          #  573.853 /sec                     ( +-  0.01% )
> >                  0      cpu-migrations            #    0.000 /sec
> >          2,052,057      page-faults               #  293.661 K/sec                    ( +-  0.00% )
> >     12,616,546,027      cycles                    #    1.805 GHz                      ( +-  0.06% )  (39.92%)
> >        156,823,666      stalled-cycles-frontend   #    1.25% frontend cycles idle     ( +-  0.10% )  (40.25%)
> >        310,130,812      stalled-cycles-backend    #    2.47% backend cycles idle      ( +-  4.39% )  (40.73%)
> >     18,692,516,591      instructions              #    1.49  insn per cycle
> >                                                   #    0.01  stalled cycles per insn  ( +-  0.04% )  (40.75%)
> >      4,907,447,976      branches                  #  702.283 M/sec                    ( +-  0.05% )  (40.30%)
> >         13,002,578      branch-misses             #    0.26% of all branches          ( +-  0.08% )  (40.48%)
> >      7,069,786,296      L1-dcache-loads           #    1.012 G/sec                    ( +-  0.03% )  (40.32%)
> >        649,385,847      L1-dcache-load-misses     #    9.13% of all L1-dcache accesses  ( +-  0.07% )  (40.10%)
> >      1,485,448,688      L1-icache-loads           #  212.576 M/sec                    ( +-  0.15% )  (39.49%)
> >         31,628,457      L1-icache-load-misses     #    2.13% of all L1-icache accesses  ( +-  0.40% )  (39.57%)
> >          6,667,311      dTLB-loads                #  954.129 K/sec                    ( +-  0.21% )  (39.50%)
> >          5,668,555      dTLB-load-misses          #   86.40% of all dTLB cache accesses  ( +-  0.12% )  (39.03%)
> >                765      iTLB-loads                #  109.476 /sec                     ( +- 21.81% )  (39.44%)
> >          4,370,351      iTLB-load-misses          # 214320.09% of all iTLB cache accesses  ( +-  1.44% )  (39.86%)
> >        149,207,254      L1-dcache-prefetches      #   21.352 M/sec                    ( +-  0.13% )  (40.27%)
> >
> >            7.25869 +- 0.00203 seconds time elapsed  ( +-  0.03% )
> >
> > After:
> >           6,576.16 msec task-clock                #    0.953 CPUs utilized            ( +-  0.10% )
> >              4,020      context-switches          #  605.595 /sec                     ( +-  0.01% )
> >                  0      cpu-migrations            #    0.000 /sec
> >          2,052,056      page-faults               #  309.133 K/sec                    ( +-  0.00% )
> >     11,967,619,180      cycles                    #    1.803 GHz                      ( +-  0.36% )  (38.76%)
> >        161,259,240      stalled-cycles-frontend   #    1.38% frontend cycles idle     ( +-  0.27% )  (36.58%)
> >        253,605,302      stalled-cycles-backend    #    2.16% backend cycles idle      ( +-  4.45% )  (34.78%)
> >     19,328,171,892      instructions              #    1.65  insn per cycle
> >                                                   #    0.01  stalled cycles per insn  ( +-  0.10% )  (31.46%)
> >      5,213,967,902      branches                  #  785.461 M/sec                    ( +-  0.18% )  (30.68%)
> >         12,385,170      branch-misses             #    0.24% of all branches          ( +-  0.26% )  (34.13%)
> >      7,271,687,822      L1-dcache-loads           #    1.095 G/sec                    ( +-  0.12% )  (35.29%)
> >        649,873,045      L1-dcache-load-misses     #    8.93% of all L1-dcache accesses  ( +-  0.11% )  (41.41%)
> >      1,950,037,608      L1-icache-loads           #  293.764 M/sec                    ( +-  0.33% )  (43.11%)
> >         31,365,566      L1-icache-load-misses     #    1.62% of all L1-icache accesses  ( +-  0.39% )  (45.89%)
> >          6,767,809      dTLB-loads                #    1.020 M/sec                    ( +-  0.47% )  (48.42%)
> >          6,339,590      dTLB-load-misses          #   95.43% of all dTLB cache accesses  ( +-  0.50% )  (46.60%)
> >                736      iTLB-loads                #  110.875 /sec                     ( +-  1.79% )  (48.60%)
> >          4,314,836      iTLB-load-misses          # 518653.73% of all iTLB cache accesses  ( +-  0.63% )  (42.91%)
> >        144,950,156      L1-dcache-prefetches      #   21.836 M/sec                    ( +-  0.37% )  (41.39%)
> >
> >            6.89935 +- 0.00703 seconds time elapsed  ( +-  0.10% )
>
> Do you happen to have a perf profile before and after to see which of
> the paths really benefits from this?

No, I don't have clear profile data showing which path benefits the most.
The benchmark result can be reproduced reliably, but perf
record & report & diff don't seem very helpful, as I can't see a
significant change in any single symbol.
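
For reference, the size of the overall change can be worked out from
the perf stat figures quoted above. The helper below is only a sketch:
the name relative_change_pct is made up for illustration and is not
part of the patch. Fed the task-clock values (7,021.43 vs 6,576.16 msec)
it gives roughly a 6% drop, and for the elapsed time (7.25869 vs
6.89935 seconds) roughly a 5% drop.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: relative change of a
 * counter between two runs, in percent. Negative means "after" is
 * lower than "before".
 */
static double relative_change_pct(double before, double after)
{
	/* A zero baseline would make the ratio meaningless. */
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

The same helper can be applied to any other counter pair from the two
runs, e.g. cycles or branch-misses.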

There are quite a few callers of memcg_swap_enabled and
do_memsw_account (which also calls memcg_swap_enabled), so it seems
to me that several small optimizations add up to the overall
improvement. Lower pressure on the branch predictor may also help in
general.

Any other suggestions on how to collect such data?
