linux-mm.kvack.org archive mirror
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Kirill Tkhai <tkhai@ya.ru>
Cc: Mike Rapoport <rppt@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org,
	muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com,
	sultan@kerneltoast.com, dave@stgolabs.net,
	penguin-kernel@i-love.sakura.ne.jp, paulmck@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/8] make slab shrink lockless
Date: Mon, 27 Feb 2023 11:32:34 -0800	[thread overview]
Message-ID: <Y/0FUjmqDVF1lhfn@P9FQF9L96D> (raw)
In-Reply-To: <dcf1d336-cfe1-964e-efe7-4aa40d4a3520@ya.ru>

On Mon, Feb 27, 2023 at 10:20:59PM +0300, Kirill Tkhai wrote:
> On 27.02.2023 18:08, Mike Rapoport wrote:
> > Hi,
> > 
> > On Mon, Feb 27, 2023 at 09:31:51PM +0800, Qi Zheng wrote:
> >>
> >>
> >> On 2023/2/27 03:51, Andrew Morton wrote:
> >>> On Sun, 26 Feb 2023 22:46:47 +0800 Qi Zheng <zhengqi.arch@bytedance.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> This patch series aims to make slab shrink lockless.
> >>>
> >>> What an awesome changelog.
> >>>
> >>>> 2. Survey
> >>>> =========
> >>>
> >>> Especially this part.
> >>>
> >>> Looking through all the prior efforts and at this patchset I am not
> >>> immediately seeing any statements about the overall effect upon
> >>> real-world workloads.  For a good example, does this patchset
> >>> measurably improve throughput or energy consumption on your servers?
> >>
> >> Hi Andrew,
> >>
> >> I re-tested with the following physical machines:
> >>
> >> Architecture:        x86_64
> >> CPU(s):              96
> >> On-line CPU(s) list: 0-95
> >> Model name:          Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
> >>
> >> I found that the explanation I gave in the cover letter for the
> >> hotspot is wrong. The down_read_trylock() hotspot is not caused by
> >> trylock failures, but simply by the atomic operation (cmpxchg)
> >> itself, which leads to a significant reduction in IPC (insn per
> >> cycle).
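
For context, the read-trylock fast path boils down to a cmpxchg on the
semaphore count. A minimal sketch, simplified from mainline
kernel/locking/rwsem.c (debug checks and owner tracking omitted):

	static inline int __down_read_trylock(struct rw_semaphore *sem)
	{
		long tmp = atomic_long_read(&sem->count);

		/*
		 * Even when the lock is free and the cmpxchg succeeds,
		 * this is a full atomic read-modify-write: with many
		 * CPUs calling it concurrently, the sem->count
		 * cacheline bounces between cores and IPC collapses.
		 */
		while (!(tmp & RWSEM_READ_FAILED_MASK)) {
			if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
						tmp + RWSEM_READER_BIAS))
				return 1;	/* read lock acquired */
		}
		return 0;
	}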
> > 
> > ... 
> >  
> >> Then we can use the following perf command to view hotspots:
> >>
> >> perf top -U -F 999
> >>
> >> 1) Before applying this patchset:
> >>
> >>   32.31%  [kernel]           [k] down_read_trylock
> >>   19.40%  [kernel]           [k] pv_native_safe_halt
> >>   16.24%  [kernel]           [k] up_read
> >>   15.70%  [kernel]           [k] shrink_slab
> >>    4.69%  [kernel]           [k] _find_next_bit
> >>    2.62%  [kernel]           [k] shrink_node
> >>    1.78%  [kernel]           [k] shrink_lruvec
> >>    0.76%  [kernel]           [k] do_shrink_slab
> >>
> >> 2) After applying this patchset:
> >>
> >>   27.83%  [kernel]           [k] _find_next_bit
> >>   16.97%  [kernel]           [k] shrink_slab
> >>   15.82%  [kernel]           [k] pv_native_safe_halt
> >>    9.58%  [kernel]           [k] shrink_node
> >>    8.31%  [kernel]           [k] shrink_lruvec
> >>    5.64%  [kernel]           [k] do_shrink_slab
> >>    3.88%  [kernel]           [k] mem_cgroup_iter
> >>
> >> 2. At the same time, we use the following perf command to capture IPC
> >> information:
> >>
> >> perf stat -e cycles,instructions -G test -a --repeat 5 -- sleep 10
> >>
> >> 1) Before applying this patchset:
> >>
> >>  Performance counter stats for 'system wide' (5 runs):
> >>
> >>       454187219766      cycles                    test                    ( +-  1.84% )
> >>        78896433101      instructions              test #    0.17  insn per cycle           ( +-  0.44% )
> >>
> >>         10.0020430 +- 0.0000366 seconds time elapsed  ( +-  0.00% )
> >>
> >> 2) After applying this patchset:
> >>
> >>  Performance counter stats for 'system wide' (5 runs):
> >>
> >>       841954709443      cycles                    test                    ( +- 15.80% )  (98.69%)
> >>       527258677936      instructions              test #    0.63  insn per cycle           ( +- 15.11% )  (98.68%)
> >>
> >>           10.01064 +- 0.00831 seconds time elapsed  ( +-  0.08% )
> >>
> >> We can see that the IPC drops dramatically when down_read_trylock()
> >> is called at high frequency. After switching to SRCU, the IPC is
> >> back at a normal level.
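
The shape of the fix: the series replaces the rwsem read-trylock in the
shrink path with an SRCU read-side critical section, which is just a
per-CPU counter increment with no shared atomic. A rough sketch of the
converted global loop (simplified from patch 2/8; the memcg path and
nr_deferred handling are omitted):

	DEFINE_STATIC_SRCU(shrinker_srcu);

	unsigned long shrink_slab(gfp_t gfp_mask, int nid,
				  struct mem_cgroup *memcg, int priority)
	{
		unsigned long freed = 0;
		struct shrinker *shrinker;
		int srcu_idx;

		/* No cmpxchg here: srcu_read_lock() only bumps a per-CPU
		 * counter, so concurrent reclaimers never write a shared
		 * cacheline on this path. */
		srcu_idx = srcu_read_lock(&shrinker_srcu);

		list_for_each_entry_srcu(shrinker, &shrinker_list, list,
					 srcu_read_lock_held(&shrinker_srcu)) {
			struct shrink_control sc = {
				.gfp_mask = gfp_mask,
				.nid = nid,
				.memcg = memcg,
			};

			freed += do_shrink_slab(&sc, shrinker, priority);
		}

		srcu_read_unlock(&shrinker_srcu, srcu_idx);
		return freed;
	}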
> > 
> > The results you present do show an improvement in IPC for an
> > artificial test script. But it would be more interesting to see how
> > real-world workloads benefit from your changes.
> 
> One of the real workloads from my experience is the start of an
> overcommitted node running many starting containers after a node crash
> (or many resuming containers after a reboot for a kernel update). In
> these cases memory pressure is huge, and the node spends a long time
> cycling through reclaim.
> 
> This patchset makes prealloc_memcg_shrinker() independent of
> do_shrink_slab(), so prealloc_memcg_shrinker() no longer has to wait
> until shrink_slab_memcg() completes its current bit iteration, sees
> rwsem_is_contended(), and breaks out of the iteration; see the sketch
> below.
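
For reference, the loop in question: the pre-series shrink_slab_memcg()
checks for rwsem contention after every shrinker it runs, so a pending
writer aborts the whole per-memcg bitmap walk. A simplified sketch of
the mainline code (the SHRINK_EMPTY bit clearing and nr_deferred
bookkeeping are omitted):

	static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
					       struct mem_cgroup *memcg,
					       int priority)
	{
		unsigned long ret, freed = 0;
		struct shrinker_info *info;
		int i;

		if (!down_read_trylock(&shrinker_rwsem))
			return 0;	/* a writer holds it: skip shrinking */

		info = shrinker_info_protected(memcg, nid);
		if (unlikely(!info))
			goto unlock;

		for_each_set_bit(i, info->map, shrinker_nr_max) {
			struct shrink_control sc = {
				.gfp_mask = gfp_mask,
				.nid = nid,
				.memcg = memcg,
			};
			struct shrinker *shrinker;

			shrinker = idr_find(&shrinker_idr, i);
			if (unlikely(!shrinker))
				continue;

			ret = do_shrink_slab(&sc, shrinker, priority);
			if (ret != SHRINK_EMPTY)
				freed += ret;

			/* Bail out of the whole walk as soon as a writer
			 * (e.g. prealloc_memcg_shrinker()) is waiting on
			 * shrinker_rwsem. */
			if (rwsem_is_contended(&shrinker_rwsem)) {
				freed = freed ? : 1;
				break;
			}
		}
	unlock:
		up_read(&shrinker_rwsem);
		return freed;
	}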
> 
> Also, it's important to mention that we currently have the following
> strange behavior:
> 
> prealloc_memcg_shrinker()
>   down_write(&shrinker_rwsem)
>   idr_alloc()
>     reclaim
>       for each child memcg
>         shrink_slab_memcg()
>           down_read_trylock(&shrinker_rwsem) -> fail
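
A simplified sketch of where that recursion comes from (mainline
prealloc_memcg_shrinker() before this series; error handling and the
shrinker_info expansion are trimmed):

	static int prealloc_memcg_shrinker(struct shrinker *shrinker)
	{
		int id, ret = -ENOMEM;

		if (mem_cgroup_disabled())
			return -ENOSYS;

		down_write(&shrinker_rwsem);
		/* The GFP_KERNEL allocation inside idr_alloc() may enter
		 * direct reclaim while shrinker_rwsem is still held for
		 * write, so every shrink_slab_memcg() in that reclaim
		 * fails its down_read_trylock() and slab shrinking is
		 * silently skipped. */
		id = idr_alloc(&shrinker_idr, SHRINKER_REGISTERING,
			       0, 0, GFP_KERNEL);
		if (id < 0)
			goto unlock;

		shrinker->id = id;
		ret = 0;
	unlock:
		up_write(&shrinker_rwsem);
		return ret;
	}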

But this can happen only if we get -ENOMEM in idr_alloc()?
Doesn't seem to be a very hot path.

Thanks!



Thread overview: 23+ messages
2023-02-26 14:46 [PATCH v3 0/8] make slab shrink lockless Qi Zheng
2023-02-26 14:46 ` [PATCH v3 1/8] mm: vmscan: add a map_nr_max field to shrinker_info Qi Zheng
2023-02-26 14:54   ` Qi Zheng
2023-02-26 14:46 ` [PATCH v3 2/8] mm: vmscan: make global slab shrink lockless Qi Zheng
2023-02-26 14:46 ` [PATCH v3 3/8] mm: vmscan: make memcg " Qi Zheng
2023-02-26 14:46 ` [PATCH v3 4/8] mm: vmscan: add shrinker_srcu_generation Qi Zheng
2023-02-26 14:46 ` [PATCH v3 5/8] mm: shrinkers: make count and scan in shrinker debugfs lockless Qi Zheng
2023-02-26 14:46 ` [PATCH v3 6/8] mm: vmscan: hold write lock to reparent shrinker nr_deferred Qi Zheng
2023-02-26 14:46 ` [PATCH v3 7/8] mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers() Qi Zheng
2023-02-26 14:46 ` [PATCH v3 8/8] mm: shrinkers: convert shrinker_rwsem to mutex Qi Zheng
2023-02-26 19:51 ` [PATCH v3 0/8] make slab shrink lockless Andrew Morton
2023-02-27 13:31   ` Qi Zheng
2023-02-27 15:08     ` Mike Rapoport
2023-02-27 19:20       ` Kirill Tkhai
2023-02-27 19:32         ` Roman Gushchin [this message]
2023-02-27 19:47           ` Kirill Tkhai
2023-02-28 10:08         ` Qi Zheng
2023-02-28 10:04       ` Qi Zheng
2023-02-28 10:53         ` Qi Zheng
2023-02-28 18:40       ` Michal Hocko
2023-03-01  2:27         ` Qi Zheng
2023-02-27 19:02     ` Roman Gushchin
2023-02-28 10:11       ` Qi Zheng
