linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeelb@google.com>
To: Daniel Dao <dqminh@cloudflare.com>, Ivan Babrou <ivan@cloudflare.com>
Cc: kernel-team <kernel-team@cloudflare.com>,
	"Linux MM" <linux-mm@kvack.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Roman Gushchin" <guro@fb.com>, "Feng Tang" <feng.tang@intel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Hillf Danton" <hdanton@sina.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>
Subject: Re: Regression in workingset_refault latency on 5.15
Date: Wed, 23 Feb 2022 08:00:44 -0800	[thread overview]
Message-ID: <CALvZod76u0qXBzrCZ16j9xTVkPPvqyw9+4s2K67bVwB1fQHnPw@mail.gmail.com> (raw)
In-Reply-To: <CALvZod4+-q7ciz3A_-J0UWjQNe1P==WMyNNKFu4dqiy8oGisTQ@mail.gmail.com>

On Wed, Feb 23, 2022 at 7:57 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> +Ivan Babrou
>
> Hi Daniel,
>
> On Wed, Feb 23, 2022 at 5:51 AM Daniel Dao <dqminh@cloudflare.com> wrote:
> >
> > Hi all,
> >
> > We are observing some regressions in workingset_refault on our newly upgraded
> > 5.15.19 nodes with zram as swap. This manifests in several ways:
> >
> > 1) Regression of workingset_refault duration observed in flamegraph
> >
> > We regularly collect flamegraphs for running services on the node. Since the
> > upgrade to 5.15.19, we see that workingset_refault occupies a more significant
> > part of the service flamegraph (13%) with the following call trace:
> >
> >   workingset_refault+0x128
> >   add_to_page_cache_lru+0x9f
> >   page_cache_ra_unbounded+0x154
> >   force_page_cache_ra+0xe2
> >   filemap_get_pages+0xe9
> >   filemap_read+0xa4
> >   xfs_file_buffered_read+0x98
> >   xfs_file_read_iter+0x6a
> >   new_sync_read+0x118
> >   vfs_read+0xf2
> >   __x64_sys_pread64+0x89
> >   do_syscall_64+0x3b
> >   entry_SYSCALL_64_after_hwframe+0x44
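> >
> > For context, such flamegraphs can be produced with perf plus Brendan Gregg's
> > FlameGraph scripts, roughly as follows (sampling frequency, pid and output
> > name are illustrative, not our exact tooling):
> >
> >   perf record -F 99 -g -p <pid> -- sleep 60
> >   perf script | stackcollapse-perf.pl | flamegraph.pl > service.svg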
> >
> > 2) Regression of userspace performance sensitive code
> >
> > We have some performance-sensitive code running in userspace whose
> > runtime is measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like:
> >
> >   now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
> >   func()
> >   elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now
> >
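> > Spelled out as real calls, the measurement is roughly the following (a
> > simplified, self-contained sketch, not our production code):
> >
> >   #include <stdint.h>
> >   #include <stdio.h>
> >   #include <time.h>
> >
> >   /* Thread CPU time in nanoseconds from CLOCK_THREAD_CPUTIME_ID. */
> >   static uint64_t thread_cpu_ns(void)
> >   {
> >           struct timespec ts;
> >
> >           clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
> >           return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
> >   }
> >
> >   static void func(void)
> >   {
> >           /* stand-in for the real work (reads files, touches memory, ...) */
> >   }
> >
> >   int main(void)
> >   {
> >           uint64_t now = thread_cpu_ns();
> >           func();
> >           uint64_t elapsed = thread_cpu_ns() - now;
> >
> >           printf("elapsed: %llu ns\n", (unsigned long long)elapsed);
> >           return 0;
> >   }
> >
> > Since this clock only counts CPU time charged to the thread, long `elapsed`
> > values must come from time spent on-CPU (e.g. in the kernel on the read/fault
> > path), not from time blocked on I/O.
> >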
> > Since the 5.15 upgrade, we have observed long `elapsed` values in the range of
> > 4-10ms much more frequently than before. This went away after we disabled swap
> > for the service with the `memory.swap.max=0` memcg configuration.
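> >
> > For reference, that amounts to writing to the service's cgroupv2 interface
> > file, e.g. for the hierarchy mentioned below (exact path illustrative):
> >
> >   echo 0 > /sys/fs/cgroup/system.slice/workload.service/memory.swap.max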
> >
> > The new thing in 5.15's workingset_refault() seems to be the introduction of
> > mem_cgroup_flush_stats() by commit 1f828223b7991a228bc2aef837b78737946d44b2
> > ("memcg: flush lruvec stats in the refault").
> >
> > Given that mem_cgroup_flush_stats() can take quite a long time for us on the
> > standard systemd cgroupv2 hierarchy (root / system.slice / workload.service):
> >
> >   sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
> >   Tracing function calls slower than 10 ms... Ctrl+C to quit.
> >   TIME       COMM           PID    LAT(ms)             RVAL FUNC
> >   0.000000   <redacted>     804776   11.50              200 mem_cgroup_flush_stats
> >   0.343383   <redacted>     647496   10.58              200 mem_cgroup_flush_stats
> >   0.604309   <redacted>     804776   10.50              200 mem_cgroup_flush_stats
> >   1.230416   <redacted>     803293   10.01              200 mem_cgroup_flush_stats
> >   1.248442   <redacted>     646400   11.02              200 mem_cgroup_flush_stats
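> >
> > funcslower only reports the calls above the 10ms threshold; bcc's funclatency,
> > if it is available alongside funcslower, would show the full distribution as a
> > microsecond histogram:
> >
> >   sudo /usr/share/bcc/tools/funclatency -u mem_cgroup_flush_stats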
> >
> > could it be possible that workingset_refault() in some unfortunate cases takes
> > much longer than before, enough to increase the time observed via
> > CLOCK_THREAD_CPUTIME_ID from userspace, and the overall duration of
> > workingset_refault() observed by perf?
>
> Ivan reported this issue last month and since then we have backported
> three patches to 5.15-stable to fix the issue. These got merged into
> 5.15.18. However, you are reporting that the issue is still present in 5.15.19.
>
> Can you share a bit more detail on your hardware configuration (number of
> CPUs) and, if possible, the flamegraph?
>

Also, if you can reproduce the issue, can you try the patch at
https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@google.com/ ?
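
If it helps, the patch can be fetched and applied with b4, using the
message-id from the lore URL above (shown only as one convenient option):

  b4 am 20210929235936.2859271-1-shakeelb@google.com
  git am ./*.mbx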


Thread overview: 36+ messages
2022-02-23 13:51 Daniel Dao
2022-02-23 15:57 ` Shakeel Butt
2022-02-23 16:00   ` Shakeel Butt [this message]
2022-02-23 17:07     ` Daniel Dao
2022-02-23 17:36       ` Shakeel Butt
2022-02-23 19:28         ` Ivan Babrou
2022-02-23 20:28           ` Shakeel Butt
2022-02-23 21:16             ` Ivan Babrou
2022-02-24 14:46               ` Daniel Dao
2022-02-24 16:58                 ` Shakeel Butt
2022-02-24 17:34                   ` Daniel Dao
2022-02-24 18:00                     ` Shakeel Butt
2022-02-24 18:52                       ` Shakeel Butt
2022-02-25 10:23                         ` Daniel Dao
2022-02-25 17:08                           ` Ivan Babrou
2022-02-25 17:22                             ` Shakeel Butt
2022-02-25 18:03                             ` Michal Koutný
2022-02-25 18:08                               ` Ivan Babrou
2022-02-28 23:09                                 ` Shakeel Butt
2022-02-28 23:34                                   ` Ivan Babrou
2022-02-28 23:43                                     ` Shakeel Butt
2022-03-02  0:48                                     ` Ivan Babrou
2022-03-02  2:50                                       ` Shakeel Butt
2022-03-02  3:40                                         ` Ivan Babrou
2022-03-02 22:33                                           ` Ivan Babrou
2022-03-03  2:32                                             ` Shakeel Butt
2022-03-03  2:35                                             ` Shakeel Butt
2022-03-04  0:21                                               ` Ivan Babrou
2022-03-04  1:05                                                 ` Shakeel Butt
2022-03-04  1:12                                                   ` Ivan Babrou
2022-03-02 11:49                                         ` Frank Hofmann
2022-03-02 15:52                                           ` Shakeel Butt
2022-03-02 10:08                                       ` Michal Koutný
2022-03-02 15:53                                         ` Shakeel Butt
2022-03-02 17:28                                           ` Ivan Babrou
2022-02-24  9:22 ` Thorsten Leemhuis
