From: Daniel Dao <dqminh@cloudflare.com>
To: shakeelb@google.com
Cc: kernel-team <kernel-team@cloudflare.com>,
linux-mm@kvack.org, hannes@cmpxchg.org, guro@fb.com,
feng.tang@intel.com, mhocko@kernel.org, hdanton@sina.com,
mkoutny@suse.com, akpm@linux-foundation.org,
torvalds@linux-foundation.org
Subject: Regression in workingset_refault latency on 5.15
Date: Wed, 23 Feb 2022 13:51:18 +0000
Message-ID: <CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com>
Hi all,
We are observing some regressions in workingset_refault on our newly upgraded
5.15.19 nodes with zram as swap. This manifests in several ways:
1) Regression of workingset_refault duration observed in flamegraphs
We regularly collect flamegraphs for services running on the node. Since the
upgrade to 5.15.19, workingset_refault occupies a more significant part of the
service flamegraph (13%), with the following call trace:
workingset_refault+0x128
add_to_page_cache_lru+0x9f
page_cache_ra_unbounded+0x154
force_page_cache_ra+0xe2
filemap_get_pages+0xe9
filemap_read+0xa4
xfs_file_buffered_read+0x98
xfs_file_read_iter+0x6a
new_sync_read+0x118
vfs_read+0xf2
__x64_sys_pread64+0x89
do_syscall_64+0x3b
entry_SYSCALL_64_after_hwframe+0x44
2) Regression of performance-sensitive userspace code
We have some performance-sensitive code running in userspace whose runtime is
measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like this:
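struct timespec start, end;
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
func();
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
elapsed = (end.tv_sec - start.tv_sec) * 1000000000LL + (end.tv_nsec - start.tv_nsec);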
Since the 5.15 upgrade, we have observed long `elapsed` values in the 4-10ms
range much more frequently than before. This went away after we disabled swap
for the service using the `memory.swap.max=0` memcg configuration.
The new thing in 5.15 workingset_refault seems to be the introduction of
mem_cgroup_flush_stats() by commit 1f828223b7991a228bc2aef837b78737946d44b2
("memcg: flush lruvec stats in the refault").
mem_cgroup_flush_stats() can take quite a long time for us on the standard
systemd cgroupv2 hierarchy (root / system.slice / workload.service):
sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
Tracing function calls slower than 10 ms... Ctrl+C to quit.
TIME       COMM        PID     LAT(ms)  RVAL  FUNC
0.000000   <redacted>  804776  11.50    200   mem_cgroup_flush_stats
0.343383   <redacted>  647496  10.58    200   mem_cgroup_flush_stats
0.604309   <redacted>  804776  10.50    200   mem_cgroup_flush_stats
1.230416   <redacted>  803293  10.01    200   mem_cgroup_flush_stats
1.248442   <redacted>  646400  11.02    200   mem_cgroup_flush_stats
Given that, could it be that in some unfortunate cases workingset_refault now
takes much longer than before, enough to increase both the time observed via
CLOCK_THREAD_CPUTIME_ID from userspace and the overall duration of
workingset_refault observed by perf?