From: Michal Hocko <mhocko@suse.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christoph Lameter <cl@linux.com>,
Aaron Tomlin <atomlin@atomlin.com>,
Frederic Weisbecker <frederic@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Russell King <linux@armlinux.org.uk>,
Huacai Chen <chenhuacai@kernel.org>,
Heiko Carstens <hca@linux.ibm.com>,
x86@kernel.org, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v7 00/13] fold per-CPU vmstats remotely
Date: Wed, 22 Mar 2023 11:13:02 +0100 [thread overview]
Message-ID: <ZBrUruIsOjdaqiFv@dhcp22.suse.cz> (raw)
In-Reply-To: <ZBiu8csaxB/zrOAS@tpad>
On Mon 20-03-23 16:07:29, Marcelo Tosatti wrote:
> On Mon, Mar 20, 2023 at 07:25:55PM +0100, Michal Hocko wrote:
> > On Mon 20-03-23 15:03:32, Marcelo Tosatti wrote:
> > > This patch series addresses the following two problems:
> > >
> > > 1. A customer provided evidence indicating that a process
> > > was stalled in direct reclaim:
> > >
> > This is addressed by the trivial patch 1.
> >
> > [...]
> > > 2. With a task that busy loops on a given CPU,
> > > the kworker interruption to execute vmstat_update
> > > is undesired and may exceed latency thresholds
> > > for certain applications.
> >
> > Yes it can but why does that matter?
>
> It matters for the application that is executing and expects
> not to be interrupted.
Those workloads shouldn't enter the kernel in the first place, no?
Otherwise the in kernel execution with all the direct or indirect
dependencies (e.g. via locks) can throw any latency expectations off the
window.
> > > By having vmstat_shepherd flush the per-CPU counters to the
> > > global counters from remote CPUs.
> > >
> > > This is done using cmpxchg to manipulate the counters,
> > > both CPU locally (via the account functions),
> > > and remotely (via cpu_vm_stats_fold).
> > >
> > > Thanks to Aaron Tomlin for diagnosing issue 1 and writing
> > > the initial patch series.
> > >
> > >
> > > Performance details for the kworker interruption:
> > >
> > > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > >
> > > The example above shows an additional 7us for the
> > >
> > > oslat -> kworker -> oslat
> > >
> > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > observed in the guest is higher than 50us, violating the acceptable
> > > latency threshold for certain applications.
> >
> > I do not think we have ever promissed any specific latency guarantees
> > for vmstat. These are statistics have been mostly used for debugging
> > purposes AFAIK. I am not aware of any specific user space use case that
> > would be latency sensitive. Your changelog doesn't go into details there
> > either.
>
> There is a class of workloads for which response time can be
> of interest. MAC scheduler is an example:
>
> https://par.nsf.gov/servlets/purl/10090368
Yes, I am not disputing low latency workloads in general. I am just
saying that you haven't really established a very sound justification
here. Of course there are workloads which do not want to conflict with
any in kernel house keeping. Those have to be configured and implemented
very carefully though. Vmstat as such should not collide with those
workloads as long as they do not interact with the kernel in a way
counters are updated. Is this hard or impossible to avoid?
I can imagine that those workloads have an start up sequence where the
kernel is involved and counters updated so that deferred flushing could
interfere with the later and latency sensitive phase. Is that a real
problem in practice? Please tell us much more why we need to make the
vmstat code more complex.
Thanks!
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2023-03-22 10:13 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-20 18:03 Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 01/13] vmstat: allow_direct_reclaim should use zone_page_state_snapshot Marcelo Tosatti
2023-03-20 18:21 ` Michal Hocko
2023-03-20 18:32 ` Marcelo Tosatti
2023-03-22 10:03 ` Michal Hocko
2023-03-20 18:03 ` [PATCH v7 02/13] this_cpu_cmpxchg: ARM64: switch this_cpu_cmpxchg to locked, add _local function Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 03/13] this_cpu_cmpxchg: loongarch: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 04/13] this_cpu_cmpxchg: S390: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 05/13] this_cpu_cmpxchg: x86: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 06/13] add this_cpu_cmpxchg_local and asm-generic definitions Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 07/13] convert this_cpu_cmpxchg users to this_cpu_cmpxchg_local Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 08/13] mm/vmstat: switch counter modification to cmpxchg Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 09/13] vmstat: switch per-cpu vmstat counters to 32-bits Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 10/13] mm/vmstat: use xchg in cpu_vm_stats_fold Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 11/13] mm/vmstat: switch vmstat shepherd to flush per-CPU counters remotely Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 12/13] mm/vmstat: refresh stats remotely instead of via work item Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 13/13] vmstat: add pcp remote node draining via cpu_vm_stats_fold Marcelo Tosatti
2023-03-20 20:43 ` Tim Chen
2023-03-22 1:20 ` Marcelo Tosatti
2023-03-20 18:25 ` [PATCH v7 00/13] fold per-CPU vmstats remotely Michal Hocko
2023-03-20 19:07 ` Marcelo Tosatti
2023-03-22 10:13 ` Michal Hocko [this message]
2023-03-22 11:23 ` Marcelo Tosatti
2023-03-22 13:35 ` Michal Hocko
2023-03-22 14:20 ` Marcelo Tosatti
2023-03-23 7:51 ` Michal Hocko
2023-03-23 10:52 ` Marcelo Tosatti
2023-03-23 10:59 ` Marcelo Tosatti
2023-03-23 12:17 ` Michal Hocko
2023-03-23 13:30 ` Marcelo Tosatti
2023-03-23 13:32 ` Marcelo Tosatti
2023-04-18 22:02 ` Andrew Morton
2023-04-19 11:14 ` Marcelo Tosatti
2023-04-19 11:15 ` Marcelo Tosatti
2023-04-19 13:44 ` Andrew Theurer
2023-04-20 7:55 ` Michal Hocko
2023-04-23 1:25 ` Marcelo Tosatti
2023-04-19 11:29 ` Marcelo Tosatti
2023-04-19 11:59 ` Marcelo Tosatti
2023-04-19 12:24 ` Frederic Weisbecker
2023-04-19 13:48 ` Marcelo Tosatti
2023-04-19 14:35 ` Michal Hocko
2023-04-19 16:35 ` Marcelo Tosatti
2023-04-20 8:40 ` Michal Hocko
2023-04-23 1:10 ` Marcelo Tosatti
2023-04-20 13:45 ` Marcelo Tosatti
2023-04-26 14:34 ` Marcelo Tosatti
2023-04-27 8:31 ` Michal Hocko
2023-04-27 14:59 ` Marcelo Tosatti
2023-04-26 15:04 ` Vlastimil Babka
2023-04-26 16:10 ` Marcelo Tosatti
2023-04-27 8:39 ` Michal Hocko
2023-04-27 16:25 ` Marcelo Tosatti
2023-04-19 16:47 ` Vlastimil Babka
2023-04-19 19:15 ` Marcelo Tosatti
2023-05-03 13:51 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZBrUruIsOjdaqiFv@dhcp22.suse.cz \
--to=mhocko@suse.com \
--cc=akpm@linux-foundation.org \
--cc=atomlin@atomlin.com \
--cc=chenhuacai@kernel.org \
--cc=cl@linux.com \
--cc=frederic@kernel.org \
--cc=hca@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@armlinux.org.uk \
--cc=mtosatti@redhat.com \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox