From: Marcelo Tosatti <mtosatti@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Aaron Tomlin <atomlin@atomlin.com>,
Frederic Weisbecker <frederic@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Russell King <linux@armlinux.org.uk>,
Huacai Chen <chenhuacai@kernel.org>,
Heiko Carstens <hca@linux.ibm.com>,
x86@kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v7 00/13] fold per-CPU vmstats remotely
Date: Wed, 3 May 2023 10:51:21 -0300
Message-ID: <ZFJm2QJrwDpeAzvi@tpad>
In-Reply-To: <ZEA95uBeUECRvO5e@tpad>
On Wed, Apr 19, 2023 at 04:15:50PM -0300, Marcelo Tosatti wrote:
> On Wed, Apr 19, 2023 at 06:47:30PM +0200, Vlastimil Babka wrote:
> > On 4/19/23 13:29, Marcelo Tosatti wrote:
> > > On Wed, Apr 19, 2023 at 08:14:09AM -0300, Marcelo Tosatti wrote:
> > >> This was tried before:
> > >> https://lore.kernel.org/lkml/20220127173037.318440631@fedora.localdomain/
> > >>
> > >> My conclusion from that discussion (and work) is that a special system
> > >> call:
> > >>
> > >> 1) Does not allow the benefits to be widely applied (only modified
> > >> applications will benefit), and is not portable across different operating systems.
> > >>
> > >> Removing the vmstat_work interruption is a benefit for HPC workloads,
> > >> for example (in fact, it is a benefit for any kind of application,
> > >> since the interruption causes cache misses).
> > >>
> > >> 2) Increases the system call cost for applications which would use
> > >> the interface.
> > >>
> > >> So avoiding the vmstat_update interruption, without userspace
> > >> knowledge or modifications, is a better solution than a modified
> > >> userspace.
> > >
> > > Another important point is this: if an application dirties
> > > its own per-CPU vmstat cache, while performing a system call,
> > > and a vmstat sync event is triggered on a different CPU, you'd have to:
> > >
> > > 1) Wait for that CPU to return to userspace and sync its stats
> > > (unfeasible).
> > >
> > > 2) Queue work to execute on that CPU (undesirable, as that causes
> > > an interruption).
> >
> > So you're saying the application might do a syscall from the isolcpu, so
> > IIUC it cannot expect any latency guarantees at that very moment,
>
> Why not? cyclictest uses nanosleep, and it's the main tool for measuring
> latency.
>
> > but then
> > it immediately starts expecting them again after returning to userspace,
>
> No, the expectation more generally is this:
>
> For certain types of applications (for example PLC software or
> RAN processing), upon occurrence of an event, it is necessary to
> complete a certain task in a maximum amount of time (deadline).
>
> One way to express this requirement is with a pair of numbers,
> deadline time and execution time, where:
>
> * deadline time: length of time between event and deadline.
> * execution time: length of time it takes for processing of event
> to occur on a particular hardware platform
> (uninterrupted).
>
> The particular values depend on the use-case. For the case
> where the realtime application executes in a virtualized
> guest, an interruption which must be serviced in the host will cause
> the following sequence of events:
>
> 1) VM-exit
> 2) execution of IPI (and function call) (or switch to kwork
> thread to execute some work item).
> 3) VM-entry
>
> This causes an excess of 50us of latency as observed by cyclictest
> (which violates the latency requirement of a vRAN application with 1ms TTI,
> for example).
>
> > and
> > a single interruption for a one-time flush after the syscall would be too
> > intrusive?
>
> Generally, if you can't complete the task (which involves executing a
> number of instructions) before the deadline, then it's a problem.
>
> One-time flush? You mean to switch between:
>
> rt-app -> kworker (to execute vmstat_update flush) -> rt-app
>
> My measurement, which probably had the vmstat_update code/data in cache,
> took 7us. If the code to execute must be brought in from memory, it
> takes even longer.
>
> > (elsewhere in the thread you described an RT app initialization that may
> > generate vmstats to flush and then entry userspace loop, again, would a
> > single interruption soon after entering the loop be so critical?)
>
> 1) It depends on the application. For the use-case above, where < 50us
> interruption is desired, yes it is critical.
>
> 2) The interruptions can come from different sources.
>
> Time
> 0 rt-app executing instruction 1
> 1 rt-app executing instruction 2
> 2 scheduler switches between rt-app and kworker
> 3 kworker runs vmstat_work
> 4 scheduler switches between kworker and rt-app
> 5 rt-app executing instruction 3
> 6 IPI to handle a KVM request
> 7 (fill in your preferred IPI handler)
>
> So the argument "a single interruption might not cause your deadline
> to be exceeded" fails (because the times to handle the different
> interruptions can add up).
>
> Does that make sense?
Ping? (Just want to double-check that the reasoning above makes sense.)
Thread overview: 56+ messages
2023-03-20 18:03 Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 01/13] vmstat: allow_direct_reclaim should use zone_page_state_snapshot Marcelo Tosatti
2023-03-20 18:21 ` Michal Hocko
2023-03-20 18:32 ` Marcelo Tosatti
2023-03-22 10:03 ` Michal Hocko
2023-03-20 18:03 ` [PATCH v7 02/13] this_cpu_cmpxchg: ARM64: switch this_cpu_cmpxchg to locked, add _local function Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 03/13] this_cpu_cmpxchg: loongarch: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 04/13] this_cpu_cmpxchg: S390: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 05/13] this_cpu_cmpxchg: x86: " Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 06/13] add this_cpu_cmpxchg_local and asm-generic definitions Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 07/13] convert this_cpu_cmpxchg users to this_cpu_cmpxchg_local Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 08/13] mm/vmstat: switch counter modification to cmpxchg Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 09/13] vmstat: switch per-cpu vmstat counters to 32-bits Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 10/13] mm/vmstat: use xchg in cpu_vm_stats_fold Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 11/13] mm/vmstat: switch vmstat shepherd to flush per-CPU counters remotely Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 12/13] mm/vmstat: refresh stats remotely instead of via work item Marcelo Tosatti
2023-03-20 18:03 ` [PATCH v7 13/13] vmstat: add pcp remote node draining via cpu_vm_stats_fold Marcelo Tosatti
2023-03-20 20:43 ` Tim Chen
2023-03-22 1:20 ` Marcelo Tosatti
2023-03-20 18:25 ` [PATCH v7 00/13] fold per-CPU vmstats remotely Michal Hocko
2023-03-20 19:07 ` Marcelo Tosatti
2023-03-22 10:13 ` Michal Hocko
2023-03-22 11:23 ` Marcelo Tosatti
2023-03-22 13:35 ` Michal Hocko
2023-03-22 14:20 ` Marcelo Tosatti
2023-03-23 7:51 ` Michal Hocko
2023-03-23 10:52 ` Marcelo Tosatti
2023-03-23 10:59 ` Marcelo Tosatti
2023-03-23 12:17 ` Michal Hocko
2023-03-23 13:30 ` Marcelo Tosatti
2023-03-23 13:32 ` Marcelo Tosatti
2023-04-18 22:02 ` Andrew Morton
2023-04-19 11:14 ` Marcelo Tosatti
2023-04-19 11:15 ` Marcelo Tosatti
2023-04-19 13:44 ` Andrew Theurer
2023-04-20 7:55 ` Michal Hocko
2023-04-23 1:25 ` Marcelo Tosatti
2023-04-19 11:29 ` Marcelo Tosatti
2023-04-19 11:59 ` Marcelo Tosatti
2023-04-19 12:24 ` Frederic Weisbecker
2023-04-19 13:48 ` Marcelo Tosatti
2023-04-19 14:35 ` Michal Hocko
2023-04-19 16:35 ` Marcelo Tosatti
2023-04-20 8:40 ` Michal Hocko
2023-04-23 1:10 ` Marcelo Tosatti
2023-04-20 13:45 ` Marcelo Tosatti
2023-04-26 14:34 ` Marcelo Tosatti
2023-04-27 8:31 ` Michal Hocko
2023-04-27 14:59 ` Marcelo Tosatti
2023-04-26 15:04 ` Vlastimil Babka
2023-04-26 16:10 ` Marcelo Tosatti
2023-04-27 8:39 ` Michal Hocko
2023-04-27 16:25 ` Marcelo Tosatti
2023-04-19 16:47 ` Vlastimil Babka
2023-04-19 19:15 ` Marcelo Tosatti
2023-05-03 13:51 ` Marcelo Tosatti [this message]