From: Peter Zijlstra <peterz@infradead.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, kvm@vger.kernel.org,
linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org,
"Nicolas Saenz Julienne" <nsaenzju@redhat.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Wanpeng Li" <wanpengli@tencent.com>,
"Vitaly Kuznetsov" <vkuznets@redhat.com>,
"Andy Lutomirski" <luto@kernel.org>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Uladzislau Rezki" <urezki@gmail.com>,
"Christoph Hellwig" <hch@infradead.org>,
"Lorenzo Stoakes" <lstoakes@gmail.com>,
"Josh Poimboeuf" <jpoimboe@kernel.org>,
"Kees Cook" <keescook@chromium.org>,
"Sami Tolvanen" <samitolvanen@google.com>,
"Ard Biesheuvel" <ardb@kernel.org>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Juerg Haefliger" <juerg.haefliger@canonical.com>,
"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Nadav Amit" <namit@vmware.com>,
"Dan Carpenter" <error27@gmail.com>,
"Chuang Wang" <nashuiliang@gmail.com>,
"Yang Jihong" <yangjihong1@huawei.com>,
"Petr Mladek" <pmladek@suse.com>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
"Song Liu" <song@kernel.org>,
"Julian Pidancet" <julian.pidancet@oracle.com>,
"Tom Lendacky" <thomas.lendacky@amd.com>,
"Dionna Glaze" <dionnaglaze@google.com>,
"Thomas Weißschuh" <linux@weissschuh.net>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Daniel Bristot de Oliveira" <bristot@redhat.com>,
"Marcelo Tosatti" <mtosatti@redhat.com>,
"Yair Podemsky" <ypodemsk@redhat.com>
Subject: Re: [RFC PATCH 11/14] context-tracking: Introduce work deferral infrastructure
Date: Thu, 6 Jul 2023 00:39:56 +0200 [thread overview]
Message-ID: <20230705223956.GD2813335@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20230705181256.3539027-12-vschneid@redhat.com>
On Wed, Jul 05, 2023 at 07:12:53PM +0100, Valentin Schneider wrote:
> Note: A previous approach by PeterZ [1] used an extra bit in
> context_tracking.state to flag the presence of deferred callbacks to
> execute, and the actual callbacks were stored in a separate atomic
> variable.
>
> This meant that the atomic read of context_tracking.state was sufficient to
> determine whether there are any deferred callbacks to execute.
> Unfortunately, it presents a race window. Consider the work setting
> function as:
>
> preempt_disable();
> seq = atomic_read(&ct->seq);
> if (__context_tracking_seq_in_user(seq)) {
> /* ctrl-dep */
> atomic_or(work, &ct->work);
> ret = atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK);
> }
> preempt_enable();
>
> return ret;
>
> Then the following can happen:
>
> CPUx CPUy
> CT_SEQ_WORK \in context_tracking.state
> atomic_or(WORK_N, &ct->work);
> ct_kernel_enter()
> ct_state_inc();
> atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK);
>
> The cmpxchg() would fail, ultimately causing an IPI for WORK_N to be
> sent. Unfortunately, the work bit would remain set, and it can't be sanely
> cleared in case another CPU set it concurrently - this would ultimately
> lead to a double execution of the callback, one as a deferred callback and
> one in the IPI. As not all IPI callbacks are idempotent, this is
> undesirable.
So adding another atomic is arguably worse.
The thing is, if the NOHZ_FULL CPU is actually doing context transitions
(SYSCALLs etc..) then everything is fundamentally racy, there is no
winning that game, we could find the remote CPU is in-kernel, send an
IPI, the remote CPU does return-to-user and receives the IPI.
And then the USER is upset... because he got an IPI.
The whole NOHZ_FULL thing really only works if userspace does not do
SYSCALLs.
But the sad sad state of affairs is that some people think it is
acceptable to do SYSCALLs while NOHZ_FULL and cry about how slow stuff
is.
next prev parent reply other threads:[~2023-07-05 22:40 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-05 18:12 [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 01/14] tracing/filters: Dynamically allocate filter_pred.regex Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 02/14] tracing/filters: Enable filtering a cpumask field by another cpumask Valentin Schneider
2023-07-05 18:54 ` Steven Rostedt
2023-07-05 18:12 ` [RFC PATCH 03/14] tracing/filters: Enable filtering a scalar field by a cpumask Valentin Schneider
2023-07-05 19:00 ` Steven Rostedt
2023-07-05 18:12 ` [RFC PATCH 04/14] tracing/filters: Enable filtering the CPU common " Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 05/14] tracing/filters: Document cpumask filtering Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 06/14] objtool: Flesh out warning related to pv_ops[] calls Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 07/14] objtool: Warn about non __ro_after_init static key usage in .noinstr Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 08/14] BROKEN: context_tracking: Make context_tracking_key __ro_after_init Valentin Schneider
2023-07-05 20:41 ` Peter Zijlstra
2023-07-05 18:12 ` [RFC PATCH 09/14] x86/kvm: Make kvm_async_pf_enabled __ro_after_init Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 10/14] x86/sev-es: Make sev_es_enable_key __ro_after_init Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 11/14] context-tracking: Introduce work deferral infrastructure Valentin Schneider
2023-07-05 22:23 ` Frederic Weisbecker
2023-07-05 22:41 ` Peter Zijlstra
2023-07-06 9:53 ` Frederic Weisbecker
2023-07-06 10:01 ` Frederic Weisbecker
2023-07-06 11:30 ` Valentin Schneider
2023-07-06 11:40 ` Frederic Weisbecker
2023-07-06 16:39 ` Paul E. McKenney
2023-07-06 18:05 ` Valentin Schneider
2023-07-06 11:55 ` Frederic Weisbecker
2023-07-05 22:39 ` Peter Zijlstra [this message]
2023-07-06 11:31 ` Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 12/14] context_tracking,x86: Defer kernel text patching IPIs Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 13/14] context_tracking,x86: Add infrastructure to defer kernel TLBI Valentin Schneider
2023-07-05 18:12 ` [RFC PATCH 14/14] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Valentin Schneider
2023-07-05 18:48 ` [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition Nadav Amit
2023-07-06 11:29 ` Valentin Schneider
2023-07-05 19:03 ` Steven Rostedt
2023-07-06 11:30 ` Valentin Schneider
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230705223956.GD2813335@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=Jason@zx2c4.com \
--cc=akpm@linux-foundation.org \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=bpf@vger.kernel.org \
--cc=bristot@redhat.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=dionnaglaze@google.com \
--cc=error27@gmail.com \
--cc=frederic@kernel.org \
--cc=hch@infradead.org \
--cc=hpa@zytor.com \
--cc=jpoimboe@kernel.org \
--cc=juerg.haefliger@canonical.com \
--cc=julian.pidancet@oracle.com \
--cc=juri.lelli@redhat.com \
--cc=keescook@chromium.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux@weissschuh.net \
--cc=lstoakes@gmail.com \
--cc=luto@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=namit@vmware.com \
--cc=nashuiliang@gmail.com \
--cc=npiggin@gmail.com \
--cc=nsaenz@kernel.org \
--cc=nsaenzju@redhat.com \
--cc=paulmck@kernel.org \
--cc=pbonzini@redhat.com \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=samitolvanen@google.com \
--cc=song@kernel.org \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=urezki@gmail.com \
--cc=vkuznets@redhat.com \
--cc=vschneid@redhat.com \
--cc=wanpengli@tencent.com \
--cc=x86@kernel.org \
--cc=yangjihong1@huawei.com \
--cc=ypodemsk@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox