Date: Thu, 6 Jul 2023 00:39:56 +0200
From: Peter Zijlstra
To: Valentin Schneider
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, x86@kernel.org, Nicolas Saenz Julienne,
	Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski,
	Frederic Weisbecker, "Paul E. McKenney", Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf,
	Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
	Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
	Nadav Amit, Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
	"Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky,
	Dionna Glaze, Thomas Weißschuh, Juri Lelli,
	Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Subject: Re: [RFC PATCH 11/14] context-tracking: Introduce work deferral infrastructure
Message-ID: <20230705223956.GD2813335@hirez.programming.kicks-ass.net>
References: <20230705181256.3539027-1-vschneid@redhat.com>
	<20230705181256.3539027-12-vschneid@redhat.com>
In-Reply-To: <20230705181256.3539027-12-vschneid@redhat.com>

On Wed, Jul 05, 2023 at 07:12:53PM +0100, Valentin Schneider wrote:

> Note: A previous approach by PeterZ [1] used an extra bit in
> context_tracking.state to flag the presence of deferred callbacks to
> execute, and the actual callbacks were stored in a separate atomic
> variable.
>
> This meant that the atomic read of context_tracking.state was sufficient to
> determine whether there are any deferred callbacks to execute.
> Unfortunately, it presents a race window.
> Consider the work setting function as:
>
>   preempt_disable();
>   seq = atomic_read(&ct->seq);
>   if (__context_tracking_seq_in_user(seq)) {
>           /* ctrl-dep */
>           atomic_or(work, &ct->work);
>           ret = atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK);
>   }
>   preempt_enable();
>
>   return ret;
>
> Then the following can happen:
>
>   CPUx                                             CPUy
>                                                      CT_SEQ_WORK \in context_tracking.state
>     atomic_or(WORK_N, &ct->work);
>                                                      ct_kernel_enter()
>                                                        ct_state_inc();
>     atomic_try_cmpxchg(&ct->seq, &seq, seq|CT_SEQ_WORK);
>
> The cmpxchg() would fail, ultimately causing an IPI for WORK_N to be
> sent. Unfortunately, the work bit would remain set, and it can't be sanely
> cleared in case another CPU set it concurrently - this would ultimately
> lead to a double execution of the callback, one as a deferred callback and
> one in the IPI. As not all IPI callbacks are idempotent, this is
> undesirable.

So adding another atomic is arguably worse.

The thing is, if the NOHZ_FULL CPU is actually doing context transitions
(SYSCALLs etc.) then everything is fundamentally racy; there is no
winning that game. We could find the remote CPU is in-kernel, send an
IPI, and then the remote CPU does return-to-user and receives the IPI.

And then the USER is upset... because he got an IPI.

The whole NOHZ_FULL thing really only works if userspace does not do
SYSCALLs.

But the sad, sad state of affairs is that some people think it is
acceptable to do SYSCALLs while running NOHZ_FULL and then cry about
how slow stuff is.
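
For readers following along, the interleaving in the quoted text can be replayed deterministically in userspace. What follows is only a minimal sketch using C11 atomics, not kernel code: the bit values, struct ct, WORK_M, run_work(), kernel_enter() and replay_race() are all hypothetical names made up for illustration, and the "IPI" is modelled as a direct function call. It assumes one reading of the diagram, namely that CT_SEQ_WORK was already set on CPUy by an earlier setter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define CT_SEQ_WORK 0x1 /* deferred work is pending */
#define CT_SEQ_USER 0x2 /* CPU is running in userspace */
#define WORK_M      0x1 /* work queued earlier by some other CPU */
#define WORK_N      0x2 /* work CPUx wants to defer */

struct ct {
	atomic_int seq;  /* state word, stands in for ct->seq */
	atomic_int work; /* pending callbacks, stands in for ct->work */
};

static int work_n_runs; /* how many times WORK_N's callback ran */

static void run_work(int work)
{
	if (work & WORK_N)
		work_n_runs++;
}

/* CPUy: kernel entry changes the state word and flushes flagged work. */
static void kernel_enter(struct ct *ct)
{
	int old = atomic_exchange(&ct->seq, 0);

	if (old & CT_SEQ_WORK)
		run_work(atomic_exchange(&ct->work, 0));
}

/* Replays the interleaving step by step; returns how often WORK_N ran. */
static int replay_race(void)
{
	struct ct ct;
	int seq;

	work_n_runs = 0;
	atomic_init(&ct.seq, CT_SEQ_USER | CT_SEQ_WORK);
	atomic_init(&ct.work, WORK_M);

	/* CPUx: sample the state, see userspace, publish WORK_N. */
	seq = atomic_load(&ct.seq);
	if (seq & CT_SEQ_USER)
		atomic_fetch_or(&ct.work, WORK_N);

	/* CPUy: enters the kernel before CPUx reaches its cmpxchg. */
	kernel_enter(&ct);

	/* CPUx: the cmpxchg fails, so it falls back to an IPI. */
	if (!atomic_compare_exchange_strong(&ct.seq, &seq, seq | CT_SEQ_WORK))
		run_work(WORK_N); /* stands in for the IPI handler */

	return work_n_runs;
}
```

Under these assumptions replay_race() returns 2: WORK_N runs once as a deferred callback on kernel entry and once more from the fallback path, which is the double execution described in the quoted text.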