Date: Mon, 24 Jul 2023 16:52:19 +0200
From: Frederic Weisbecker
To: Valentin Schneider
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Nicolas Saenz Julienne,
	Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski,
	Peter Zijlstra, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes,
	Josh Poimboeuf, Jason Baron, Kees Cook, Sami Tolvanen, Ard Biesheuvel,
	Nicholas Piggin, Juerg Haefliger, Nicolas Saenz Julienne,
	"Kirill A. Shutemov", Nadav Amit, Dan Carpenter, Chuang Wang,
	Yang Jihong, Petr Mladek, "Jason A. Donenfeld", Song Liu,
	Julian Pidancet, Tom Lendacky, Dionna Glaze, Thomas Weißschuh,
	Juri Lelli, Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Subject: Re: [RFC PATCH v2 15/20] context-tracking: Introduce work deferral infrastructure
References: <20230720163056.2564824-1-vschneid@redhat.com> <20230720163056.2564824-16-vschneid@redhat.com>
In-Reply-To: <20230720163056.2564824-16-vschneid@redhat.com>

On Thu, Jul 20, 2023 at 05:30:51PM +0100, Valentin Schneider wrote:
> +enum ctx_state {
> +	/* Following are values */
> +	CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */
> +	CONTEXT_KERNEL   = 0,
> +	CONTEXT_IDLE     = 1,
> +	CONTEXT_USER     = 2,
> +	CONTEXT_GUEST    = 3,
> +	CONTEXT_MAX      = 4,
> +};
> +
> +/*
> + * We cram three different things within the same atomic variable:
> + *
> + *                 CONTEXT_STATE_END                       RCU_DYNTICKS_END
> + *                         |       CONTEXT_WORK_END                |
> + *                         |               |                       |
> + *                         v               v                       v
> + *         [ context_state ][ context work ][ RCU dynticks counter ]
> + *         ^                ^               ^
> + *         |                |               |
> + *         |       CONTEXT_WORK_START       |
> + * CONTEXT_STATE_START             RCU_DYNTICKS_START

Should the layout be displayed in reverse? At least I always picture bitmaps
in reverse, probably because of the direction of the shift arrows. I'm not
sure what the usual way to picture it is, though...

> + */
> +
> +#define CT_STATE_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE)
> +
> +#define CONTEXT_STATE_START 0
> +#define CONTEXT_STATE_END   (bits_per(CONTEXT_MAX - 1) - 1)

Since you have non-overlapping *_START symbols, perhaps the *_END ones are
superfluous?

> +
> +#define RCU_DYNTICKS_BITS  (IS_ENABLED(CONFIG_CONTEXT_TRACKING_WORK) ? 16 : 31)
> +#define RCU_DYNTICKS_START (CT_STATE_SIZE - RCU_DYNTICKS_BITS)
> +#define RCU_DYNTICKS_END   (CT_STATE_SIZE - 1)
> +#define RCU_DYNTICKS_IDX   BIT(RCU_DYNTICKS_START)

This might be the right time to standardize and fix our naming:

CT_STATE_START,
CT_STATE_KERNEL,
CT_STATE_USER,
...
CT_WORK_START,
CT_WORK_*,
...
CT_RCU_DYNTICKS_START,
CT_RCU_DYNTICKS_IDX

> +bool ct_set_cpu_work(unsigned int cpu, unsigned int work)
> +{
> +	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
> +	unsigned int old;
> +	bool ret = false;
> +
> +	preempt_disable();
> +
> +	old = atomic_read(&ct->state);
> +	/*
> +	 * Try setting the work until either
> +	 * - the target CPU no longer accepts any more deferred work
> +	 * - the work has been set
> +	 *
> +	 * NOTE: CONTEXT_GUEST intersects with CONTEXT_USER and CONTEXT_IDLE
> +	 * as they are regular integers rather than bits, but that doesn't
> +	 * matter here: if any of the context state bit is set, the CPU isn't
> +	 * in kernel context.
> +	 */
> +	while ((old & (CONTEXT_GUEST | CONTEXT_USER | CONTEXT_IDLE)) && !ret)

That may still miss a recent entry to userspace because of the initial plain
read, resulting in an undesired interrupt.

You need at least one cmpxchg.
Well, of course that stays racy by nature: between the cmpxchg() returning
CONTEXT_KERNEL and the IPI actually being raised and received, the remote CPU
may already have gone to userspace. But it still narrows the window a little.

Thanks.

> +		ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CONTEXT_WORK_START));
> +
> +	preempt_enable();
> +	return ret;
> +}
> +#else
> +static __always_inline void ct_work_flush(unsigned long work) { }
> +static __always_inline void ct_work_clear(struct context_tracking *ct) { }
> +#endif
> +
>  /*
>   * Record entry into an extended quiescent state. This is only to be
>   * called when not already in an extended quiescent state, that is,
> @@ -88,7 +133,8 @@ static noinstr void ct_kernel_exit_state(int offset)
>  	 * next idle sojourn.
>  	 */
>  	rcu_dynticks_task_trace_enter(); // Before ->dynticks update!
> -	seq = ct_state_inc(offset);
> +	seq = ct_state_inc_clear_work(offset);
> +
>  	// RCU is no longer watching. Better be in extended quiescent state!
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & RCU_DYNTICKS_IDX));
>  }
> @@ -100,7 +146,7 @@ static noinstr void ct_kernel_exit_state(int offset)
>   */
>  static noinstr void ct_kernel_enter_state(int offset)
>  {
> -	int seq;
> +	unsigned long seq;
>
>  	/*
>  	 * CPUs seeing atomic_add_return() must see prior idle sojourns,
> @@ -108,6 +154,7 @@ static noinstr void ct_kernel_enter_state(int offset)
>  	 * critical section.
>  	 */
>  	seq = ct_state_inc(offset);
> +	ct_work_flush(seq);
>  	// RCU is now watching. Better not be in an extended quiescent state!
>  	rcu_dynticks_task_trace_exit(); // After ->dynticks update!
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !(seq & RCU_DYNTICKS_IDX));
> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
> index bae8f11070bef..fdb266f2d774b 100644
> --- a/kernel/time/Kconfig
> +++ b/kernel/time/Kconfig
> @@ -181,6 +181,11 @@ config CONTEXT_TRACKING_USER_FORCE
>  	  Say N otherwise, this option brings an overhead that you
>  	  don't want in production.
>
> +config CONTEXT_TRACKING_WORK
> +	bool
> +	depends on HAVE_CONTEXT_TRACKING_WORK && CONTEXT_TRACKING_USER
> +	default y
> +
>  config NO_HZ
>  	bool "Old Idle dynticks config"
>  	help
> --
> 2.31.1
>
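
To make the "at least one cmpxchg" remark in ct_set_cpu_work() more concrete,
here is a rough userspace sketch, not a proposed implementation. It uses C11
atomics as a stand-in for atomic_read()/atomic_try_cmpxchg(), and the CT_*
names, the two-bit state mask and the work offset are placeholders I made up
for illustration, not values from the patch. The trick of forcing the expected
value to a non-kernel state is only one possible way to guarantee that the
cmpxchg runs once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout, for illustration only: state in bits 0-1, work above. */
#define CT_STATE_MASK   0x3u
#define CT_STATE_KERNEL 0u
#define CT_STATE_USER   2u
#define CT_WORK_START   2

/*
 * Try to set deferred work bits in a (simulated) remote CPU's state word.
 * Returns true if the work was set, false if the CPU was found in kernel
 * context, in which case the caller would fall back to sending an IPI.
 */
static bool ct_set_work_sketch(atomic_uint *state, unsigned int work)
{
	unsigned int old = atomic_load_explicit(state, memory_order_relaxed);
	bool ret;

	assert(work != 0);

	/*
	 * The plain read above may be stale. If it claims "kernel", do not
	 * trust it: pretend the CPU is in userspace so that the cmpxchg
	 * below performs the real check against the current value.
	 */
	if ((old & CT_STATE_MASK) == CT_STATE_KERNEL)
		old |= CT_STATE_USER;

	/*
	 * Always run at least one cmpxchg. On failure, 'old' is refreshed
	 * with the current value, so the loop stops as soon as the CPU is
	 * really observed in kernel context.
	 */
	do {
		ret = atomic_compare_exchange_strong(state, &old,
						     old | (work << CT_WORK_START));
	} while (!ret && (old & CT_STATE_MASK) != CT_STATE_KERNEL);

	return ret;
}
```

With the state word holding CT_STATE_KERNEL the function returns false and
leaves the word untouched. With CT_STATE_USER it returns true and the work bit
ends up set. As said above, none of this closes the race against the IPI
itself, it only narrows the window.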