Date: Tue, 4 Apr 2023 17:12:17 +0200
From: Peter Zijlstra
To: Yair Podemsky
Cc: linux@armlinux.org.uk, mpe@ellerman.id.au, npiggin@gmail.com,
 christophe.leroy@csgroup.eu, hca@linux.ibm.com, gor@linux.ibm.com,
 agordeev@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com,
 davem@davemloft.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 will@kernel.org, aneesh.kumar@linux.ibm.com, akpm@linux-foundation.org,
 arnd@arndb.de, keescook@chromium.org, paulmck@kernel.org,
 jpoimboe@kernel.org, samitolvanen@google.com, frederic@kernel.org,
 ardb@kernel.org, juerg.haefliger@canonical.com,
 rmk+kernel@armlinux.org.uk, geert+renesas@glider.be, tony@atomide.com,
 linus.walleij@linaro.org, sebastian.reichel@collabora.com,
 nick.hawkins@hpe.com, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 linux-s390@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-mm@kvack.org, mtosatti@redhat.com,
 vschneid@redhat.com, dhildenb@redhat.com, alougovs@redhat.com,
 Frederic Weisbecker
Subject: Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode
Message-ID: <20230404151217.GB297936@hirez.programming.kicks-ass.net>
References: <20230404134224.137038-1-ypodemsk@redhat.com> <20230404134224.137038-4-ypodemsk@redhat.com>
In-Reply-To: <20230404134224.137038-4-ypodemsk@redhat.com>

On Tue, Apr 04, 2023 at 04:42:24PM +0300, Yair Podemsky wrote:
> The tlb_remove_table_smp_sync IPI is used to ensure the outdated tlb page
> is not currently being accessed and can be cleared.
> This occurs once all CPUs have left the lockless gup code section.
> If they reenter the page table walk, the pointers will be to the new
> pages.
> Therefore the IPI is only needed for CPUs in kernel mode.
> By preventing the IPI from being sent to CPUs not in kernel mode,
> Latencies are reduced.
>
> Race conditions considerations:
> The context state check is vulnerable to race conditions between the
> moment the context state is read to when the IPI is sent (or not).
>
> Here are these scenarios.
> case 1:
> CPU-A                                         CPU-B
>
>                                               state == CONTEXT_KERNEL
> int state = atomic_read(&ct->state);
>                                               Kernel-exit:
>                                               state == CONTEXT_USER
> if (state & CT_STATE_MASK == CONTEXT_KERNEL)
>
> In this case, the IPI will be sent to CPU-B despite it is no longer in
> the kernel. The consequence of which would be an unnecessary IPI being
> handled by CPU-B, causing a reduction in latency.
> This would have been the case every time without this patch.
>
> case 2:
> CPU-A                                         CPU-B
>
> modify pagetables
> tlb_flush (memory barrier)
>                                               state == CONTEXT_USER
> int state = atomic_read(&ct->state);
>                                               Kernel-enter:
>                                               state == CONTEXT_KERNEL
>                                               READ(pagetable values)
> if (state & CT_STATE_MASK == CONTEXT_USER)
>
> In this case, the IPI will not be sent to CPU-B despite it returning to
> the kernel and even reading the pagetable.
> However since this CPU-B has entered the pagetable after the
> modification it is reading the new, safe values.
>
> The only case when this IPI is truly necessary is when CPU-B has entered
> the lockless gup code section before the pagetable modifications and
> has yet to exit them, in which case it is still in the kernel.
>
> Signed-off-by: Yair Podemsky
> ---
>  mm/mmu_gather.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 5ea9be6fb87c..731d955e152d 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -9,6 +9,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -191,6 +192,20 @@ static void tlb_remove_table_smp_sync(void *arg)
>  	/* Simply deliver the interrupt */
>  }
>
> +
> +#ifdef CONFIG_CONTEXT_TRACKING
> +static bool cpu_in_kernel(int cpu, void *info)
> +{
> +	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
> +	int state = atomic_read(&ct->state);
> +	/* will return true only for cpus in kernel space */
> +	return state & CT_STATE_MASK == CONTEXT_KERNEL;
> +}
> +#define CONTEXT_PREDICATE cpu_in_kernel
> +#else
> +#define CONTEXT_PREDICATE NULL
> +#endif /* CONFIG_CONTEXT_TRACKING */
> +
>  #ifdef CONFIG_ARCH_HAS_CPUMASK_BITS
>  #define REMOVE_TABLE_IPI_MASK mm_cpumask(mm)
>  #else
> @@ -206,8 +221,8 @@ void tlb_remove_table_sync_one(struct mm_struct *mm)
>  	 * It is however sufficient for software page-table walkers that rely on
>  	 * IRQ disabling.
>  	 */
> -	on_each_cpu_mask(REMOVE_TABLE_IPI_MASK, tlb_remove_table_smp_sync,
> -			 NULL, true);
> +	on_each_cpu_cond_mask(CONTEXT_PREDICATE, tlb_remove_table_smp_sync,
> +			      NULL, true, REMOVE_TABLE_IPI_MASK);
>  }

I think this is correct; but... I would like much of the changelog
included in a comment above cpu_in_kernel(). I'm sure someone will try
and read this code and wonder about those race conditions.

Of crucial importance is the fact that the page-table modification comes
before the tlbi.

Also, do we really not already have this helper function somewhere? It
seems like something obvious to already have. Frederic?
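
Purely as an illustration of the kind of comment asked for above, here
is a user-space sketch; it is not the patch itself and not kernel code.
The context_tracking array, NR_CPUS, set_cpu_state(), mask_test_as_parsed()
and the numeric values of CONTEXT_KERNEL, CONTEXT_USER and CT_STATE_MASK
are stand-ins assumed for the example. The sketch also parenthesizes the
mask test, because in C `==` binds tighter than `&`; the second helper
shows how the unparenthesized form groups.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed, illustrative values; the real ones live in kernel headers. */
enum { CONTEXT_KERNEL = 0, CONTEXT_USER = 2, CT_STATE_MASK = 0x3 };
#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU context tracking data. */
struct context_tracking { atomic_int state; };
static struct context_tracking context_tracking[NR_CPUS];

/* Test helper (hypothetical): publish a state word for one CPU. */
static void set_cpu_state(int cpu, int state)
{
	atomic_store(&context_tracking[cpu].state, state);
}

/*
 * cpu_in_kernel - IPI predicate: true if @cpu was sampled in kernel mode.
 *
 * The sample is racy against the remote CPU changing state:
 *
 *  - kernel -> user after the sample: an IPI is sent that was not
 *    needed. Harmless; this is what happens unconditionally without
 *    the predicate.
 *
 *  - user -> kernel after the sample: no IPI is sent. This is safe
 *    only because the page-table modification (and the TLB flush)
 *    happen before the state is sampled, so a walker that enters the
 *    kernel afterwards can only observe the new page-table values.
 *
 * The IPI is therefore only required for a CPU that entered the
 * lockless walk before the modification and has not left it yet; such
 * a CPU is necessarily still in kernel mode.
 */
static bool cpu_in_kernel(int cpu, void *info)
{
	(void)info;
	assert(cpu >= 0 && cpu < NR_CPUS);

	int state = atomic_load(&context_tracking[cpu].state);

	/* Mask first, then compare: '==' has higher precedence than '&'. */
	return (state & CT_STATE_MASK) == CONTEXT_KERNEL;
}

/*
 * How "state & CT_STATE_MASK == CONTEXT_KERNEL" groups without the
 * parentheses: the comparison is evaluated first.
 */
static bool mask_test_as_parsed(int state)
{
	return state & (CT_STATE_MASK == CONTEXT_KERNEL);
}
```

With these assumed values the comparison CT_STATE_MASK == CONTEXT_KERNEL
is false, so mask_test_as_parsed() returns false for every state, while
cpu_in_kernel() ignores any bits above the mask and reports kernel mode
only when the masked state equals CONTEXT_KERNEL.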