Message-Id: <18b7e206-9ee6-4afe-b662-9dcbdf55a9db@www.fastmail.com>
In-Reply-To: <20210902215620._WXglfIJy%akpm@linux-foundation.org>
References: <20210902215620._WXglfIJy%akpm@linux-foundation.org>
Date: Thu, 02 Sep 2021 15:28:52 -0700
From: "Andy Lutomirski" <luto@kernel.org>
To: "Andrew Morton", anton@ozlabs.org, "Benjamin Herrenschmidt",
 linux-mm@kvack.org, mm-commits@vger.kernel.org, "Nicholas Piggin",
 paulus@ozlabs.org, "Randy Dunlap", "Linus Torvalds"
Subject: Re: [patch 119/212] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

On Thu, Sep 2, 2021, at 2:56 PM, Andrew Morton wrote:
> From: Nicholas Piggin
> Subject: lazy tlb: shoot lazies, a non-refcounting lazy tlb option
> 
> On big systems, the mm refcount can become highly contended when doing a
> lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
> 
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme that
> causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down any
> remaining lazy ones.
> 
> Shootdown IPIs are of some concern, but they have not been observed to be
> a big problem with this scheme (the powerpc implementation generated 314
> additional interrupts on a 144 CPU system during a kernel compile).  There
> are a number of strategies that could be employed to reduce IPIs if they
> turn out to be a problem for some workload.

This pile is:

Nacked-by: Andy Lutomirski

For reasons that have been discussed previously.  My series is still in
progress.  It's moving slowly for two reasons.  First, I have limited
time to work on it.  Second, the existing mm refcounting is a giant pile
of worms, and that needs fixing one way or another before we add yet
more complexity.  For example, has anyone noticed that kthread mms are
refcounted using different rules than everything else?

Even if my modified refcounting scheme isn't the eventual winner, the
prerequisite cleanups are still prerequisites.  I absolutely nack
anything that adds yet more nonsensical complexity to the existing
scheme, makes it substantially more fragile, and does not fix the
underlying crap that makes speeding it up responsibly such a mess.

Nick or anyone else, you're welcome to finish up my series (and I can
give pointers) or you can wait.
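For context, the refcounting in question is the classic lazy tlb
"active_mm" scheme: a kernel thread borrows the mm of whatever task ran
before it and pins that mm with a reference count.  A simplified sketch
of the idea follows; this is illustrative only, not the literal
kernel/sched/core.c code, and the helper name is made up:

	/*
	 * Sketch of the lazy tlb refcounting that the shootdown scheme
	 * avoids.  Loosely modeled on the context_switch()/
	 * finish_task_switch() mm handling; details vary by version.
	 */
	static void context_switch_mm_sketch(struct task_struct *prev,
					     struct task_struct *next)
	{
		if (!next->mm) {
			/* Kernel thread: borrow prev's mm lazily and pin it. */
			next->active_mm = prev->active_mm;
			mmgrab(next->active_mm);	/* the contended atomic op */
			enter_lazy_tlb(next->active_mm, next);
		} else {
			switch_mm(prev->active_mm, next->mm, next);
		}

		if (!prev->mm) {
			/* prev was running lazily; drop the reference it held. */
			struct mm_struct *mm = prev->active_mm;

			prev->active_mm = NULL;
			mmdrop(mm);	/* may free the mm on the last reference */
		}
	}

With MMU_LAZY_TLB_SHOOTDOWN, the mmgrab()/mmdrop() pair above goes away
for the lazy case, and __mmdrop() instead IPIs mm_cpumask(mm) to evict
any CPU still using the mm as its active_mm, as the patch below does.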
> 
> [npiggin@gmail.com: update comments]
> Link: https://lkml.kernel.org/r/1623121901.mszkmmum0n.astroid@bobo.none
> Link: https://lkml.kernel.org/r/20210605014216.446867-4-npiggin@gmail.com
> Signed-off-by: Nicholas Piggin
> Cc: Anton Blanchard
> Cc: Andy Lutomirski
> Cc: Randy Dunlap
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Signed-off-by: Andrew Morton
> ---
> 
>  arch/Kconfig  |   14 +++++++++++++
>  kernel/fork.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
> 
> --- a/arch/Kconfig~lazy-tlb-shoot-lazies-a-non-refcounting-lazy-tlb-option
> +++ a/arch/Kconfig
> @@ -438,6 +438,20 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>  # to a kthread ->active_mm (non-arch code has been converted already).
>  config MMU_LAZY_TLB_REFCOUNT
>  	def_bool y
> +	depends on !MMU_LAZY_TLB_SHOOTDOWN
> +
> +# This option allows MMU_LAZY_TLB_REFCOUNT=n.  It ensures no CPUs are using an
> +# mm as a lazy tlb beyond its last reference count, by shooting down these
> +# users before the mm is deallocated.  __mmdrop() first IPIs all CPUs that may
> +# be using the mm as a lazy tlb, so that they may switch themselves to using
> +# init_mm for their active mm.  mm_cpumask(mm) is used to determine which CPUs
> +# may be using mm as a lazy tlb mm.
> +#
> +# To implement this, an arch must ensure mm_cpumask(mm) contains at least all
> +# possible CPUs in which the mm is lazy, and it must meet the requirements for
> +# MMU_LAZY_TLB_REFCOUNT=n (see above).
> +config MMU_LAZY_TLB_SHOOTDOWN
> +	bool
>  
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	bool
> 
> --- a/kernel/fork.c~lazy-tlb-shoot-lazies-a-non-refcounting-lazy-tlb-option
> +++ a/kernel/fork.c
> @@ -674,6 +674,53 @@ static void check_mm(struct mm_struct *m
>  #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
>  #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))
>  
> +static void do_shoot_lazy_tlb(void *arg)
> +{
> +	struct mm_struct *mm = arg;
> +
> +	if (current->active_mm == mm) {
> +		WARN_ON_ONCE(current->mm);
> +		current->active_mm = &init_mm;
> +		switch_mm(mm, &init_mm, current);
> +	}
> +}
> +
> +static void do_check_lazy_tlb(void *arg)
> +{
> +	struct mm_struct *mm = arg;
> +
> +	WARN_ON_ONCE(current->active_mm == mm);
> +}
> +
> +static void shoot_lazy_tlbs(struct mm_struct *mm)
> +{
> +	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> +		/*
> +		 * IPI overheads have not been found to be expensive, but they
> +		 * could be reduced in a number of possible ways, for example
> +		 * (in roughly increasing order of complexity):
> +		 * - A batch of mms requiring IPIs could be gathered and freed
> +		 *   at once.
> +		 * - CPUs could store their active mm somewhere that can be
> +		 *   remotely checked without a lock, to filter out
> +		 *   false-positives in the cpumask.
> +		 * - After mm_users or mm_count reaches zero, switching away
> +		 *   from the mm could clear mm_cpumask to reduce some IPIs
> +		 *   (some batching or delaying would help).
> +		 * - A delayed freeing and RCU-like quiescing sequence based on
> +		 *   mm switching to avoid IPIs completely.
> +		 */
> +		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> +		if (IS_ENABLED(CONFIG_DEBUG_VM))
> +			on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);
> +	} else {
> +		/*
> +		 * In this case, lazy tlb mms are refcounted and would not reach
> +		 * __mmdrop until all CPUs have switched away and mmdrop()ed.
> +		 */
> +	}
> +}
> +
>  /*
>   * Called when the last reference to the mm
>   * is dropped: either by a lazy thread or by
> @@ -683,6 +730,10 @@ void __mmdrop(struct mm_struct *mm)
>  {
>  	BUG_ON(mm == &init_mm);
>  	WARN_ON_ONCE(mm == current->mm);
> +
> +	/* Ensure no CPUs are using this as their lazy tlb mm */
> +	shoot_lazy_tlbs(mm);
> +
>  	WARN_ON_ONCE(mm == current->active_mm);
>  	mm_free_pgd(mm);
>  	destroy_context(mm);
> _
> 
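For reference, once an architecture satisfies the mm_cpumask requirement
spelled out in the Kconfig help text above, opting in is a one-line
select.  The powerpc enablement posted alongside this patch does roughly
the following (reconstructed for illustration, not a verbatim quote of
that patch):

	config PPC
		...
		select MMU_LAZY_TLB_SHOOTDOWN		if PPC_RADIX_MMU

The `depends on !MMU_LAZY_TLB_SHOOTDOWN` added above then turns off the
default refcounted scheme automatically for such an arch.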