From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 1 Dec 2020 23:04:33 +0000
From: Will Deacon
To: Andy Lutomirski
Cc: Catalin Marinas, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Dave Hansen, Nicholas Piggin, LKML, X86 ML, Mathieu Desnoyers,
	Arnd Bergmann, Peter Zijlstra, linux-arch, linuxppc-dev, Linux-MM,
	Anton Blanchard
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Message-ID: <20201201230432.GC28496@willie-the-truck>
References: <20201128160141.1003903-1-npiggin@gmail.com>
	<20201128160141.1003903-7-npiggin@gmail.com>
	<20201201212758.GA28300@willie-the-truck>

On Tue, Dec 01, 2020 at 01:50:38PM -0800, Andy Lutomirski wrote:
> On Tue, Dec 1, 2020 at 1:28 PM Will Deacon wrote:
> >
> > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> > > other arch folk: there's some background here:
> > >
> > > https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
> > >
> > > On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote:
> > > >
> > > > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote:
> > > > >
> > > > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
> > > > > >
> > > > > > On big systems, the mm refcount can become highly contended when doing
> > > > > > a lot of context switching with threaded applications (particularly
> > > > > > switching between the idle thread and an application thread).
> > > > > >
> > > > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > > > any remaining lazy ones.
> > > > > >
> > > > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > > > a big problem with this scheme (the powerpc implementation generated
> > > > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > > > There are a number of strategies that could be employed to reduce IPIs
> > > > > > if they turn out to be a problem for some workload.
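For the arch folk just cc'd, the scheme boils down to something like
the sketch below (paraphrased from memory rather than lifted from the
patch, so treat the helper names and details as approximate; config
gating and the __mmdrop() plumbing are omitted):

	/* IPI handler: if this CPU holds the dying mm only lazily, drop it. */
	static void do_shoot_lazy_tlb(void *arg)
	{
		struct mm_struct *mm = arg;

		if (current->active_mm == mm) {
			WARN_ON_ONCE(current->mm);	/* lazy == kernel thread */
			current->active_mm = &init_mm;
			switch_mm(mm, &init_mm, current);
		}
	}

	/*
	 * Called on the final mm reference drop, instead of refcounting
	 * every lazy-tlb switch: kick every CPU that might still hold a
	 * lazy reference and wait for them all to move to init_mm.
	 */
	static void shoot_lazy_tlbs(struct mm_struct *mm)
	{
		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, mm, 1);
	}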
> > > > > I'm still wondering whether we can do even better.
> > > >
> > > > Hold on a sec... __mmput() unmaps VMAs, frees pagetables, and flushes
> > > > the TLB. On x86, this will shoot down all lazies as long as even a
> > > > single pagetable was freed. (Or at least it will if we don't have a
> > > > serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
> > > > sets tlb->freed_tables, which will trigger the IPI.) So, on
> > > > architectures like x86, the shootdown approach should be free. The
> > > > only way it ought to have any excess IPIs is if we have CPUs in
> > > > mm_cpumask() that don't need an IPI to free pagetables, which could
> > > > happen on paravirt.
> > >
> > > Indeed, on x86, we do this:
> > >
> > > [   11.558844] flush_tlb_mm_range.cold+0x18/0x1d
> > > [   11.559905] tlb_finish_mmu+0x10e/0x1a0
> > > [   11.561068] exit_mmap+0xc8/0x1a0
> > > [   11.561932] mmput+0x29/0xd0
> > > [   11.562688] do_exit+0x316/0xa90
> > > [   11.563588] do_group_exit+0x34/0xb0
> > > [   11.564476] __x64_sys_exit_group+0xf/0x10
> > > [   11.565512] do_syscall_64+0x34/0x50
> > >
> > > and we have info->freed_tables set.
> > >
> > > What are the architectures that have large systems like this?
> > >
> > > x86: we already zap lazies, so it should cost basically nothing to do
> > > a little loop at the end of __mmput() to make sure that no lazies are
> > > left. If we care about paravirt performance, we could implement one
> > > of the optimizations I mentioned above to fix up the refcounts instead
> > > of sending an IPI to any remaining lazies.
> > >
> > > arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> > > remote flushes, so any lazy mm references will still exist after
> > > exit_mmap(). (arm64 uses lazy TLB, right?) So this is kind of like
> > > the x86 paravirt case. Are there large enough arm64 systems that any
> > > of this matters?
> >
> > Yes, there are large arm64 systems where performance of TLB invalidation
> > matters, but they're either niche (supercomputers) or not readily
> > available (NUMA boxes).
> >
> > But anyway, we blow away the TLB for everybody in tlb_finish_mmu() after
> > freeing the page-tables. We have an optimisation to avoid flushing if
> > we're just unmapping leaf entries when the mm is going away, but we don't
> > have a choice once we get to actually reclaiming the page-tables.
> >
> > One thing I probably should mention, though, is that we don't maintain
> > mm_cpumask() because we're not able to benefit from it and the atomic
> > update is a waste of time.
>
> Do you do anything special for lazy TLB or do you just use the generic
> code? (i.e. where do your user pagetables point when you go from a
> user task to idle or to a kernel thread?)

We don't do anything special (there's something funny with the PAN
emulation, but you can ignore that); the page-table just points wherever
it did before for userspace. Switching explicitly to the init_mm,
however, causes us to unmap userspace entirely.

Since we have ASIDs, switch_mm() generally doesn't have to care about
the TLBs at all.
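To illustrate: with ASID-tagged TLB entries, the switch path only has
to install the new table base together with the new mm's ASID; stale
entries still carry the old ASID and can never hit again, so there is
nothing to invalidate. A purely conceptual sketch, not our real
switch_mm() -- write_ttbr(), mm_asid() and ASID_FIELD_SHIFT are
made-up stand-ins:

	static void sketch_switch_mm(struct mm_struct *next)
	{
		/* Hypothetical accessor for the per-mm ASID. */
		u64 asid = mm_asid(next);

		/*
		 * Install the new translation-table base and ASID in one
		 * shot. No TLB invalidation here: entries for the old mm
		 * keep the old ASID and simply stop matching.
		 */
		write_ttbr(__pa(next->pgd) | (asid << ASID_FIELD_SHIFT));
	}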
> Do you end up with all cpus set in mm_cpumask or can you have the mm
> loaded on a CPU that isn't in mm_cpumask?

I think the mask is always zero (we never set anything in there).

Will