References: <20201128160141.1003903-1-npiggin@gmail.com> <20201128160141.1003903-7-npiggin@gmail.com>
From: Andy Lutomirski
Date: Mon, 30 Nov 2020 10:31:51 -0800
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Andy Lutomirski, Will Deacon, Catalin Marinas, Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Dave Hansen
Cc: Nicholas Piggin, LKML, X86 ML, Mathieu Desnoyers, Arnd Bergmann, Peter Zijlstra, linux-arch, linuxppc-dev, Linux-MM, Anton
Blanchard

other arch folk: there's some background here:

https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com

On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote:
>
> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote:
> >
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
> > >
> > > On big systems, the mm refcount can become highly contended when doing
> > > a lot of context switching with threaded applications (particularly
> > > switching between the idle thread and an application thread).
> > >
> > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > user->idle->user cases, so instead implement a non-refcounted scheme
> > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > any remaining lazy ones.
> > >
> > > Shootdown IPIs are some concern, but they have not been observed to be
> > > a big problem with this scheme (the powerpc implementation generated
> > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > There are a number of strategies that could be employed to reduce IPIs
> > > if they turn out to be a problem for some workload.
> >
> > I'm still wondering whether we can do even better.
> >
>
> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> the TLB.  On x86, this will shoot down all lazies as long as even a
> single pagetable was freed.  (Or at least it will if we don't have a
> serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
> sets tlb->freed_tables, which will trigger the IPI.)  So, on
> architectures like x86, the shootdown approach should be free.
> The only way it ought to have any excess IPIs is if we have CPUs in
> mm_cpumask() that don't need IPI to free pagetables, which could
> happen on paravirt.

Indeed, on x86, we do this:

[ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
[ 11.559905] tlb_finish_mmu+0x10e/0x1a0
[ 11.561068] exit_mmap+0xc8/0x1a0
[ 11.561932] mmput+0x29/0xd0
[ 11.562688] do_exit+0x316/0xa90
[ 11.563588] do_group_exit+0x34/0xb0
[ 11.564476] __x64_sys_exit_group+0xf/0x10
[ 11.565512] do_syscall_64+0x34/0x50

and we have info->freed_tables set.

What are the architectures that have large systems like?

x86: we already zap lazies, so it should cost basically nothing to do
a little loop at the end of __mmput() to make sure that no lazies are
left.  If we care about paravirt performance, we could implement one
of the optimizations I mentioned above to fix up the refcounts instead
of sending an IPI to any remaining lazies.

arm64: AFAICT arm64's flush uses magic arm64 hardware support for
remote flushes, so any lazy mm references will still exist after
exit_mmap().  (arm64 uses lazy TLB, right?)  So this is kind of like
the x86 paravirt case.  Are there large enough arm64 systems that any
of this matters?

s390x: The code has too many acronyms for me to understand it fully,
but I think it's more or less the same situation as arm64.  How big do
s390x systems come?

power: Ridiculously complicated, seems to vary by system and kernel config.

So, Nick, your unconditional IPI scheme is apparently a big
improvement for power, and it should be an improvement and have low
cost for x86.  On arm64 and s390x it will add more IPIs on process
exit but reduce contention on context switching depending on how lazy
TLB works.  I suppose we could try it for all architectures without
any further optimizations.  Or we could try one of the perhaps
excessively clever improvements I linked above.  arm64, s390x people,
what do you think?
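
To make the "little loop at the end of __mmput()" idea concrete, here is a
rough userspace model, not kernel code.  Every name in it (lazy_mm[],
set_lazy(), flush_ipi(), shoot_remaining_lazies(), NR_CPUS = 8) is
hypothetical and made up for illustration.  It assumes that a CPU which
receives the exit-time flush IPI drops its lazy mm, and that any CPU which
was skipped (the paravirt or hardware-broadcast case) keeps it until the
final scan finds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 8			/* hypothetical; small for illustration */

struct mm { int id; };			/* stand-in for struct mm_struct */

/* Per-CPU lazily borrowed mm, or NULL.  Models a kernel thread that is
 * still running on the page tables of the last user task. */
static struct mm *lazy_mm[NR_CPUS];

/* Record that @cpu is lazily using @mm (NULL clears the slot). */
static void set_lazy(size_t cpu, struct mm *mm)
{
	assert(cpu < NR_CPUS);
	lazy_mm[cpu] = mm;
}

/* Model of the TLB flush at exit_mmap() time: each CPU that actually
 * receives an IPI (ipi_mask[cpu] == true) drops its lazy use of @mm.
 * CPUs that are flushed without an IPI keep their lazy reference. */
static void flush_ipi(const struct mm *mm, const bool ipi_mask[NR_CPUS])
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		if (ipi_mask[cpu] && lazy_mm[cpu] == mm)
			lazy_mm[cpu] = NULL;
}

/* The "little loop": scan for CPUs still lazily using @mm and shoot
 * each one (modelled here as clearing the slot).  Returns the number of
 * extra IPIs this would have cost. */
static unsigned int shoot_remaining_lazies(const struct mm *mm)
{
	unsigned int extra_ipis = 0;

	for (size_t cpu = 0; cpu < NR_CPUS; cpu++) {
		if (lazy_mm[cpu] == mm) {
			lazy_mm[cpu] = NULL;
			extra_ipis++;
		}
	}
	return extra_ipis;
}
```

In this model, if the flush IPI reaches every CPU (the bare-metal x86 case
described above), shoot_remaining_lazies() finds nothing and returns 0.  If
the flush reaches no CPU by IPI (roughly the arm64 situation as I understand
it), the return value equals the number of CPUs still lazy on that mm.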