In-Reply-To: <1606879302.tdngvs3yq4.astroid@bobo.none>
From: Andy Lutomirski
Date: Wed, 2 Dec 2020 21:05:30 -0800
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Nicholas Piggin
Cc: Christian Borntraeger, Catalin Marinas, Dave Hansen, Vasily Gorbik,
    Heiko Carstens, Andy Lutomirski, Will Deacon, Anton Blanchard,
    Arnd Bergmann, linux-arch, LKML, Linux-MM, linuxppc-dev,
    Mathieu Desnoyers, Peter Zijlstra, X86 ML

> On Dec 1, 2020, at 7:47 PM, Nicholas Piggin wrote:
>
> Excerpts from Andy Lutomirski's message of December 1, 2020 4:31 am:
>> other arch folk: there's some background here:
>>
>> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>>
>>> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski wrote:
>>>
>>> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote:
>>>>
>>>> On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
>>>>>
>>>>> On big systems, the mm refcount can become highly contended when doing
>>>>> a lot of context switching with threaded applications (particularly
>>>>> switching between the idle thread and an application thread).
>>>>>
>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>> user->idle->user cases, so instead implement a non-refcounted scheme
>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>> any remaining lazy ones.
>>>>>
>>>>> Shootdown IPIs are some concern, but they have not been observed to be
>>>>> a big problem with this scheme (the powerpc implementation generated
>>>>> 314 additional interrupts on a 144 CPU system during a kernel compile).
>>>>> There are a number of strategies that could be employed to reduce IPIs
>>>>> if they turn out to be a problem for some workload.
>>>>
>>>> I'm still wondering whether we can do even better.
>>>>
>>>
>>> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
>>> the TLB. On x86, this will shoot down all lazies as long as even a
>>> single pagetable was freed. (Or at least it will if we don't have a
>>> serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
>>> sets tlb->freed_tables, which will trigger the IPI.) So, on
>>> architectures like x86, the shootdown approach should be free. The
>>> only way it ought to have any excess IPIs is if we have CPUs in
>>> mm_cpumask() that don't need IPI to free pagetables, which could
>>> happen on paravirt.
>>
>> Indeed, on x86, we do this:
>>
>> [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
>> [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
>> [ 11.561068] exit_mmap+0xc8/0x1a0
>> [ 11.561932] mmput+0x29/0xd0
>> [ 11.562688] do_exit+0x316/0xa90
>> [ 11.563588] do_group_exit+0x34/0xb0
>> [ 11.564476] __x64_sys_exit_group+0xf/0x10
>> [ 11.565512] do_syscall_64+0x34/0x50
>>
>> and we have info->freed_tables set.
>>
>> What are the architectures that have large systems like?
>>
>> x86: we already zap lazies, so it should cost basically nothing to do
>
> This is not zapping lazies, this is freeing the user page tables.
>
> "lazy mm" is where a switch to a kernel thread takes on the
> previous mm for its kernel mapping rather than switch to init_mm.

The intent of the code is to flush the TLB after freeing user page
tables, but, on bare metal, lazies get zapped as a side effect.

Anyway, I'm going to send out a mockup of an alternative approach shortly.
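
For readers following the thread, the non-refcounted scheme from the quoted
patch description can be modelled roughly as below. This is a toy userspace
sketch in Python, not kernel code, and it is not the alternative approach
mentioned above. Every name in it (Mm, Cpu, switch_to_kthread, final_mmdrop)
is hypothetical and chosen only for illustration. It assumes the final drop
happens once no user thread is running on the mm, so any CPU still holding it
is a lazy user.

```python
# Toy model of "shoot lazies": no refcount is taken when a kernel thread
# borrows a user mm; the final drop interrupts every CPU in the mask instead.
# All class and function names here are hypothetical, not kernel APIs.

class Mm:
    def __init__(self, name):
        self.name = name
        self.cpumask = set()   # CPUs that have run (and may still hold) this mm
        self.freed = False


INIT_MM = Mm("init_mm")


class Cpu:
    def __init__(self, cpu_id):
        self.id = cpu_id
        self.active_mm = INIT_MM
        self.lazy = False      # True while a kernel thread borrows a user mm


def switch_to_user(cpu, mm):
    """Run a user thread of `mm` on `cpu` and record the CPU in the mask."""
    cpu.active_mm = mm
    cpu.lazy = False
    mm.cpumask.add(cpu.id)


def switch_to_kthread(cpu):
    """Lazy TLB: keep the previous mm loaded and take no reference on it."""
    cpu.lazy = True


def final_mmdrop(mm, cpus):
    """Model of the last drop: 'IPI' each CPU in the mask, then free the mm."""
    ipis = 0
    for cpu_id in sorted(mm.cpumask):
        cpu = cpus[cpu_id]
        ipis += 1                      # one simulated interrupt per CPU in the mask
        if cpu.active_mm is mm:        # still borrowing it lazily: shoot it down
            cpu.active_mm = INIT_MM
            cpu.lazy = False
    mm.cpumask.clear()
    mm.freed = True
    return ipis


def demo():
    cpus = [Cpu(i) for i in range(4)]
    mm = Mm("app")
    for i in (0, 1, 2):
        switch_to_user(cpus[i], mm)
    for i in (0, 1, 2):
        switch_to_kthread(cpus[i])     # threads went idle; the mm stays loaded
    ipis = final_mmdrop(mm, cpus)
    return ipis, [c.active_mm.name for c in cpus]


if __name__ == "__main__":
    print(demo())
```

The model also shows where the cost discussed in the thread comes from: the
number of simulated interrupts depends on how many CPUs are left in the mask
at the final drop, not on how many context switches happened before it.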