From: Nadav Amit
Date: Wed, 17 May 2023 15:57:22 -0700
Subject: Re: Excessive TLB flush ranges
To: Thomas Gleixner
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton, linux-mm,
 Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He,
 John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland,
 Marc Zyngier, x86@kernel.org
In-Reply-To: <87ttwb5jx3.ffs@tglx>

> On May 17, 2023, at 3:31 AM, Thomas Gleixner wrote:
>
>>> The point is that the generic vmalloc code is making assumptions which
>>> are x86 centric on not even necessarily true on x86.
>>>
>>> Whether or not this is beneficial on x86 that's a completely separate
>>> debate.
>>
>> I fully understand that if you reduce multiple TLB shootdowns (IPI-wise)
>> into 1, it is (pretty much) all benefit and there is no tradeoff. I was
>> focusing on the question of whether it is beneficial also to do precise
>> TLB flushing, and the tradeoff there is less clear (especially that the
>> kernel uses 2MB pages).
>
> For the vmalloc() area mappings? Not really.

The main penalty of a global flush is the loss of innocent bystanders' TLB
translations. These are likely the regular mappings, not the vmalloc ones.

>
>> My experience with non-IPI based TLB invalidations is more limited. IIUC
>> the usage model is that the TLB shootdowns should be invoked ASAP
>> (perhaps each range can be batched, but there is no sense of batching
>> multiple ranges), and then later you would issue some barrier to ensure
>> prior TLB shootdown invocations have been completed.
>>
>> If that is the (use) case, I am not sure the abstraction you used in
>> your prototype is the best one.
>
> The way how arm/arm64 implement that in software is:
>
>     magic_barrier1();
>     flush_range_with_magic_opcodes();
>     magic_barrier2();
>
> And for that use case having the list with individual ranges is not
> really wrong.
>
> Maybe ARM[64] could do this smarter, but that would require to rewrite a
> lot of code I assume.

What you say makes sense, and I actually see that flush_tlb_page_nosync()
needs a memory barrier.

I just encountered recent patches that did the flushing on ARM in an
async manner as I described. That is the reason I assumed it is more
efficient.

https://lore.kernel.org/linux-mm/20230410134352.4519-3-yangyicong@huawei.com/

>
>>> There is also a debate required whether a wholesale "flush on _ALL_
>>> CPUs" is justified when some of those CPUs are completely isolated and
>>> have absolutely no chance to be affected by that.
>>> This process bound seccomp/BPF muck clearly does not justify to kick
>>> isolated CPUs out of their computation in user space just because…
>>
>> I hope you would excuse my ignorance (I am sure you won’t), but aren’t
>> the seccomp/BPF VMAP ranges mapped on all processes (considering
>> PTI of course)? Are you suggesting you want a per-process kernel
>> address space? (which can make sense, I guess)
>
> Right. The BPF muck is mapped in the global kernel space, but e.g. the
> seccomp filters are individual per process. At least that's how I
> understand it, but I might be completely wrong.

After rereading the seccomp man page, they are not entirely “private” for
each process, as they are maintained after fork/exec. Yet, one can imagine
that it is possible to create non-global kernel mappings that would be
mapped per-process and would hold the seccomp filters. This would remove
the need to do system-wide flushes when the process dies.
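
To make the precise-versus-global tradeoff discussed earlier in this mail
a bit more concrete, here is a toy user-space sketch in C. It is not kernel
code and not the prototype under discussion. Every name in it (struct range,
listed_pages(), merged_pages(), choose_flush(), FULL_FLUSH_THRESHOLD) is
hypothetical, the 4KB page size is an assumption, and the cutoff of 32 pages
is arbitrary. It only compares how many pages a list of individual ranges
covers against how many pages a single merged [lowest start, highest end)
span would cover, and picks a flush strategy from that count.

```c
#include <stddef.h>

#define PAGE_SHIFT 12UL             /* assumption: 4KB pages */
#define FULL_FLUSH_THRESHOLD 32UL   /* hypothetical cutoff, in pages */

/* Half-open, page-aligned address range [start, end). Hypothetical type. */
struct range {
	unsigned long start;
	unsigned long end;
};

enum flush_kind { FLUSH_RANGES, FLUSH_ALL };

/* Number of pages in one range. */
static unsigned long range_pages(const struct range *r)
{
	return (r->end - r->start) >> PAGE_SHIFT;
}

/* Pages covered when every range in the list is flushed individually. */
static unsigned long listed_pages(const struct range *r, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += range_pages(&r[i]);
	return total;
}

/* Pages covered when the list is collapsed into one [min start, max end). */
static unsigned long merged_pages(const struct range *r, size_t n)
{
	unsigned long lo, hi;

	if (n == 0)
		return 0;

	lo = r[0].start;
	hi = r[0].end;
	for (size_t i = 1; i < n; i++) {
		if (r[i].start < lo)
			lo = r[i].start;
		if (r[i].end > hi)
			hi = r[i].end;
	}
	return (hi - lo) >> PAGE_SHIFT;
}

/* Above the cutoff, give up on precision and flush everything. */
static enum flush_kind choose_flush(unsigned long pages)
{
	return pages > FULL_FLUSH_THRESHOLD ? FLUSH_ALL : FLUSH_RANGES;
}
```

With two ranges that lie far apart, say [0x1000, 0x3000) and
[0x100000, 0x101000), the list covers only a handful of pages and stays
under the cutoff, while the merged span covers everything in between and
tips over into a full flush. Whether a real architecture gains from the
precise variant is exactly the open question above; the sketch only shows
the page-count arithmetic.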