From: Thomas Gleixner
To: "Russell King (Oracle)"
Cc: Andrew Morton, linux-mm@kvack.org, Christoph Hellwig,
 Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, Baoquan He,
 John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland,
 Marc Zyngier, x86@kernel.org
Subject: Re: Excessive TLB flush ranges
In-Reply-To: <87353x9y3l.ffs@tglx>
References: <87a5y5a6kj.ffs@tglx> <87353x9y3l.ffs@tglx>
Date: Mon, 15 May 2023 23:11:45 +0200
Message-ID: <87zg658fla.ffs@tglx>

On Mon, May 15 2023 at 21:46, Thomas Gleixner wrote:
> On Mon, May 15 2023 at 17:59, Russell King wrote:
>> On Mon, May 15, 2023 at 06:43:40PM +0200, Thomas Gleixner wrote:
> That reproduces in a VM easily and has exactly the same behaviour:
>
>        Extra page[s] via          The actual allocation
>        _vm_unmap_aliases() Pages                    Pages  Flush start       Pages
> alloc:                            ffffc9000058e000  2
> free : ffff888144751000    1      ffffc9000058e000  2      ffff888144751000  17312759359
>
> alloc:                            ffffc90000595000  2
> free : ffff8881424f0000    1      ffffc90000595000  2      ffff8881424f0000  17312768167
>
> .....
>
> seccomp seems to install 29 BPF programs for that process. So on exit()
> this results in 29 full TLB flushes on x86, where each of them is used
> to flush exactly three TLB entries.
>
> The actual two page allocation (ffffc9...) is in the vmalloc space, the
> extra page (ffff88...) is in the direct mapping.

I tried to flush them one by one, which is actually slightly slower.
That's not surprising as there are 3 * 29 instead of 29 IPIs and the
IPIs dominate the picture.

But that's not necessarily true for ARM32 as there are no IPIs involved
on the machine we are using, which is a dual-core Cortex-A9.

So I came up with the hack below, which is as fast as the full flush
variant while the performance impact on the other CPUs is minimally
lower according to perf.

That probably should have another argument which tells how many TLB
entries this flush affects, i.e. 3 in this example, so an architecture
can sensibly decide whether it wants to use flush all or not.

Thanks,

        tglx
---
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1728,6 +1728,7 @@ static bool __purge_vmap_area_lazy(unsig
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
+	struct vmap_area tmp = { .va_start = start, .va_end = end };
 
 	lockdep_assert_held(&vmap_purge_lock);
 
@@ -1747,7 +1748,12 @@ static bool __purge_vmap_area_lazy(unsig
 		list_last_entry(&local_purge_list,
 			struct vmap_area, list)->va_end);
 
-	flush_tlb_kernel_range(start, end);
+	if (tmp.va_end > tmp.va_start)
+		list_add(&tmp.list, &local_purge_list);
+	flush_tlb_kernel_vas(&local_purge_list);
+	if (tmp.va_end > tmp.va_start)
+		list_del(&tmp.list);
+
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -1081,6 +1082,24 @@ void flush_tlb_kernel_range(unsigned lon
 	}
 }
 
+static void do_flush_vas(void *arg)
+{
+	struct list_head *list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, list, list) {
+		/* flush range by one by one 'invlpg' */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *list)
+{
+	on_each_cpu(do_flush_vas, list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,6 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+void flush_tlb_kernel_vas(struct list_head *list);
+
 #endif /* _LINUX_VMALLOC_H */