From: Thomas Gleixner
To: "Russell King (Oracle)"
Cc: Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki,
 Lorenzo Stoakes, Peter Zijlstra, Baoquan He, John Ogness,
 linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier,
 x86@kernel.org
Subject: Re: Excessive TLB flush ranges
References: <87a5y5a6kj.ffs@tglx> <87353x9y3l.ffs@tglx> <87zg658fla.ffs@tglx>
 <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx> <87fs7w8z6y.ffs@tglx>
Date: Tue, 16 May 2023 11:03:34 +0200
Message-ID: <874joc8x7d.ffs@tglx>
On Tue, May 16 2023 at 09:27, Russell King wrote:
> On Tue, May 16, 2023 at 10:20:37AM +0200, Thomas Gleixner wrote:
>> On Tue, May 16 2023 at 10:18, Thomas Gleixner wrote:
>>
>> > On Tue, May 16 2023 at 08:37, Thomas Gleixner wrote:
>> >> On Mon, May 15 2023 at 22:31, Russell King wrote:
>> >>>> +	list_for_each_entry(va, list, list) {
>> >>>> +		/* flush range by one by one 'invlpg' */
>> >>>> +		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
>> >>>> +			flush_tlb_one_kernel(addr);
>> >>>
>> >>> Isn't this just the same as:
>> >>>		flush_tlb_kernel_range(va->va_start, va->va_end);
>> >>
>> >> Indeed.
>> >
>> > Actually not. At least not on x86 where it'd end up with 3 IPIs for that
>> > case again, instead of having one which walks the list on each CPU.
>>
>> ARM32 has the same problem when tlb_ops_need_broadcast() is true.
>
> If tlb_ops_need_broadcast() is true, then isn't it one IPI to other
> CPUs to flush the range, and possibly another for the Cortex-A15
> erratum?
>
> I've no idea what flush_tlb_one_kernel() is. I can find no such

The patch is against x86 and that function exists there. At least
git grep claims so. :)

> implementation, there is flush_tlb_kernel_page() though, which I
> think is what you're referring to above. On ARM32, that will issue
> one IPI each time it's called, and possibly another IPI for the
> Cortex-A15 erratum.
>
> Given that, flush_tlb_kernel_range() is still going to be more
> efficient on ARM32 when tlb_ops_need_broadcast() is true than doing
> it page by page.

Something like the untested below? I did not attempt anything to decide
whether a full flush might be worth it, but that's a separate problem.
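As an aside, the IPI arithmetic being argued above can be sketched with a toy
model (plain userspace C, not kernel code; the `broadcasts` counter stands in
for one IPI round-trip to the other CPUs, and `struct range` for a stripped
down `struct vmap_area`):

```c
#include <stddef.h>

struct range { unsigned long start, end; };

static unsigned long broadcasts;

/* Stand-in for flush_tlb_kernel_range(r->start, r->end): one IPI per call. */
static void broadcast_flush(const struct range *r)
{
	(void)r;
	broadcasts++;
}

/* Strategy A: flush each area with its own ranged flush -> one IPI each. */
unsigned long flush_each_range(const struct range *list, size_t n)
{
	broadcasts = 0;
	for (size_t i = 0; i < n; i++)
		broadcast_flush(&list[i]);
	return broadcasts;
}

/*
 * Strategy B: a single cross-call whose handler walks the whole list
 * locally on every CPU (the flush_tlb_kernel_vas() approach below):
 * one IPI total, independent of the number of areas.
 */
unsigned long flush_list_once(const struct range *list, size_t n)
{
	(void)list;
	broadcasts = 0;
	if (n)
		broadcasts++;	/* one on_each_cpu(); no per-area IPIs */
	return broadcasts;
}
```

With three pending areas, strategy A costs three broadcasts and strategy B
costs one, which is the "3 IPIs vs. one which walks the list" point above.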
Thanks,

        tglx
---
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -270,6 +270,10 @@ config ARCH_HAS_SET_MEMORY
 config ARCH_HAS_SET_DIRECT_MAP
 	bool
 
+# Select if architecture provides flush_tlb_kernel_vas()
+config ARCH_HAS_FLUSH_TLB_KERNEL_VAS
+	bool
+
 #
 # Select if the architecture provides the arch_dma_set_uncached symbol to
 # either provide an uncached segment alias for a DMA allocation, or
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -10,6 +10,7 @@ config ARM
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include <linux/vmalloc.h>
 
 #include
 #include
@@ -69,6 +70,19 @@ static inline void ipi_flush_tlb_kernel_
 	local_flush_tlb_kernel_range(ta->ta_start, ta->ta_end);
 }
 
+static inline void local_flush_tlb_kernel_vas(struct list_head *vmap_list)
+{
+	struct vmap_area *va;
+
+	list_for_each_entry(va, vmap_list, list)
+		local_flush_tlb_kernel_range(va->va_start, va->va_end);
+}
+
+static inline void ipi_flush_tlb_kernel_vas(void *arg)
+{
+	local_flush_tlb_kernel_vas(arg);
+}
+
 static inline void ipi_flush_bp_all(void *ignored)
 {
 	local_flush_bp_all();
@@ -244,6 +258,15 @@ void flush_tlb_kernel_range(unsigned lon
 	broadcast_tlb_a15_erratum();
 }
 
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (tlb_ops_need_broadcast())
+		on_each_cpu(ipi_flush_tlb_kernel_vas, vmap_list, 1);
+	else
+		local_flush_tlb_kernel_vas(vmap_list);
+	broadcast_tlb_a15_erratum();
+}
+
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -77,6 +77,7 @@ config X86
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_EARLY_DEBUG		if KGDB
 	select ARCH_HAS_ELF_RANDOMIZE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include <linux/vmalloc.h>
 
 #include
 #include
@@ -1081,6 +1082,27 @@ void flush_tlb_kernel_range(unsigned lon
 	}
 }
 
+static void do_flush_tlb_vas(void *arg)
+{
+	struct list_head *vmap_list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, vmap_list, list) {
+		/* flush the range one page at a time with 'invlpg' */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (num_entries > tlb_single_page_flush_ceiling)
+		on_each_cpu(do_flush_tlb_all, NULL, 1);
+	else
+		on_each_cpu(do_flush_tlb_vas, vmap_list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,6 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries);
+
 #endif /* _LINUX_VMALLOC_H */
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1724,7 +1724,8 @@ static void purge_fragmented_blocks_allc
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
+	unsigned long resched_threshold, num_entries = 0, num_alias_entries = 0;
+	struct vmap_area alias_va = { .va_start = start, .va_end = end };
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1736,18 +1737,29 @@ static bool __purge_vmap_area_lazy(unsig
 	list_replace_init(&purge_vmap_area_list, &local_purge_list);
 	spin_unlock(&purge_vmap_area_lock);
 
-	if (unlikely(list_empty(&local_purge_list)))
-		goto out;
+	start = min(start, list_first_entry(&local_purge_list, struct vmap_area, list)->va_start);
+	end = max(end, list_last_entry(&local_purge_list, struct vmap_area, list)->va_end);
 
-	start = min(start,
-		list_first_entry(&local_purge_list,
-			struct vmap_area, list)->va_start);
-
-	end = max(end,
-		list_last_entry(&local_purge_list,
-			struct vmap_area, list)->va_end);
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		list_for_each_entry(va, &local_purge_list, list)
+			num_entries += (va->va_end - va->va_start) >> PAGE_SHIFT;
+
+		if (unlikely(!num_entries))
+			goto out;
+
+		if (alias_va.va_end > alias_va.va_start) {
+			num_alias_entries = (alias_va.va_end - alias_va.va_start) >> PAGE_SHIFT;
+			list_add(&alias_va.list, &local_purge_list);
+		}
+
+		flush_tlb_kernel_vas(&local_purge_list, num_entries + num_alias_entries);
+
+		if (num_alias_entries)
+			list_del(&alias_va.list);
+	} else {
+		flush_tlb_kernel_range(start, end);
+	}
 
-	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);
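For reference, the x86-side policy in the patch (the `num_entries >
tlb_single_page_flush_ceiling` check) can be sketched in plain userspace C.
This is a toy model, not kernel code: `FLUSH_CEILING` stands in for the x86
tunable `tlb_single_page_flush_ceiling` (default 33), and `struct range` for a
stripped down `struct vmap_area`:

```c
#include <stddef.h>

#define PAGE_SHIFT	12
#define FLUSH_CEILING	33	/* stand-in for tlb_single_page_flush_ceiling */

struct range { unsigned long start, end; };

/*
 * Sum of 4K pages covered by all pending ranges, mirroring how
 * __purge_vmap_area_lazy() accumulates num_entries above.
 */
unsigned long total_pages(const struct range *list, size_t n)
{
	unsigned long pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += (list[i].end - list[i].start) >> PAGE_SHIFT;
	return pages;
}

/*
 * Returns 1 when a full TLB flush is cheaper than issuing one 'invlpg'
 * per page, 0 when walking the list page by page wins.
 */
int use_full_flush(const struct range *list, size_t n)
{
	return total_pages(list, n) > FLUSH_CEILING;
}
```

So a purge list covering a handful of pages gets per-page `invlpg`, while a
large one falls back to `do_flush_tlb_all()`; either way it is still a single
`on_each_cpu()` broadcast.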