From: Thomas Gleixner <tglx@linutronix.de>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	Uladzislau Rezki <urezki@gmail.com>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Baoquan He <bhe@redhat.com>, John Ogness <jogness@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	x86@kernel.org
Subject: Re: Excessive TLB flush ranges
Date: Mon, 15 May 2023 23:11:45 +0200	[thread overview]
Message-ID: <87zg658fla.ffs@tglx> (raw)
In-Reply-To: <87353x9y3l.ffs@tglx>

On Mon, May 15 2023 at 21:46, Thomas Gleixner wrote:
> On Mon, May 15 2023 at 17:59, Russell King wrote:
>> On Mon, May 15, 2023 at 06:43:40PM +0200, Thomas Gleixner wrote:
> That reproduces in a VM easily and has exactly the same behaviour:
>
>        Extra page[s] via         The actual allocation
>        _vm_unmap_aliases() Pages                     Pages Flush start       Pages
> alloc:                           ffffc9000058e000      2
> free : ffff888144751000      1   ffffc9000058e000      2   ffff888144751000  17312759359
>
> alloc:                           ffffc90000595000      2
> free : ffff8881424f0000      1   ffffc90000595000      2   ffff8881424f0000  17312768167
>
> .....
>
> seccomp seems to install 29 BPF programs for that process. So on exit()
> this results in 29 full TLB flushes on x86, where each of them is used
> to flush exactly three TLB entries.
>
> The actual two page allocation (ffffc9...) is in the vmalloc space, the
> extra page (ffff88...) is in the direct mapping.

I tried flushing them one by one, which is actually slightly slower.
That's not surprising: it means 3 * 29 IPIs instead of 29, and the IPIs
dominate the cost.

But that's not necessarily true for ARM32 as there are no IPIs involved
on the machine we are using, which is a dual-core Cortex-A9.

So I came up with the hack below, which is as fast as the full flush
variant, while the performance impact on the other CPUs is marginally
lower according to perf.

That should probably take another argument which tells how many TLB
entries the flush affects, i.e. 3 in this example, so an architecture
can sensibly decide whether it wants to do a full flush or not.
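
Something like the sketch below, i.e. let __purge_vmap_area_lazy() hand
in the accumulated number of pages and let the architecture fall back to
a full flush above a cutoff. This is only for illustration; the extra
argument and the reuse of the existing x86 cutoff are made up here:

/* Illustrative only: would sit next to do_flush_vas() in arch/x86/mm/tlb.c */
void flush_tlb_kernel_vas(struct list_head *list, unsigned long nr_pages)
{
	/*
	 * Hypothetical cutoff: above a certain number of single page
	 * invalidations a full flush is cheaper, similar to the
	 * heuristic flush_tlb_kernel_range() already applies.
	 */
	if (nr_pages > tlb_single_page_flush_ceiling) {
		flush_tlb_all();
		return;
	}
	on_each_cpu(do_flush_vas, list, 1);
}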

Thanks,

        tglx
---
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1728,6 +1728,7 @@ static bool __purge_vmap_area_lazy(unsig
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
+	struct vmap_area tmp = { .va_start = start, .va_end = end };
 
 	lockdep_assert_held(&vmap_purge_lock);
 
@@ -1747,7 +1748,12 @@ static bool __purge_vmap_area_lazy(unsig
 		list_last_entry(&local_purge_list,
 			struct vmap_area, list)->va_end);
 
-	flush_tlb_kernel_range(start, end);
+	if (tmp.va_end > tmp.va_start)
+		list_add(&tmp.list, &local_purge_list);
+	flush_tlb_kernel_vas(&local_purge_list);
+	if (tmp.va_end > tmp.va_start)
+		list_del(&tmp.list);
+
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include <linux/debugfs.h>
 #include <linux/sched/smt.h>
 #include <linux/task_work.h>
+#include <linux/vmalloc.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -1081,6 +1082,24 @@ void flush_tlb_kernel_range(unsigned lon
 	}
 }
 
+static void do_flush_vas(void *arg)
+{
+	struct list_head *list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, list, list) {
+	/* Flush the range one page at a time with INVLPG */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *list)
+{
+	on_each_cpu(do_flush_vas, list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,6 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+void flush_tlb_kernel_vas(struct list_head *list);
+
 #endif /* _LINUX_VMALLOC_H */
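
For ARM32, where no IPIs are involved as mentioned above, the arch side
could presumably be as simple as the sketch below (untested, and it
assumes flush_tlb_kernel_page() is the right primitive there):

/* Untested sketch; needs <linux/vmalloc.h> for struct vmap_area */
void flush_tlb_kernel_vas(struct list_head *list)
{
	struct vmap_area *va;
	unsigned long addr;

	list_for_each_entry(va, list, list) {
		/* One local TLB invalidation per page of the area */
		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
			flush_tlb_kernel_page(addr);
	}
}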
