linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	Uladzislau Rezki <urezki@gmail.com>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Baoquan He <bhe@redhat.com>, John Ogness <jogness@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	x86@kernel.org
Subject: Re: Excessive TLB flush ranges
Date: Mon, 15 May 2023 21:46:38 +0200	[thread overview]
Message-ID: <87353x9y3l.ffs@tglx> (raw)
In-Reply-To: <ZGJk3w4gUbpLl2tp@shell.armlinux.org.uk>

On Mon, May 15 2023 at 17:59, Russell King wrote:
> On Mon, May 15, 2023 at 06:43:40PM +0200, Thomas Gleixner wrote:
>> bpf_prog_free_deferred()
>>   vfree()
>>     _vm_unmap_aliases()
>>        collect_per_cpu_vmap_blocks: start:0x95c8d000 end:0x95c8e000 size:0x1000 
>>        __purge_vmap_area_lazy(start:0x95c8d000, end:0x95c8e000)
>> 
>>          va_start:0xf08a1000 va_end:0xf08a5000 size:0x00004000 gap:0x5ac13000 (371731 pages)
>>          va_start:0xf08a5000 va_end:0xf08a9000 size:0x00004000 gap:0x00000000 (     0 pages)
>>          va_start:0xf08a9000 va_end:0xf08ad000 size:0x00004000 gap:0x00000000 (     0 pages)
>>          va_start:0xf08ad000 va_end:0xf08b1000 size:0x00004000 gap:0x00000000 (     0 pages)
>>          va_start:0xf08b3000 va_end:0xf08b7000 size:0x00004000 gap:0x00002000 (     2 pages)
>>          va_start:0xf08b7000 va_end:0xf08bb000 size:0x00004000 gap:0x00000000 (     0 pages)
>>          va_start:0xf08bb000 va_end:0xf08bf000 size:0x00004000 gap:0x00000000 (     0 pages)
>>          va_start:0xf0a15000 va_end:0xf0a17000 size:0x00002000 gap:0x00156000 (   342 pages)
>> 
>>       flush_tlb_kernel_range(start:0x95c8d000, end:0xf0a17000)
>> 
>>          Does 372106 flush operations where only 31 are useful
>
> So, you asked the architecture to flush a large range, and are then
> surprised if it takes a long time. There is no way to know how many
> of those are useful.

I did not ask for that. That's the merge ranges logic in
__purge_vmap_area_lazy() which decides that the one page at 0x95c8d000
should build a flush range with the rest. I'm just the messenger :)

> Now, while using the sledge hammer of flushing all TLB entries may
> sound like a good answer, if we're only evicting 31 entries, the
> other entries are probably useful to have, no?

That's what I was asking already in the part you removed from the reply,
no?

> I think that you'd only run into this if you had a huge BPF
> program and you tore it down, no?

There was no huge BPF program. Some default seccomp muck.

I have another trace which shows that seccomp creates 10 BPF programs
for one process where each allocates 8K vmalloc memory in the
0xf0a.... address range.

On teardown this is even more horrible than the above. Every allocation
is deleted separately, i.e. 8k at a time and the pattern is always the
same. One extra page in the 0xca6..... address range is handed in via
_vm_unmap_aliases(), which expands the range insanely.

So this means ~1.5M flush operations to flush a total of 30 TLB entries.

That reproduces in a VM easily and has exactly the same behaviour:

       Extra page[s] via         The actual allocation
       _vm_unmap_aliases() Pages                     Pages Flush start       Pages
alloc:                           ffffc9000058e000      2
free : ffff888144751000      1   ffffc9000058e000      2   ffff888144751000  17312759359

alloc:                           ffffc90000595000      2
free : ffff8881424f0000      1   ffffc90000595000      2   ffff8881424f0000  17312768167

.....

seccomp seems to install 29 BPF programs for that process. So on exit()
this results in 29 full TLB flushes on x86, where each of them is used
to flush exactly three TLB entries.

The actual two page allocation (ffffc9...) is in the vmalloc space, the
extra page (ffff88...) is in the direct mapping.

This is a plain debian install with the a 6.4-rc1 kernel. The reproducer
is: # systemctl start logrotate

Thanks,

        tglx


  reply	other threads:[~2023-05-15 19:46 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15 16:43 Thomas Gleixner
2023-05-15 16:59 ` Russell King (Oracle)
2023-05-15 19:46   ` Thomas Gleixner [this message]
2023-05-15 21:11     ` Thomas Gleixner
2023-05-15 21:31       ` Russell King (Oracle)
2023-05-16  6:37         ` Thomas Gleixner
2023-05-16  6:46           ` Thomas Gleixner
2023-05-16  8:18           ` Thomas Gleixner
2023-05-16  8:20             ` Thomas Gleixner
2023-05-16  8:27               ` Russell King (Oracle)
2023-05-16  9:03                 ` Thomas Gleixner
2023-05-16 10:05                   ` Baoquan He
2023-05-16 14:21                     ` Thomas Gleixner
2023-05-16 19:03                       ` Thomas Gleixner
2023-05-17  9:38                         ` Thomas Gleixner
2023-05-17 10:52                           ` Baoquan He
2023-05-19 11:22                             ` Thomas Gleixner
2023-05-19 11:49                               ` Baoquan He
2023-05-19 14:13                                 ` Thomas Gleixner
2023-05-19 12:01                         ` [RFC PATCH 1/3] mm/vmalloc.c: try to flush vmap_area one by one Baoquan He
2023-05-19 14:16                           ` Thomas Gleixner
2023-05-19 12:02                         ` [RFC PATCH 2/3] mm/vmalloc.c: Only flush VM_FLUSH_RESET_PERMS area immediately Baoquan He
2023-05-19 12:03                         ` [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly Baoquan He
2023-05-19 14:17                           ` Thomas Gleixner
2023-05-19 18:38                           ` Thomas Gleixner
2023-05-19 23:46                             ` Baoquan He
2023-05-21 23:10                               ` Thomas Gleixner
2023-05-22 11:21                                 ` Baoquan He
2023-05-22 12:02                                   ` Thomas Gleixner
2023-05-22 14:34                                     ` Baoquan He
2023-05-22 20:21                                       ` Thomas Gleixner
2023-05-22 20:44                                         ` Thomas Gleixner
2023-05-23  9:35                                         ` Baoquan He
2023-05-19 13:49                   ` Excessive TLB flush ranges Thomas Gleixner
2023-05-16  8:21             ` Russell King (Oracle)
2023-05-16  8:19           ` Russell King (Oracle)
2023-05-16  8:44             ` Thomas Gleixner
2023-05-16  8:48               ` Russell King (Oracle)
2023-05-16 12:09                 ` Thomas Gleixner
2023-05-16 13:42                   ` Uladzislau Rezki
2023-05-16 14:38                     ` Thomas Gleixner
2023-05-16 15:01                       ` Uladzislau Rezki
2023-05-16 17:04                         ` Thomas Gleixner
2023-05-17 11:26                           ` Uladzislau Rezki
2023-05-17 11:58                             ` Thomas Gleixner
2023-05-17 12:15                               ` Uladzislau Rezki
2023-05-17 16:32                                 ` Thomas Gleixner
2023-05-19 10:01                                   ` Uladzislau Rezki
2023-05-19 14:56                                     ` Thomas Gleixner
2023-05-19 15:14                                       ` Uladzislau Rezki
2023-05-19 16:32                                         ` Thomas Gleixner
2023-05-19 17:02                                           ` Uladzislau Rezki
2023-05-16 17:56                       ` Nadav Amit
2023-05-16 19:32                         ` Thomas Gleixner
2023-05-17  0:23                           ` Thomas Gleixner
2023-05-17  1:23                             ` Nadav Amit
2023-05-17 10:31                               ` Thomas Gleixner
2023-05-17 11:47                                 ` Thomas Gleixner
2023-05-17 22:41                                   ` Nadav Amit
2023-05-17 14:43                                 ` Mark Rutland
2023-05-17 16:41                                   ` Thomas Gleixner
2023-05-17 22:57                                 ` Nadav Amit
2023-05-19 11:49                                   ` Thomas Gleixner
2023-05-17 12:12                               ` Russell King (Oracle)
2023-05-17 23:14                                 ` Nadav Amit
2023-05-15 18:17 ` Uladzislau Rezki
2023-05-16  2:26   ` Baoquan He
2023-05-16  6:40     ` Thomas Gleixner
2023-05-16  8:07       ` Baoquan He
2023-05-16  8:10         ` Baoquan He
2023-05-16  8:45         ` Russell King (Oracle)
2023-05-16  9:13           ` Thomas Gleixner
2023-05-16  8:54         ` Thomas Gleixner
2023-05-16  9:48           ` Baoquan He
2023-05-15 20:02 ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87353x9y3l.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=hch@lst.de \
    --cc=jogness@linutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=lstoakes@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peterz@infradead.org \
    --cc=urezki@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox