From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>, Song Liu <songliubraving@fb.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
"mcgrof@kernel.org" <mcgrof@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"song@kernel.org" <song@kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"pmladek@suse.com" <pmladek@suse.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"dborkman@redhat.com" <dborkman@redhat.com>,
"edumazet@google.com" <edumazet@google.com>,
"bp@alien8.de" <bp@alien8.de>, "mbenes@suse.cz" <mbenes@suse.cz>,
"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
Date: Tue, 19 Apr 2022 19:03:11 -0700
Message-ID: <20220420020311.6ojfhcooumflnbbk@MacBook-Pro.local.dhcp.thefacebook.com>
In-Reply-To: <CAHk-=wh6um5AFR6TObsYY0v+jUSZxReiZM_5Kh4gAMU8Z8-jVw@mail.gmail.com>
On Tue, Apr 19, 2022 at 12:20:39PM -0700, Linus Torvalds wrote:
> On Tue, Apr 19, 2022 at 11:42 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > I'd say that bpf_prog_pack was a cure for symptoms and this project tries
> > to address more general problem.
> > But you are right, it'll take some time and won't land in 5.19.
>
> Just to update people: I've just applied Song's [1/4] patch, which
> means that the whole current hugepage vmalloc thing is effectively
> disabled (because nothing opts in).
>
> And I suspect that will be the status for 5.18, unless somebody comes
> up with some very strong arguments for (re-)starting using huge pages.
Here is the quote from Song's cover letter for the bpf_prog_pack series:
  Most BPF programs are small, but they consume a page each. For systems
  with busy traffic and many BPF programs, this could also add significant
  pressure to instruction TLB. High iTLB pressure usually causes slow down
  for the whole system, which includes visible performance degradation for
  production workloads.
The last sentence is the key. We've added this feature not because of bpf
programs themselves. So calling this feature an optimization is not quite
correct. The number of bpf programs on the production server doesn't matter.
The programs come and go all the time. That is the key here. The 4k
module_alloc() plus set_memory_ro/x done by the JIT break down huge pages and
increase TLB pressure on the kernel code. That creates visible performance
degradation for normal user space workloads that are not doing anything bpf
related. mm folks can fill in the details here. My understanding is that it
has something to do with the identity mapping.
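To put rough numbers on the paragraph above, here is a small userspace sketch (not kernel code). It assumes 4 KiB base pages and 2 MiB huge pages, and the function names and program sizes are hypothetical. It compares the number of 4k pages used when every program gets its own page against the number of huge pages needed when programs are packed together, and shows how many 4k mappings replace a single 2 MiB mapping once that mapping has to be split.

```python
PAGE_SIZE = 4 * 1024                # assumed base page size
HUGE_PAGE_SIZE = 2 * 1024 * 1024    # assumed huge page size


def pages_one_per_prog(prog_sizes):
    """4k pages consumed when each program is rounded up to its own page(s)."""
    return sum(-(-size // PAGE_SIZE) for size in prog_sizes)


def huge_pages_when_packed(prog_sizes):
    """Huge pages consumed when programs are laid out back to back.

    Simplification: alignment and per-program headers are ignored.
    """
    total = sum(prog_sizes)
    return -(-total // HUGE_PAGE_SIZE)


def mappings_after_split():
    """4k mappings needed to cover one 2 MiB range once it has been split."""
    return HUGE_PAGE_SIZE // PAGE_SIZE


if __name__ == "__main__":
    progs = [500] * 100  # hypothetical: 100 small programs of 500 bytes each
    print("separate 4k pages:", pages_one_per_prog(progs))
    print("packed huge pages:", huge_pages_when_packed(progs))
    print("4k mappings per split 2M range:", mappings_after_split())
```

The sketch covers only the page-count arithmetic. It does not model the TLB itself or the load/unload churn described above.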
So we're not trying to improve bpf performance. We're trying to make
sure that bpf program load/unload doesn't affect the speed of the kernel.
Generalizing bpf_prog_alloc to modules would be nice, but it's not clear
what benefits such optimization might have. It's orthogonal here.
So I argue that all 4 of Song's fixes are necessary in 5.18.
We need an additional zeroing patch too, of course, to make sure the huge page
doesn't contain garbage at alloc time and that it's cleaned after a prog is
unloaded.
Regarding JIT spraying and other concerns, the short answer is: nothing changed.
JIT spraying was mitigated with start address randomization and invalid
instruction padding. Both features are still present.
Constant blinding is also fully functional.
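As a rough illustration of the first two mitigations named above, the following userspace sketch (not the kernel's implementation) fills a region with a trap byte and copies the program in at a randomized, aligned offset. The helper name `place_program`, the 0xCC filler (x86 INT3) and the alignment value are assumptions made for this example.

```python
import random

TRAP_BYTE = 0xCC  # assumed filler: x86 INT3, standing in for "invalid instruction"


def place_program(code, region_size, alignment=4, rng=random):
    """Return (region, start): code placed at a random aligned offset.

    Every byte outside the program is TRAP_BYTE, so a jump that lands in
    the padding traps instead of executing attacker-chosen bytes.
    """
    if len(code) > region_size:
        raise ValueError("program does not fit in region")
    region = bytearray([TRAP_BYTE]) * region_size
    hole = region_size - len(code)                    # free space to slide within
    start = rng.randrange(hole + 1) & ~(alignment - 1)  # random, rounded down
    region[start:start + len(code)] = code
    return region, start


if __name__ == "__main__":
    region, start = place_program(bytes([0x90]) * 10, 64)
    print("program starts at offset", start)
```

A real implementation would draw the offset from a kernel-quality random source. Python's `random` module is used here only to keep the sketch short.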
Any kind of generalization of bpf_prog_pack into a general mm feature would be
nice, but it cannot be done as an opportunistic cache. We need a guarantee that
bpf prog load/unload won't recreate the issue with kernel performance
degradation. I suspect we will need bpf_prog_pack in its current form for the
foreseeable future.