From: Luis Chamberlain <mcgrof@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "rppt@kernel.org" <rppt@kernel.org>,
	"p.raghav@samsung.com" <p.raghav@samsung.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"song@kernel.org" <song@kernel.org>, "hch@lst.de" <hch@lst.de>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"zhengjun.xing@linux.intel.com" <zhengjun.xing@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Torvalds, Linus" <torvalds@linux-foundation.org>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"a.manzanares@samsung.com" <a.manzanares@samsung.com>
Subject: Re: [PATCH bpf-next v1 RESEND 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
Date: Thu, 3 Nov 2022 17:18:51 -0700	[thread overview]
Message-ID: <Y2Raa2wSQnXwd7j8@bombadil.infradead.org> (raw)
In-Reply-To: <eac58f163bd8b6829dff176e67b44c79570025f5.camel@intel.com>

On Thu, Nov 03, 2022 at 09:19:25PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2022-11-03 at 11:59 -0700, Luis Chamberlain wrote:
> > > > Mike Rapoport presented on the direct map fragmentation problem
> > > > at Plumbers 2021 [0], and clearly mentioned modules / BPF /
> > > > ftrace / kprobes as possible sources of it. Then Xing Zhengjun's
> > > > 2021 performance evaluation of whether aggressively using 2M/1G
> > > > pages for the kernel direct map helps performance [1] ended up
> > > > generally recommending huge pages. Xing's work, though, was about
> > > > using huge pages *alone*, not about a strategy such as the "bpf
> > > > prog pack", which shares one 2 MiB huge page among *all* small
> > > > eBPF programs, and that I think is the real golden nugget here.
> > > > 
> > > > I contend therefore that the theoretical reduction of iTLB misses
> > > > from using huge pages for the "bpf prog pack" is not what makes
> > > > your systems perform better. It should simply be that it reduces
> > > > fragmentation, and *this* generally can help performance long
> > > > term. If this is accurate then let's please separate these two
> > > > aspects.
> > > 
> > > The direct map fragmentation is the reason for the higher TLB miss
> > > rate, both for iTLB and dTLB.
> > 
> > OK, so then whatever benchmark runs in tandem while the eBPF JIT is
> > hammered should *also* be measured with perf for iTLB and dTLB
> > misses, i.e. the patch can provide such results as justification.
> 
> Song had done some tests on the old prog pack version which, to me,
> seemed to indicate that most (or possibly all) of the benefit came
> from the direct map fragmentation reduction.

That matches my observations, but I also provided quite a few hints as
to *why* I think that is. I suggested lib/test_kmod.c as an example of
a beefy multithreaded selftest which really kicks the hell out of the
kernel with whatever crap you want to run. That is precisely how I
uncovered an odd kmod bug that had been lingering for years.
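
For reference, a rough sketch of how one might drive it (sysfs path
from memory, double-check it against your tree):

  # load the stress driver and crank up the thread count
  sudo modprobe test_kmod
  echo 100 | sudo tee /sys/devices/virtual/misc/test_kmod0/config_num_threads
  # or just run the selftest wrapper
  sudo ./tools/testing/selftests/kmod/kmod.sh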

> This surprised me, since 2MB kernel text has been shown to be
> beneficial.
> 
> Otherwise +1 to all these comments. This should be clear about what
> the benefits are. I would add that this is also much nicer about TLB
> shootdowns than the existing way of loading text, and it saves some
> memory.
> 
> So I think there are sort of four areas of improvement:
> 1. Direct map fragmentation reduction (dTLB miss improvements).

The dTLB gains should show up on the benchmark which runs in tandem
with the eBPF-JIT-monster-selftest, not on the
eBPF-JIT-monster-selftest itself, right?
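
If so, something as simple as this on the tandem workload should tell
us (generic perf event names, your CPU may expose different ones, and
./tandem-benchmark is a stand-in for whatever workload gets picked):

  perf stat -e iTLB-load-misses,dTLB-load-misses,dTLB-loads \
      -- ./tandem-benchmark
  # and eyeball the direct map splits on x86 while at it
  grep DirectMap /proc/meminfo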

> This
> sort of does it as a side effect in this series, and the solution Mike
> is talking about is a more general, probably better one.
> 2. 2MB mapped JITs. This is the iTLB side. I think this is a decent
> solution for this, but surprisingly it doesn't seem to be useful for
> JITs. (modules testing TBD)

Yes, I'm super eager to get this tested. In fact I wonder if one could
then boot Linux with less memory too...
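
On x86 one can also verify the mappings directly with the page table
dump, assuming CONFIG_PTDUMP_DEBUGFS is enabled:

  # PMD-sized (2M) entries show up as "pmd" lines in the dump
  sudo grep pmd /sys/kernel/debug/page_tables/kernel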

> 3. Loading text into a reused allocation with per-cpu mappings. This
> reduces TLB shootdowns, which are a short-term load- and teardown-time
> performance drag. My understanding is this is more of a problem on
> bigger systems with many CPUs. This series does a decent job at this,
> but the solution is not compatible with modules. Maybe that's ok,
> since modules don't load as often as JITs.

There are some tests, like fstests, which make heavy use of module
removal. That said, I do like to reboot to have a fresh system before
running fstests. I guess fstests should also run with heavily
fragmented memory, as a corner case.
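
For the shootdown side of things (item 3) a crude before/after check
on x86 can be as simple as:

  # per-CPU TLB shootdown interrupt counts
  grep TLB /proc/interrupts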

> 4. Having BPF progs share pages. This saves memory. This series could
> probably easily get a number for how much.

Once this does hit modules / kprobes / ftrace, the impact is obviously
much greater.
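
For whoever wants to put a rough number on the savings, the live
allocations are at least visible in /proc/vmallocinfo, though the
caller column depends on which allocator path your kernel takes
(module_alloc on x86, for instance):

  # count vmalloc ranges attributed to module / BPF JIT callers
  sudo grep -cE 'module_alloc|bpf' /proc/vmallocinfo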

  Luis



Thread overview: 29+ messages
2022-10-31 22:25 [PATCH bpf-next v1 RESEND 0/5] vmalloc_exec for modules and BPF programs Song Liu
2022-10-31 22:25 ` [PATCH bpf-next v1 RESEND 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec Song Liu
2022-11-02 23:41   ` Luis Chamberlain
2022-11-03 15:51     ` Mike Rapoport
2022-11-03 18:59       ` Luis Chamberlain
2022-11-03 21:19         ` Edgecombe, Rick P
2022-11-03 21:41           ` Song Liu
2022-11-03 23:33             ` Luis Chamberlain
2022-11-04  0:18           ` Luis Chamberlain [this message]
2022-11-04  3:29             ` Luis Chamberlain
2022-11-07  6:58         ` Mike Rapoport
2022-11-07 17:26           ` Luis Chamberlain
2022-11-07  6:40     ` Aaron Lu
2022-11-07 17:39       ` Luis Chamberlain
2022-11-07 18:35         ` Song Liu
2022-11-07 18:30       ` Song Liu
2022-10-31 22:25 ` [PATCH bpf-next v1 RESEND 2/5] x86/alternative: support vmalloc_exec() and vfree_exec() Song Liu
2022-11-02 22:21   ` Edgecombe, Rick P
2022-11-03 21:03     ` Song Liu
2022-10-31 22:25 ` [PATCH bpf-next v1 RESEND 3/5] bpf: use vmalloc_exec for bpf program and bpf dispatcher Song Liu
2022-10-31 22:25 ` [PATCH bpf-next v1 RESEND 4/5] vmalloc: introduce register_text_tail_vm() Song Liu
2022-10-31 22:25 ` [PATCH bpf-next v1 RESEND 5/5] x86: use register_text_tail_vm Song Liu
2022-11-02 22:24   ` Edgecombe, Rick P
2022-11-03 21:04     ` Song Liu
2022-11-01 11:26 ` [PATCH bpf-next v1 RESEND 0/5] vmalloc_exec for modules and BPF programs Christoph Hellwig
2022-11-01 15:10   ` Song Liu
2022-11-02 20:45 ` Luis Chamberlain
2022-11-02 22:29 ` Edgecombe, Rick P
2022-11-03 21:13   ` Song Liu
