From: Suren Baghdasaryan <surenb@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
akpm@linux-foundation.org, Peter Zijlstra <peterz@infradead.org>,
kent.overstreet@linux.dev, yuzhao@google.com,
minchan@google.com, shakeel.butt@linux.dev,
souravpanda@google.com, pasha.tatashin@soleen.com,
00107082@163.com, quic_zhenhuah@quicinc.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator
Date: Tue, 28 Jan 2025 15:43:13 -0800
Message-ID: <CAJuCfpHrwmhNK8rT6sQ6BA6iOfwPXDO0yrcwG3OnZmdvTijEcA@mail.gmail.com>
In-Reply-To: <20250128143549.62f458a7@gandalf.local.home>
On Tue, Jan 28, 2025 at 11:35 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 27 Jan 2025 11:38:32 -0800
> Suren Baghdasaryan <surenb@google.com> wrote:
>
> > On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > On 1/26/25 08:02, Suren Baghdasaryan wrote:
> > > > When a sizable code section is protected by a disabled static key, that
> > > > code gets into the instruction cache even though it's not executed and
> > > > consumes the cache, increasing cache misses. This can be remedied by
> > > > moving such code into a separate uninlined function. The improvement
> >
> > Sorry, I missed adding Steven Rostedt into the CC list since his
> > advice was instrumental in finding the way to optimize the static key
> > performance in this patch. Added now.
> >
> > >
> > > Weird, I thought the static_branch_likely/unlikely/maybe was already
> > > handling this by the unlikely case being a jump to a block away from the
> > > fast-path stream of instructions, thus making it less likely to get cached.
> > > AFAIU even plain likely()/unlikely() should do this, along with branch
> > > prediction hints.
> >
> > This was indeed an unexpected overhead when I measured it on Android.
> > Cache pollution was my understanding of the cause for this high
> > overhead after Steven told me to try uninlining the protected code. He
> > has done something similar in the tracing subsystem. But maybe I
> > misunderstood the real reason. Steven, could you please verify if my
> > understanding of the high overhead cause is correct here? Maybe there
> > is something else at play that I missed?
>
> From what I understand, the compiler will only move code to the end
> of a function with the unlikely(). But the code after the function could
> also be in the control flow path. If you have several functions that are
> called together, adding code to the unlikely() cases may not help the
> speed.
>
> I made an effort to make the tracepoint code call functions instead of
> having everything inlined. It actually brought down the size of the text of
> the kernel, but looking in the change logs I never posted benchmarks. But
> I'm sure making the size of the scheduler text section smaller probably did
> help.
>
> > > That would be in line with my understanding above. Does the arm64 compiler
> > > not do it as well as x86 (could maybe be found out by disassembling) or does
> > > the Pixel6 CPU somehow cache these out-of-line blocks more aggressively, so
> > > that only a function call stops it?
> >
> > I'll disassemble the code and will see what it looks like.
>
> I think I asked you to do that too ;-)
Yes, you did! And I disassembled almost every one of these functions during
my investigation but, in my infinite wisdom, I did not save any of the
output. So now I need to do that again to answer Vlastimil's question. I'll
try to do that today.
>
> >
> > >
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > >
> > > Kinda sad that despite the static key we have to control a lot by the
> > > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.
> >
> > I agree. If there is a better way to fix this regression I'm open to
> > changes. Let's wait for Steven to confirm my understanding before
> > proceeding.
>
> How slow is it to always do the call instead of inlining?
Let's see... The additional overhead if we always call is:
Little core: 2.42%
Middle core: 1.23%
Big core: 0.66%
Not a huge deal because the overhead of memory profiling when enabled
is much higher. So, maybe for simplicity I should indeed always call?
>
> -- Steve
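
For readers following the thread, the pattern under discussion can be sketched roughly as below. This is a minimal userspace illustration, not the code from the patch: a plain bool stands in for the static key (the kernel would use DEFINE_STATIC_KEY_FALSE() and static_branch_unlikely()), and the names profiling_enabled, account_alloc(), account_alloc_slow(), bytes_accounted and calls_accounted are hypothetical. The idea is that only the check and a call instruction stay inline in the hot path, while the gated body lives in a separate out-of-line function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for a static key (assumption for illustration).
 * In the kernel this would be a static key tested with
 * static_branch_unlikely(), which patches the branch at runtime.
 */
static bool profiling_enabled;

/* Hypothetical accounting state, for illustration only. */
static size_t bytes_accounted;
static size_t calls_accounted;

/*
 * Gated body: kept out of line so it is not copied into every caller
 * and does not sit next to the hot-path instructions.
 */
static __attribute__((noinline, cold)) void account_alloc_slow(size_t size)
{
	assert(size > 0);
	bytes_accounted += size;
	calls_accounted++;
}

/*
 * Hot path: only the flag test and a call are inlined here. When the
 * flag is off, no accounting code is executed at all.
 */
static inline void account_alloc(size_t size)
{
	if (__builtin_expect(profiling_enabled, 0))
		account_alloc_slow(size);
}
```

The "always call" alternative asked about above would instead make account_alloc() itself an out-of-line function and do the check inside it, trading a call on every allocation for simpler code.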