From: Steven Rostedt <rostedt@goodmis.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Casey Chen <cachen@purestorage.com>,
linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Namhyung Kim <namhyung@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Ian Rogers <irogers@google.com>
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
Date: Mon, 2 Jun 2025 18:08:26 -0400
Message-ID: <20250602180826.3a0aafc0@gandalf.local.home>
In-Reply-To: <qpdpyuexlakigm266ufklmqzqxuj6tdqyta4ccphapwjexwujg@cjrt2ecmzjwg>
On Mon, 2 Jun 2025 17:52:49 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:
> +cc Steven, Peter, Ingo
>
> On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
> > >
> > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet
> > > > <kent.overstreet@linux.dev> wrote:
> > > > >
> > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > The patch is based 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > >
> > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > Also, percpu allocations are marked, and their stats are stored on NUMA node 0.
> > > > > > For example, the resulting file looks like below.
> > > > > >
> > > > > > percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > percpu n total 83200 325 numa0 30976 121 numa1 52224 204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > ...
> > > > > > percpu n total 364800 5700 numa0 109440 1710 numa1 255360 3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > >
> > > > > Err, what is 'percpu y/n'?
> > > > >
> > > >
> > > > Percpu allocations are marked with 'percpu y/n' because for percpu
> > > > stats 'bytes' is per-CPU; we have to multiply it by the number of
> > > > CPUs to get the total bytes. Marking them lets us know the exact
> > > > amount of memory used, and any /proc/allocinfo parser can understand
> > > > it and make the correct calculation.
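For illustration, a parser of the proposed format might look like the sketch below. The field layout is inferred from the sample rows quoted above; the function name and `nr_cpus` parameter are hypothetical, not part of the patch.

```python
def parse_allocinfo_line(line, nr_cpus):
    """Parse one row of the proposed per-NUMA /proc/allocinfo format, e.g.:
    percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
    """
    fields = line.split()
    is_percpu = fields[1] == "y"       # fields[0] is the literal "percpu"
    total_bytes = int(fields[3])       # value after the "total" keyword
    total_calls = int(fields[4])
    # Collect (bytes, calls) for each numaN column until we hit the call site.
    per_node = {}
    i = 5
    while fields[i].startswith("numa"):
        node = int(fields[i][4:])
        per_node[node] = (int(fields[i + 1]), int(fields[i + 2]))
        i += 3
    site = " ".join(fields[i:])        # file:line [module] func:name
    if is_percpu:
        # For percpu allocations, 'bytes' counts one CPU's worth, so the
        # real footprint is bytes * number of CPUs.
        total_bytes *= nr_cpus
    return {"percpu": is_percpu, "bytes": total_bytes,
            "calls": total_calls, "nodes": per_node, "site": site}
```

With the first sample row above and, say, 4 CPUs, the reported 8588 bytes would scale to 34352 bytes of actual usage.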
> > >
> > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > though, a single character should suffice (we already have a header that
> > > can explain what it is) - if you're growing the width we don't want to
> > > overflow.
> > >
> >
> > Does it have a header?
> >
> > > >
> > > > > >
> > > > > > To save memory, we dynamically allocate the per-NUMA-node stats counters once
> > > > > > the system boots up and knows how many NUMA nodes are available. The percpu
> > > > > > allocator is used for this allocation, hence PERCPU_DYNAMIC_RESERVE is increased.
> > > > > >
> > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > these counters is not accounted in the profiling stats.
> > > > > >
> > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > >
> > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > of interest to people looking at optimizing allocations to make sure
> > > > > they're on the right numa node?
> > > >
> > > > Yes, to help us know whether there is a NUMA imbalance issue and make
> > > > some optimizations. I can make it a kconfig option. Does anybody else
> > > > have any opinion about this feature? Thanks!
> > >
> > > I would like to see some other opinions from potential users, have you
> > > been circulating it?
> >
> > We have been using it internally for a while. I don't know who the
> > potential users are and how to reach them so I am sharing it here to
> > collect opinions from others.
>
> I'd ask the tracing and profiling people for their thoughts, and anyone
> working on tooling that might consume this.
>
> I'm wondering if there might be some way of feeding more info into perf,
> since profiling cache misses is a big thing that it does.
>
> It might be a long shot, since we're just accounting usage, or it might
> spark some useful ideas.
>
> Can you share a bit about how you're using this internally?
I'm guessing this is to show which kernel functions are using memory?
I added to the Cc people who tend to use perf for analysis rather than just
those that maintain the kernel side of perf.
-- Steve
2025-05-30 0:39 Casey Chen
2025-05-30 0:39 ` [PATCH] " Casey Chen
2025-05-30 1:11 ` [PATCH 0/1] " Kent Overstreet
2025-05-30 21:45 ` Casey Chen
2025-05-31 0:05 ` Kent Overstreet
2025-06-02 20:48 ` Casey Chen
2025-06-02 21:32 ` Suren Baghdasaryan
2025-06-03 15:00 ` Suren Baghdasaryan
2025-06-03 17:34 ` Kent Overstreet
2025-06-04 0:55 ` Casey Chen
2025-06-04 15:21 ` Suren Baghdasaryan
2025-06-04 15:50 ` Kent Overstreet
2025-06-10 0:21 ` Casey Chen
2025-06-10 15:56 ` Suren Baghdasaryan
2025-06-03 20:00 ` Casey Chen
2025-06-03 20:18 ` Suren Baghdasaryan
2025-06-02 21:52 ` Kent Overstreet
2025-06-02 22:08 ` Steven Rostedt [this message]
2025-06-02 23:35 ` Kent Overstreet
2025-06-03 6:46 ` Ian Rogers