From: Kent Overstreet <kent.overstreet@linux.dev>
To: Casey Chen <cachen@purestorage.com>
Cc: linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
Date: Mon, 2 Jun 2025 17:52:49 -0400
Message-ID: <qpdpyuexlakigm266ufklmqzqxuj6tdqyta4ccphapwjexwujg@cjrt2ecmzjwg>
In-Reply-To: <CALCePG3FmtBcaGxCokM+5mdFYYV6h6jidgs8Kp=xkSy=bBTPOw@mail.gmail.com>
+cc Steven, Peter, Ingo
On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> On Fri, May 30, 2025 at 5:05 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> > > >
> > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > from the mm-new branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > >
> > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per NUMA
> > > > > node are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > Also, percpu allocations are marked and their stats are stored on NUMA node 0.
> > > > > For example, the resulting file looks like the following:
> > > > >
> > > > > percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > percpu n total 83200 325 numa0 30976 121 numa1 52224 204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > ...
> > > > > percpu n total 364800 5700 numa0 109440 1710 numa1 255360 3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > >
> > > > Err, what is 'percpu y/n'?
> > > >
> > >
> > > Percpu allocations are marked with 'percpu y/n' because for percpu
> > > allocation stats 'bytes' is per-CPU; we have to multiply it by the
> > > number of CPUs to get the total bytes. Marking them lets us know the
> > > exact amount of memory used, and any /proc/allocinfo parser can
> > > understand it and make correct calculations.
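To make that concrete for anyone writing tooling against this, here's a rough sketch of how a parser might handle the proposed format. The field layout is inferred from the sample output above, and the CPU count is passed in explicitly rather than detected, so treat this as an illustration of the multiplication rule, not a reference implementation:

```python
def parse_allocinfo_line(line, num_cpus):
    # Example line (layout taken from the sample output above):
    # percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
    fields = line.split()
    is_percpu = fields[1] == "y"
    total_bytes = int(fields[3])
    total_calls = int(fields[4])
    # Per the explanation above, percpu 'bytes' counts a single copy;
    # multiply by the CPU count to get memory actually consumed.
    if is_percpu:
        total_bytes *= num_cpus
    numa = {}
    i = 5
    while i < len(fields) and fields[i].startswith("numa"):
        node = int(fields[i][4:])
        nbytes = int(fields[i + 1])
        if is_percpu:
            nbytes *= num_cpus
        numa[node] = (nbytes, int(fields[i + 2]))
        i += 3
    # Whatever remains is the allocation site (file:line, optional
    # [module], func:name).
    site = " ".join(fields[i:])
    return is_percpu, total_bytes, total_calls, numa, site
```

On a 4-CPU machine, the first sample line above would then report 8588 * 4 = 34352 total bytes for the percpu allocation at kernel/irq/irqdesc.c:425.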
> >
> > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > though, a single character should suffice (we already have a header that
> > can explain what it is) - if you're growing the width we don't want to
> > overflow.
> >
>
> Does it have a header?
>
> > >
> > > > >
> > > > > To save memory, we dynamically allocate per-NUMA-node stats counters once the
> > > > > system boots up and knows how many NUMA nodes are available. Percpu allocators
> > > > > are used for the allocation, hence we increase PERCPU_DYNAMIC_RESERVE.
> > > > >
> > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > these counters is not accounted in profiling stats.
> > > > >
> > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > >
> > > > Intriguing, but I'd make it a kconfig option; AFAIK this would mainly be
> > > > of interest to people looking at optimizing allocations to make sure
> > > > they're on the right NUMA node?
> > >
> > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > optimizations. I can make it a kconfig option. Does anybody else have any
> > > opinion about this feature? Thanks!
> >
> > I would like to see some other opinions from potential users, have you
> > been circulating it?
>
> We have been using it internally for a while. I don't know who the
> potential users are or how to reach them, so I am sharing it here to
> collect opinions from others.
I'd ask the tracing and profiling people for their thoughts, and anyone
working on tooling that might consume this.
I'm wondering if there might be some way of feeding more info into perf,
since profiling cache misses is a big thing that it does.
It might be a long shot, since we're just accounting usage, or it might
spark some useful ideas.
Can you share a bit about how you're using this internally?
Thread overview: 20+ messages
2025-05-30 0:39 Casey Chen
2025-05-30 0:39 ` [PATCH] " Casey Chen
2025-05-30 1:11 ` [PATCH 0/1] " Kent Overstreet
2025-05-30 21:45 ` Casey Chen
2025-05-31 0:05 ` Kent Overstreet
2025-06-02 20:48 ` Casey Chen
2025-06-02 21:32 ` Suren Baghdasaryan
2025-06-03 15:00 ` Suren Baghdasaryan
2025-06-03 17:34 ` Kent Overstreet
2025-06-04 0:55 ` Casey Chen
2025-06-04 15:21 ` Suren Baghdasaryan
2025-06-04 15:50 ` Kent Overstreet
2025-06-10 0:21 ` Casey Chen
2025-06-10 15:56 ` Suren Baghdasaryan
2025-06-03 20:00 ` Casey Chen
2025-06-03 20:18 ` Suren Baghdasaryan
2025-06-02 21:52 ` Kent Overstreet [this message]
2025-06-02 22:08 ` Steven Rostedt
2025-06-02 23:35 ` Kent Overstreet
2025-06-03 6:46 ` Ian Rogers