Re: [PATCH RFC] alloc_tag: add option to pick the first codetag along callchain


From: Kent Overstreet <kent.overstreet@linux.dev>
To: David Wang <00107082@163.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org,  hannes@cmpxchg.org,
	pasha.tatashin@soleen.com, souravpanda@google.com,
	 vbabka@suse.cz, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] alloc_tag: add option to pick the first codetag along callchain
Date: Tue, 6 Jan 2026 18:26:18 -0500
Message-ID: <aV2VWbY4Rb_w-QTs@moria.home.lan>
In-Reply-To: <37169c79.a0e5.19b93a2768f.Coremail.00107082@163.com>

On Tue, Jan 06, 2026 at 10:07:36PM +0800, David Wang wrote:
> I agree, the accounting would be incorrect for alloc sites down the callchain, and would confuse things.
> When the call chain has more than one codetag, correct accounting for one codetag would always mean incorrect
> accounting for other codetags, right? But I don't think picking the first tag would make the accounting totally incorrect. 

The trouble is you end up in situations where you have an alloc tag on
the stack, but then you're doing an internal allocation that definitely
should not be accounted to the outer alloc tag.

E.g. there are a lot of internal mm allocations like this; object
extension vectors were, I think, the first place where it came up, and
vmalloc() also has its own internal data structures that require
allocations.

Just using the outermost tag means these inner allocations will get
accounted to other unrelated alloc tags _effectively at random_; meaning
if we're burning more memory than we should be in a place like that it
will never show up in a way that we'll notice and be able to track it
down.

> Totally agree.
> I used to sum by filepath prefix to aggregate memory usage for drivers.
> Take the usb subsystem for example: on my system, the data say my usb drivers use 200K of memory,
> and if I pick the first codetag, the data say ~350K. Which one is lying, or are both lying? I am confused.
> 
> I think this also raises the question of what is the *correct* way to make use of /proc/allocinfo...

So yes, summing by filepath prefix is the way we want things to work.

But getting there - with a fully reliable end result - is a process.

What you want to do is - preferably on a reasonably idle machine, aside
from the code you're looking at - just look at everything in
/proc/allocinfo and sort by size. Look at the biggest ones that might be
relevant to your subsystem, and look for any that are suspicious and
perhaps should be accounted to your code. Yes, that may entail reading
code :)
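
A rough sketch of that workflow, assuming the usual /proc/allocinfo
layout of "<size> <calls> <file:line> ..." per line after the header.
The helper names and the sample paths below are invented for
illustration; on a real machine you would read /proc/allocinfo itself
(which normally needs root) instead of SAMPLE.

```python
# Hypothetical sample in the allocinfo layout; paths, line numbers and
# function names are made up, not taken from a real kernel.
SAMPLE = """\
allocinfo - version: 1.0
#     <size>  <calls> <tag info>
        4096        1 drivers/usb/core/example.c:10 func:example_probe
       12288        3 mm/example.c:20 func:example_internal_alloc
        8192        2 drivers/usb/host/example.c:30 func:example_ring_alloc
"""


def parse_allocinfo(text):
    """Yield (size_bytes, calls, location) for each data line."""
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, comments, and anything that does not start
        # with two integer columns (size in bytes, number of calls).
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        yield int(fields[0]), int(fields[1]), fields[2]


def biggest(entries, n=10):
    """Return the n largest entries, sorted by size, largest first."""
    return sorted(entries, key=lambda e: e[0], reverse=True)[:n]


def sum_by_prefix(entries, prefix):
    """Total bytes accounted to callsites whose path starts with prefix."""
    return sum(size for size, _calls, loc in entries if loc.startswith(prefix))


entries = list(parse_allocinfo(SAMPLE))
top = biggest(entries, 2)
usb_total = sum_by_prefix(entries, "drivers/usb/")
```

Here the prefix sum only covers the two drivers/usb/ lines; whether the
mm/ entry at the top of the sorted list partly belongs to the subsystem
is exactly the thing that has to be decided by reading the code.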

This is why accounting to the innermost tag is important - by doing it
this way, allocations that are being accounted at the wrong callsite
all get lumped together at the specific callsite that needs to be
fixed, which then shows up higher than normal in /proc/allocinfo, so
that it gets looked at.
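
To make the difference concrete, here is a toy model, not kernel code:
each event is a stack of tags (outermost first) plus a size, and the
two policies only differ in which end of the stack gets charged. All
tag names and sizes are invented for illustration.

```python
from collections import Counter


def account(events, policy):
    """Charge each allocation to one tag from its tag stack.

    policy == "outermost" charges the first tag on the stack,
    anything else charges the innermost (last) tag.
    """
    totals = Counter()
    for tag_stack, size in events:
        tag = tag_stack[0] if policy == "outermost" else tag_stack[-1]
        totals[tag] += size
    return totals


# Two unrelated subsystems each do a small allocation, and each one
# triggers a large internal metadata allocation further down the stack.
EVENTS = [
    (["subsys_a.c:alloc_thing"], 100),
    (["subsys_a.c:alloc_thing", "mm_internal.c:alloc_metadata"], 4000),
    (["subsys_b.c:alloc_other"], 200),
    (["subsys_b.c:alloc_other", "mm_internal.c:alloc_metadata"], 4000),
]

outer = account(EVENTS, "outermost")
inner = account(EVENTS, "innermost")
```

With the outermost policy the 8000 bytes of metadata are split across
subsys_a and subsys_b and the internal callsite shows nothing; with the
innermost policy they pile up at the one internal callsite, which is
where someone sorting by size will see them.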

> >The fact that you have to be explicit about where the accounting happens
> >via _noprof is a feature, not a bug :)
> 
> But it is tedious... :(

That's another way of saying it's easy :)

Spot an allocation with insufficiently fine-grained accounting and it's
generally a 3-5 line patch to fix it. I've been doing those here and
there - e.g. mempools, workqueues, rhashtables.

One trick I did with rhashtables that may be relevant to other
subsystems: rhashtable does background processing for your hash table,
which will do new allocations for your hash table out of a workqueue.

So rhashtable_init() gets wrapped in alloc_hooks(), and then it stashes
the pointer to that alloc tag in the rhashtable, and uses it later for
all those asynchronous allocations.

This means that instead of seeing a ton of memory accounted to the
rhashtable code, with no idea of which rhashtable is burning memory -
all the rhashtable allocations are accounted to the callsite of the
initialization, meaning it's trivial to see which one is burning memory.
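
The shape of that trick, as a toy model rather than the actual
rhashtable code: the table remembers the tag of whoever initialized it
and charges later background allocations to that tag. The class names,
tag strings and sizes here are all hypothetical.

```python
from collections import Counter


class Accounting:
    """Toy stand-in for per-callsite counters; not a kernel interface."""

    def __init__(self):
        self.bytes_by_tag = Counter()

    def alloc(self, tag, size):
        self.bytes_by_tag[tag] += size


class ToyHashTable:
    def __init__(self, acct, init_tag, initial_size=64):
        self.acct = acct
        # Stash the tag of the init callsite for later use.
        self.tag = init_tag
        self.acct.alloc(self.tag, initial_size)

    def background_resize(self, new_size):
        # Runs later, e.g. from a worker, where the original caller is
        # no longer on the stack; charge the stashed tag instead.
        self.acct.alloc(self.tag, new_size)


acct = Accounting()
inode_table = ToyHashTable(acct, "fs_example.c:inode_table_init")
conn_table = ToyHashTable(acct, "net_example.c:conn_table_init")
inode_table.background_resize(1024)
inode_table.background_resize(4096)
```

After the two resizes, the growth shows up under the inode table's init
callsite and the other table stays at its initial size, so the
per-callsite totals say which table is the one growing.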

