linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Casey Chen <cachen@purestorage.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	surenb@google.com,  corbet@lwn.net, dennis@kernel.org,
	tj@kernel.org, cl@gentwo.org,  Vlastimil Babka <vbabka@suse.cz>,
	mhocko@suse.com, jackmanb@google.com,  hannes@cmpxchg.org,
	ziy@nvidia.com, roman.gushchin@linux.dev,  harry.yoo@oracle.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 linux-doc@vger.kernel.org, yzhong@purestorage.com,
	 Sourav Panda <souravpanda@google.com>
Subject: Re: [PATCH] alloc_tag: add per-NUMA node stats
Date: Tue, 8 Jul 2025 14:52:11 -0700 (PDT)	[thread overview]
Message-ID: <3c9b5773-83ed-4f13-11a8-fcc162c8c483@google.com> (raw)
In-Reply-To: <cvrr3u7n424dhroqi7essjm53kqrqjomatly2b7us4b6rymcox@3ttbatss6ypy>

On Wed, 18 Jun 2025, Kent Overstreet wrote:

> On Tue, Jun 10, 2025 at 05:30:53PM -0600, Casey Chen wrote:
> > Add support for tracking per-NUMA node statistics in /proc/allocinfo.
> > Previously, each alloc_tag had a single set of counters (bytes and
> > calls), aggregated across all CPUs. With this change, each CPU can
> > maintain separate counters for each NUMA node, allowing finer-grained
> > memory allocation profiling.
> > 
> > This feature is controlled by the new
> > CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS option:
> > 
> > * When enabled (=y), the output includes per-node statistics following
> >   the total bytes/calls:
> > 
> > <size> <calls> <tag info>
> > ...
> > 315456       9858     mm/dmapool.c:338 func:pool_alloc_page
> >         nid0     94912        2966
> >         nid1     220544       6892
> > 7680         60       mm/dmapool.c:254 func:dma_pool_create
> >         nid0     4224         33
> >         nid1     3456         27
> 
> I just received a report of memory reclaim issues where it seems DMA32
> is stuffed full.
> 
> So naturally, instrumenting to see what's consuming DMA32 is going to be
> the first thing to do, which made me think of your patchset.
> 
> I wonder if we should think about something a bit more general, so it's
> easy to break out accounting different ways depending on what we want to
> debug.
> 

Right, per-node memory attribution, or per zone, is very useful.

Casey, what's the latest status of your patch?  Using alloc_tag to 
attribute memory overheads has been exceedingly useful for Google Cloud, 
and adding a per-node breakdown would give us even better insight.

Our use case is quite simple: we sell guest memory to the customer as 
persistent hugetlb and keep some memory on the host for ourselves (VMM, 
host userspace, host kernel).  We track every page of that overhead memory 
because memory pressure there can cause all sorts of issues, like 
userspace unresponsiveness.  We also want to sell as much guest memory as 
possible to avoid stranding CPUs.

To do that, a per-node breakdown of memory allocations would be a 
tremendous help.  Our memory layout is asymmetric across NUMA nodes, even 
for memory that has affinity to the NIC.  Being able to inspect the 
origins of memory on a specific NUMA node that is under memory pressure, 
while the other nodes are not, would be excellent.
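For illustration, here is a sketch of how the proposed per-node output 
could be consumed from userspace, e.g. to rank tags by bytes allocated on 
one pressured node.  The nidN line layout is taken from the example output 
quoted above; the exact field widths and ordering are assumptions until 
the patch lands, not a released kernel ABI:

```python
# Sketch: parse the per-NUMA /proc/allocinfo format proposed in this
# patch and rank alloc_tag sites by bytes on a given node.  The sample
# below mirrors the example output from the patch description.
import re

SAMPLE = """\
315456       9858     mm/dmapool.c:338 func:pool_alloc_page
        nid0     94912        2966
        nid1     220544       6892
7680         60       mm/dmapool.c:254 func:dma_pool_create
        nid0     4224         33
        nid1     3456         27
"""

def parse_allocinfo(text):
    """Return {tag: {"bytes", "calls", "nodes": {nid: (bytes, calls)}}}."""
    entries = {}
    current = None
    for line in text.splitlines():
        # Indented per-node lines: "nid<N> <bytes> <calls>"
        m = re.match(r"\s+nid(\d+)\s+(\d+)\s+(\d+)", line)
        if m and current is not None:
            nid, nbytes, calls = (int(g) for g in m.groups())
            entries[current]["nodes"][nid] = (nbytes, calls)
            continue
        # Top-level tag lines: "<bytes> <calls> <file:line func:...>"
        m = re.match(r"(\d+)\s+(\d+)\s+(\S.*)", line)
        if m:
            current = m.group(3).strip()
            entries[current] = {"bytes": int(m.group(1)),
                                "calls": int(m.group(2)),
                                "nodes": {}}
    return entries

def top_consumers(entries, nid, count=5):
    """Tags ordered by bytes allocated on the given NUMA node."""
    ranked = sorted(entries.items(),
                    key=lambda kv: kv[1]["nodes"].get(nid, (0, 0))[0],
                    reverse=True)
    return [(tag, info["nodes"].get(nid, (0, 0))[0])
            for tag, info in ranked[:count]]

if __name__ == "__main__":
    info = parse_allocinfo(SAMPLE)
    for tag, nbytes in top_consumers(info, nid=1):
        print(f"{nbytes:>10}  {tag}")
```

In production this would read /proc/allocinfo instead of the inline 
sample, but the parsing shape would be the same.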

Adding Sourav Panda as well as he may have additional thoughts on this.



Thread overview: 15+ messages
2025-06-10 23:30 Casey Chen
2025-06-11  1:21 ` Andrew Morton
2025-06-11  1:33   ` Casey Chen
2025-06-11  3:47     ` Kent Overstreet
2025-06-11  3:41   ` Kent Overstreet
2025-06-12  5:36 ` David Wang
2025-06-12 15:37   ` Kent Overstreet
2025-06-18 22:16 ` Kent Overstreet
2025-07-08 21:52   ` David Rientjes [this message]
2025-07-08 22:38     ` Christoph Lameter (Ampere)
2025-07-09 19:14       ` David Rientjes
2025-07-08 22:53     ` Casey Chen
2025-07-08 23:07     ` Casey Chen
2025-07-10  5:54     ` Sourav Panda
  -- strict thread matches above, loose matches on Subject: below --
2025-05-30  0:39 [PATCH 0/1] alloc_tag: add per-numa " Casey Chen
2025-05-30  0:39 ` [PATCH] " Casey Chen
