From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, Yosry Ahmed <yosry@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@gentwo.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	Harry Yoo <harry@kernel.org>
Subject: Re: [PATCH v2] mm/percpu, memcontrol: Per-memcg-lruvec percpu accounting
Date: Tue, 14 Apr 2026 13:26:31 -0700
Message-ID: <20260414202631.2753640-1-joshua.hahnjy@gmail.com>
In-Reply-To: <20260404033844.1892595-1-joshua.hahnjy@gmail.com>

On Fri,  3 Apr 2026 20:38:43 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> enum memcg_stat_item includes memory that is tracked on a per-memcg
> level, but not at a per-node (and per-lruvec) level. Diagnosing
> memory pressure for memcgs in multi-NUMA systems can be difficult,
> since not all of the memory accounted in memcg can be traced back
> to a node. In scenarios where NUMA nodes in a memcg are asymmetrically
> stressed, this difference can be invisible to the user.
> 
> Convert MEMCG_PERCPU_B from a memcg_stat_item to a memcg_node_stat_item
> to give visibility into per-node breakdowns for percpu allocations.
> 
> This will get us closer to being able to know the memcg and physical
> association of all memory on the system. Specifically for percpu, this
> granularity will help demonstrate footprint differences on systems with
> asymmetric NUMA nodes.
> 
> Because percpu memory is accounted at a sub-PAGE_SIZE level, we must
> account node level statistics (accounted in PAGE_SIZE units) and
> memcg-lruvec statistics separately. Account node statistics when the pcpu
> pages are allocated, and account memcg-lruvec statistics when pcpu
> objects are handed out.

[...snip...]

> @@ -55,7 +55,8 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
>  			    struct page **pages, int page_start, int page_end)
>  {
>  	unsigned int cpu;
> -	int i;
> +	int nr_pages = page_end - page_start;
> +	int i, nid;
>  
>  	for_each_possible_cpu(cpu) {
>  		for (i = page_start; i < page_end; i++) {
> @@ -65,6 +66,10 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
>  				__free_page(page);
>  		}
>  	}
> +
> +	for_each_node(nid)
> +		mod_node_page_state(NODE_DATA(nid), NR_PERCPU_B,
> +				-1L * nr_pages * nr_cpus_node(nid) * PAGE_SIZE);
>  }
>  
>  /**
> @@ -84,7 +89,8 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>  			    gfp_t gfp)
>  {
>  	unsigned int cpu, tcpu;
> -	int i;
> +	int nr_pages = page_end - page_start;
> +	int i, nid;
>  
>  	gfp |= __GFP_HIGHMEM;
>  
> @@ -97,6 +103,10 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>  				goto err;
>  		}
>  	}
> +
> +	for_each_node(nid)
> +		mod_node_page_state(NODE_DATA(nid), NR_PERCPU_B,
> +				    nr_pages * nr_cpus_node(nid) * PAGE_SIZE);
>  	return 0;

Hello reviewers,

Since I submitted this, I have been thinking about the feedback that Sashiko
has given this patch [1]. Harry has already raised the points about
drift due to CPU hotplug, but there is one particular concern that
I have been trying to tackle to no avail.

The issue is, pcpu allocations for CPUs on node A may actually fall back to
node B, if node A is out of space and under pressure. This design seems to be
intentional, to prevent memory pressure from failing these allocations.

However, this means that we cannot charge percpu memory based on the number
of CPUs present on a node, because although the memory "belongs" to the node
(since the CPU it actually belongs to is on the node), the memory can be
serviced from elsewhere.
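
To make the mismatch concrete, here is a small userspace model (not kernel
code). It compares the CPU-count formula from the patch against a tally of
where the pages actually landed. The topology table, the MODEL_* constants,
the page_nid layout and both function names are assumptions made up for this
illustration.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096L	/* assumed page size for the model */
#define MODEL_NR_CPUS   4

/* Hypothetical topology: index is the CPU, value is its home node. */
static const int cpu_home_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/*
 * Bytes charged to @nid by the CPU-count formula:
 * nr_pages * (CPUs whose home node is nid) * page size.
 */
static long charge_by_cpu_count(int nid, int nr_pages)
{
	long cpus_on_node = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu_home_node[cpu] == nid)
			cpus_on_node++;

	return nr_pages * cpus_on_node * MODEL_PAGE_SIZE;
}

/*
 * Bytes that physically reside on @nid.
 * page_nid[cpu * nr_pages + i] is the node backing page i of that CPU.
 */
static long charge_by_placement(int nid, const int *page_nid, int nr_pages)
{
	long bytes = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		for (int i = 0; i < nr_pages; i++)
			if (page_nid[(size_t)cpu * nr_pages + i] == nid)
				bytes += MODEL_PAGE_SIZE;

	return bytes;
}
```

With nr_pages = 2 and page_nid = { 0,0, 0,1, 1,1, 1,1 } (one page for CPU 1
fell back to node 1), the formula charges 16384 bytes to each node, while the
placement tally gives 12288 bytes on node 0 and 20480 bytes on node 1.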

To handle this, I've tried several approaches. All of them were either too
expensive (iterating through all pages at allocation / free time) or introduced
new drift (I thought of managing per-chunk statistics as well).

To be honest, I think I'm out of ideas at this point :/ So I wanted to see
what others think about how to track the physical location of pcpu pages
that were allocated via fallback. Are these rare enough that we are OK with
the misattribution here? Should we eat the cost of iterating through all pages
to find out where each one physically resides?
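
For a rough sense of what the page walk costs, the userspace sketch below
tallies pages per backing node in one pass and counts the lookups it performs.
In the kernel each lookup would presumably be a page_to_nid() call on the
populated page; the struct, the SKETCH_MAX_NODES bound, and the page_nid array
are hypothetical stand-ins for illustration only.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 8	/* assumed upper bound on node ids */

struct node_tally {
	long pages[SKETCH_MAX_NODES];	/* pages found on each node */
	size_t lookups;			/* per-page node lookups performed */
};

/*
 * Walk every (cpu, page) slot once and record which node backs it.
 * page_nid[cpu * nr_pages + i] stands in for the node of page i of @cpu.
 */
static struct node_tally tally_pages_by_node(const int *page_nid,
					     int nr_cpus, int nr_pages)
{
	struct node_tally t = { .lookups = 0 };

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		for (int i = 0; i < nr_pages; i++) {
			int nid = page_nid[(size_t)cpu * nr_pages + i];

			t.lookups++;
			if (nid >= 0 && nid < SKETCH_MAX_NODES)
				t.pages[nid]++;
		}
	}

	return t;
}
```

The lookup count is always nr_cpus * nr_pages, so the extra work at
allocation / free time grows with the number of possible CPUs, which is the
cost in question here.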

Or is this patch not worth pursuing at the moment? ;-)

I hope this all makes sense. Thank you all in advance!
Joshua

[1] https://sashiko.dev/#/patchset/20260404033844.1892595-1-joshua.hahnjy%40gmail.com

