From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
"Paul E. McKenney" <paulmck@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@linux.com>,
Martin Liu <liumartin@google.com>,
David Rientjes <rientjes@google.com>,
christian.koenig@amd.com, Shakeel Butt <shakeel.butt@linux.dev>,
SeongJae Park <sj@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <liam.howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Christian Brauner <brauner@kernel.org>,
Wei Yang <richard.weiyang@gmail.com>,
David Hildenbrand <david@redhat.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Al Viro <viro@zeniv.linux.org.uk>,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
Yu Zhao <yuzhao@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Mateusz Guzik <mjguzik@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Aboorva Devarajan <aboorvad@linux.ibm.com>
Subject: Re: [PATCH v16 1/3] lib: Introduce hierarchical per-cpu counters
Date: Wed, 14 Jan 2026 14:19:38 -0500
Message-ID: <67bdfd38-1acf-4b90-9e34-ce752632ddb1@efficios.com>
In-Reply-To: <aWfHTLDA8-Fja_gD@tiehlicka>
On 2026-01-14 11:41, Michal Hocko wrote:
>
> One thing you should probably mention here is the memory consumption of
> the structure.
Good point.
The most important parts are the per-cpu counters and the tree items
which propagate the carry.
In the proposed implementation, the per-cpu counters are allocated
within per-cpu data structures, so they end up using:

  nr_possible_cpus * sizeof(unsigned long)

In addition, the tree items are appended at the end of the mm_struct.
The number of those items is given by the "nr_items" field of the
per_nr_cpu_order_config table.
Each item is aligned on the cacheline size (typically 64 bytes) to
minimize false sharing, so each item takes up one full cacheline.
Here is the footprint for a few nr_cpus on a 64-bit arch:

nr_cpus  percpu counters (bytes)  nr_items  items size (bytes)  total (bytes)
      2                       16         1                  64             80
      4                       32         3                 192            224
      8                       64         7                 448            512
     64                      512        21                1344           1856
    128                     1024        21                1344           2368
    256                     2048        37                2368           4416
    512                     4096        73                4672           8768

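As a rough sketch (not taken from the patch), the arithmetic behind the
table can be written as follows. The helper names, the fixed 64-byte
cacheline and the 8-byte counter size are assumptions made for
illustration; the nr_items value still has to come from the
per_nr_cpu_order_config table.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines (the "typically 64 bytes" case). */
#define HPCC_CACHELINE_BYTES	64
/* Assumption: sizeof(unsigned long) on a 64-bit arch. */
#define HPCC_COUNTER_BYTES	8

/* Hypothetical helper: bytes used by the per-cpu counters. */
static size_t hpcc_percpu_bytes(size_t nr_cpus)
{
	return nr_cpus * HPCC_COUNTER_BYTES;
}

/* Hypothetical helper: bytes used by cacheline-aligned tree items. */
static size_t hpcc_items_bytes(size_t nr_items)
{
	return nr_items * HPCC_CACHELINE_BYTES;
}

/* Hypothetical helper: total footprint of one counter tree. */
static size_t hpcc_total_bytes(size_t nr_cpus, size_t nr_items)
{
	assert(nr_cpus > 0);
	return hpcc_percpu_bytes(nr_cpus) + hpcc_items_bytes(nr_items);
}
```

For example, hpcc_total_bytes(512, 73) gives 8768, which matches the
last row of the table.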
There are of course various trade-offs we can make here. We can:

* Increase the arity of the intermediate items to shrink the nr_items
  required for a given nr_cpus. This increases carry propagation
  contention, because more cores share each item.

* Remove cacheline alignment of intermediate tree items. This will
  shrink the memory needed for tree items, but will increase false
  sharing.

* Represent intermediate tree items as a byte rather than a long.
  This further reduces the memory required for intermediate tree
  items, but further increases false sharing.

* Represent per-cpu counters as bytes rather than longs. This makes
  the "sum" operation trickier, because it needs to iterate on the
  intermediate carry propagation nodes as well and synchronize with
  ongoing "tree add" operations. It further reduces memory use.

* Implement a custom strided allocator for the carry propagation
  bytes of intermediate items. This shares cachelines across
  different tree instances, keeping good locality. This ensures that
  all accesses from a given location in the machine topology touch
  the same cacheline for the various tree instances. This adds
  complexity, but provides compactness as well as minimal false
  sharing.

Compared to this, the upstream percpu counters use a 32-bit integer per-cpu
(4 bytes), and accumulate within a 64-bit global value.
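
To put rough numbers on that comparison, here is a minimal sketch that
counts only the two parts mentioned above: one 32-bit integer per CPU
plus one 64-bit global value. The function name is hypothetical, and
any other fields of the upstream structure are deliberately ignored, so
this is a lower bound rather than an exact size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: approximate footprint of an upstream percpu
 * counter, counting only the 4-byte per-cpu integers and the 8-byte
 * global value. Other structure fields are not accounted for.
 */
static size_t percpu_counter_bytes_approx(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return nr_cpus * sizeof(int32_t) + sizeof(int64_t);
}
```

With this estimate, percpu_counter_bytes_approx(512) yields 2056 bytes,
against 8768 bytes for the hpcc row with 512 CPUs.
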
So yes, there is an extra memory footprint added by the current hpcc
implementation, but if it's an issue we have various options to consider
to reduce its footprint.
Is it OK if I add this discussion to the commit message, or should it
also be added to the high-level design doc in
Documentation/core-api/percpu-counter-tree.rst?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com