From: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Gregory Price <gourry@gourry.net>,
Johannes Weiner <hannes@cmpxchg.org>,
Kaiyang Zhao <kaiyang2@cs.cmu.edu>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Waiman Long <longman@redhat.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Tejun Heo <tj@kernel.org>, Michal Koutny <mkoutny@suse.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [RFC PATCH 0/6] mm/memcontrol: Make memcg limits tier-aware
Date: Mon, 23 Feb 2026 14:38:23 -0800 [thread overview]
Message-ID: <20260223223830.586018-1-joshua.hahnjy@gmail.com> (raw)
Memory cgroups provide an interface that allow multiple workloads on a
host to co-exist, and establish both weak and strong memory isolation
guarantees. For large servers and small embedded systems alike, memcgs
provide an effective way to provide a baseline quality of service for
protected workloads.
This works, because for the most part, all memory is equal (except for
zram / zswap). Restricting a cgroup's memory footprint restricts how
much it can hurt other workloads competing for memory. Likewise, setting
memory.low or memory.min limits can provide weak and strong guarantees
to the performance of a cgroup.
However, on systems with tiered memory (e.g. CXL / compressed memory),
the quality of service guarantees that memcg limits enforced become less
effective, as memcg has no awareness of the physical location of its
charged memory. In other words, a workload that is well-behaved within
its memcg limits may still be hurting the performance of other
well-behaving workloads on the system by hogging more than its
"fair share" of toptier memory.
Introduce tier-aware memcg limits, which scale memory.low/high to
reflect the ratio of toptier:total memory the cgroup has access.
Take the following scenario as an example:
On a host with 3:1 toptier:lowtier, say 150G toptier, and 50Glowtier,
setting a cgroup's limits to:
memory.min: 15G
memory.low: 20G
memory.high: 40G
memory.max: 50G
Will be enforced at the toptier as:
memory.min: 15G
memory.toptier_low: 15G (20 * 150/200)
memory.toptier_high: 30G (40 * 150/200)
memory.max: 50G
Let's say that there are 4 such cgroups on the host. Previously, it would
be possible for 3 hosts to completely take over all of DRAM, while one
cgroup could only access the lowtier memory. In the perspective of a
tier-agnostic memcg limit enforcement, the three cgroups are all
well-behaved, consuming within their memory limits.
This is not to say that the scenario above is incorrect. In fact, for
letting the hottest cgroups run in DRAM while pushing out colder cgroups
to lowtier memory lets the system perform the most aggregate work total.
But for other scenarios, the target might not be maximizing aggregate
work, but maximizing the minimum performance guarantee for each
individual workload (think hosts shared across different users, such as
VM hosting services).
To reflect these two scenarios, introduce a sysctl tier_aware_memcg,
which allows the host to toggle between enforcing and overlooking
toptier memcg limit breaches.
This work is inspired & based off of Kaiyang Zhao's work from 2024 [1],
where he referred to this concept as "memory tiering fairness".
The biggest difference in the implementations lie in how toptier memory
is tracked; in his implementation, an lruvec stat aggregation is done on
each usage check, while in this implementation, a new cacheline is
introduced in page_coutner to keep track of toptier usage (Kaiyang also
introduces a new cachline in page_counter, but only uses it to cache
capacity and thresholds). This implementation also extends the memory
limit enforcement to memory.high as well.
[1] https://lore.kernel.org/linux-mm/20240920221202.1734227-1-kaiyang2@cs.cmu.edu/
---
Joshua Hahn (6):
mm/memory-tiers: Introduce tier-aware memcg limit sysfs
mm/page_counter: Introduce tiered memory awareness to page_counter
mm/memory-tiers, memcontrol: Introduce toptier capacity updates
mm/memcontrol: Charge and uncharge from toptier
mm/memcontrol, page_counter: Make memory.low tier-aware
mm/memcontrol: Make memory.high tier-aware
include/linux/memcontrol.h | 21 ++++-
include/linux/memory-tiers.h | 30 +++++++
include/linux/page_counter.h | 31 ++++++-
include/linux/swap.h | 3 +-
kernel/cgroup/cpuset.c | 2 +-
kernel/cgroup/dmem.c | 2 +-
mm/memcontrol-v1.c | 6 +-
mm/memcontrol.c | 155 +++++++++++++++++++++++++++++++----
mm/memory-tiers.c | 63 ++++++++++++++
mm/page_counter.c | 77 ++++++++++++++++-
mm/vmscan.c | 24 ++++--
11 files changed, 376 insertions(+), 38 deletions(-)
--
2.47.3
next reply other threads:[~2026-02-24 0:33 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-23 22:38 Joshua Hahn [this message]
2026-02-23 22:38 ` [RFC PATCH 1/6] mm/memory-tiers: Introduce tier-aware memcg limit sysfs Joshua Hahn
2026-02-23 22:38 ` [RFC PATCH 2/6] mm/page_counter: Introduce tiered memory awareness to page_counter Joshua Hahn
2026-02-23 22:38 ` [RFC PATCH 3/6] mm/memory-tiers, memcontrol: Introduce toptier capacity updates Joshua Hahn
2026-02-23 22:38 ` [RFC PATCH 4/6] mm/memcontrol: Charge and uncharge from toptier Joshua Hahn
2026-02-23 22:38 ` [RFC PATCH 5/6] mm/memcontrol, page_counter: Make memory.low tier-aware Joshua Hahn
2026-02-23 22:38 ` [RFC PATCH 6/6] mm/memcontrol: Make memory.high tier-aware Joshua Hahn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260223223830.586018-1-joshua.hahnjy@gmail.com \
--to=joshua.hahnjy@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huaweicloud.com \
--cc=david@kernel.org \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=kaiyang2@cs.cmu.edu \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=tj@kernel.org \
--cc=vbabka@kernel.org \
--cc=weixugc@google.com \
--cc=yuanchu@google.com \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox