From: Donet Tom <donettom@linux.ibm.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Gregory Price <gourry@gourry.net>,
Johannes Weiner <hannes@cmpxchg.org>,
Kaiyang Zhao <kaiyang2@cs.cmu.edu>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Waiman Long <longman@redhat.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Tejun Heo <tj@kernel.org>, Michal Koutny <mkoutny@suse.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 0/6] mm/memcontrol: Make memcg limits tier-aware
Date: Tue, 24 Mar 2026 16:00:34 +0530
Message-ID: <13eb0f7a-95bc-4337-9d38-a06db0700777@linux.ibm.com>
In-Reply-To: <20260223223830.586018-1-joshua.hahnjy@gmail.com>
Hi Joshua,
On 2/24/26 4:08 AM, Joshua Hahn wrote:
> Memory cgroups provide an interface that allows multiple workloads on a
> host to co-exist, and establish both weak and strong memory isolation
> guarantees. For large servers and small embedded systems alike, memcgs
> provide an effective way to provide a baseline quality of service for
> protected workloads.
>
> This works, because for the most part, all memory is equal (except for
> zram / zswap). Restricting a cgroup's memory footprint restricts how
> much it can hurt other workloads competing for memory. Likewise, setting
> memory.low or memory.min limits can provide weak and strong guarantees
> to the performance of a cgroup.
>
> However, on systems with tiered memory (e.g. CXL / compressed memory),
> the quality of service guarantees that memcg limits enforce become less
> effective, as memcg has no awareness of the physical location of its
> charged memory. In other words, a workload that is well-behaved within
> its memcg limits may still be hurting the performance of other
> well-behaving workloads on the system by hogging more than its
> "fair share" of toptier memory.
>
> Introduce tier-aware memcg limits, which scale memory.low/high to
> reflect the ratio of toptier:total memory the cgroup has access to.
>
> Take the following scenario as an example:
> On a host with 3:1 toptier:lowtier, say 150G toptier, and 50G lowtier,
> setting a cgroup's limits to:
> 	memory.min:  15G
> 	memory.low:  20G
> 	memory.high: 40G
> 	memory.max:  50G
>
> Will be enforced at the toptier as:
> 	memory.min:          15G
> 	memory.toptier_low:  15G (20 * 150/200)
> 	memory.toptier_high: 30G (40 * 150/200)
> 	memory.max:          50G

Currently, the high and low thresholds are scaled by the ratio of
top-tier to total memory. One concern is that if a cgroup's working set
exceeds its top-tier high threshold, pages could be demoted and promoted
repeatedly. Would it make sense to add a tunable knob for configuring
the top-tier high threshold instead?

Another concern is that if the lower-tier memory is very large, the
cgroup may end up with only a small share of top-tier memory.
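
To make both concerns concrete, the scaling from the cover letter can be
sketched as below. This is only an illustration of the arithmetic,
assuming the threshold is computed as limit * toptier / total as in the
quoted example. The helper scale_to_toptier() and the 1350G lowtier size
are hypothetical and are not taken from the series.

```python
# Illustrative only: mirrors the "limit * toptier / total" arithmetic from
# the quoted cover letter. This is not the kernel implementation.

def scale_to_toptier(limit_g: int, toptier_g: int, lowtier_g: int) -> int:
    """Scale a memcg limit (in G) by the toptier:total capacity ratio."""
    total_g = toptier_g + lowtier_g
    if total_g == 0:
        raise ValueError("total capacity must be non-zero")
    # Integer division keeps the result in whole G, rounding down.
    return limit_g * toptier_g // total_g

# Cover-letter example: 150G toptier, 50G lowtier (3:1).
low_3_to_1 = scale_to_toptier(20, 150, 50)     # 20 * 150/200
high_3_to_1 = scale_to_toptier(40, 150, 50)    # 40 * 150/200

# Hypothetical host: same toptier, much larger lowtier (1:9).
high_1_to_9 = scale_to_toptier(40, 150, 1350)  # 40 * 150/1500

print(low_3_to_1, high_3_to_1, high_1_to_9)
```

With the same memory.high of 40G, the scaled top-tier high threshold
drops from 30G to 4G once the lowtier grows to 1350G, so any cgroup with
a working set larger than 4G would exceed its top-tier high threshold.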
>
> Let's say that there are 4 such cgroups on the host. Previously, it would
> be possible for 3 cgroups to completely take over all of DRAM, while one
> cgroup could only access the lowtier memory. From the perspective of a
> tier-agnostic memcg limit enforcement, the three cgroups are all
> well-behaved, consuming within their memory limits.
>
> This is not to say that the scenario above is incorrect. In fact,
> letting the hottest cgroups run in DRAM while pushing out colder cgroups
> to lowtier memory lets the system perform the most aggregate work.
>
> But for other scenarios, the target might not be maximizing aggregate
> work, but maximizing the minimum performance guarantee for each
> individual workload (think hosts shared across different users, such as
> VM hosting services).
>
> To reflect these two scenarios, introduce a sysctl tier_aware_memcg,
> which allows the host to toggle between enforcing and overlooking
> toptier memcg limit breaches.
>
> This work is inspired & based off of Kaiyang Zhao's work from 2024 [1],
> where he referred to this concept as "memory tiering fairness".
> The biggest difference in the implementations lies in how toptier memory
> is tracked; in his implementation, an lruvec stat aggregation is done on
> each usage check, while in this implementation, a new cacheline is
> introduced in page_counter to keep track of toptier usage (Kaiyang also
> introduces a new cacheline in page_counter, but only uses it to cache
> capacity and thresholds). This implementation also extends the memory
> limit enforcement to memory.high as well.
>
> [1] https://lore.kernel.org/linux-mm/20240920221202.1734227-1-kaiyang2@cs.cmu.edu/
>
> ---
> Joshua Hahn (6):
> mm/memory-tiers: Introduce tier-aware memcg limit sysfs
> mm/page_counter: Introduce tiered memory awareness to page_counter
> mm/memory-tiers, memcontrol: Introduce toptier capacity updates
> mm/memcontrol: Charge and uncharge from toptier
> mm/memcontrol, page_counter: Make memory.low tier-aware
> mm/memcontrol: Make memory.high tier-aware
>
> include/linux/memcontrol.h | 21 ++++-
> include/linux/memory-tiers.h | 30 +++++++
> include/linux/page_counter.h | 31 ++++++-
> include/linux/swap.h | 3 +-
> kernel/cgroup/cpuset.c | 2 +-
> kernel/cgroup/dmem.c | 2 +-
> mm/memcontrol-v1.c | 6 +-
> mm/memcontrol.c | 155 +++++++++++++++++++++++++++++++----
> mm/memory-tiers.c | 63 ++++++++++++++
> mm/page_counter.c | 77 ++++++++++++++++-
> mm/vmscan.c | 24 ++++--
> 11 files changed, 376 insertions(+), 38 deletions(-)
>