From: Donet Tom <donettom@linux.ibm.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Gregory Price <gourry@gourry.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Kaiyang Zhao <kaiyang2@cs.cmu.edu>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Waiman Long <longman@redhat.com>,
	Chen Ridong <chenridong@huaweicloud.com>,
	Tejun Heo <tj@kernel.org>, Michal Koutny <mkoutny@suse.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 0/6] mm/memcontrol: Make memcg limits tier-aware
Date: Tue, 24 Mar 2026 16:00:34 +0530
Message-ID: <13eb0f7a-95bc-4337-9d38-a06db0700777@linux.ibm.com>
In-Reply-To: <20260223223830.586018-1-joshua.hahnjy@gmail.com>

Hi Joshua,

On 2/24/26 4:08 AM, Joshua Hahn wrote:
> Memory cgroups provide an interface that allows multiple workloads on a
> host to co-exist, and establish both weak and strong memory isolation
> guarantees. For large servers and small embedded systems alike, memcgs
> provide an effective way to provide a baseline quality of service for
> protected workloads.
>
> This works, because for the most part, all memory is equal (except for
> zram / zswap). Restricting a cgroup's memory footprint restricts how
> much it can hurt other workloads competing for memory. Likewise, setting
> memory.low or memory.min limits can provide weak and strong guarantees
> to the performance of a cgroup.
>
> However, on systems with tiered memory (e.g. CXL / compressed memory),
> the quality of service guarantees that memcg limits enforce become less
> effective, as memcg has no awareness of the physical location of its
> charged memory. In other words, a workload that is well-behaved within
> its memcg limits may still be hurting the performance of other
> well-behaving workloads on the system by hogging more than its
> "fair share" of toptier memory.
>
> Introduce tier-aware memcg limits, which scale memory.low/high to
> reflect the ratio of toptier:total memory the cgroup has access to.
>
> Take the following scenario as an example:
> On a host with 3:1 toptier:lowtier, say 150G toptier and 50G lowtier,
> setting a cgroup's limits to:
> 	memory.min:  15G
> 	memory.low:  20G
> 	memory.high: 40G
> 	memory.max:  50G
>
> Will be enforced at the toptier as:
> 	memory.min:          15G
> 	memory.toptier_low:  15G (20 * 150/200)
> 	memory.toptier_high: 30G (40 * 150/200)
> 	memory.max:          50G
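
To check my reading of the example above, here is a minimal sketch of the
scaling arithmetic. The function name scale_to_toptier() and the use of
whole-G integer math are my assumptions for illustration only; this is not
the code from the series. It covers only memory.low and memory.high, since
memory.min and memory.max are shown unchanged in the example.

```python
def scale_to_toptier(limit_g: int, toptier_g: int, total_g: int) -> int:
    """Scale a memcg limit by the toptier:total capacity ratio.

    Hypothetical helper: all values are in whole G, and integer division
    is an assumption made to keep the arithmetic exact.
    """
    if total_g <= 0:
        raise ValueError("total capacity must be positive")
    return limit_g * toptier_g // total_g


# Capacities from the quoted example: 150G toptier, 50G lowtier.
TOPTIER_G = 150
LOWTIER_G = 50
TOTAL_G = TOPTIER_G + LOWTIER_G

# memory.low = 20G and memory.high = 40G, as in the quoted example.
toptier_low = scale_to_toptier(20, TOPTIER_G, TOTAL_G)
toptier_high = scale_to_toptier(40, TOPTIER_G, TOTAL_G)

print(f"memory.toptier_low:  {toptier_low}G")
print(f"memory.toptier_high: {toptier_high}G")
```

This reproduces the 15G and 30G values listed above.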



Currently, the low and high thresholds are scaled by the ratio of
top-tier to total memory. One concern is that if a cgroup's working
set exceeds its scaled top-tier high threshold, pages could be demoted
and promoted repeatedly. Would it make sense to add a tunable knob for
configuring the top-tier high threshold instead?

Another concern is that when the lower tier is much larger than the top
tier, the ratio becomes small and the cgroup may end up with only a
small portion of top-tier memory.
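
As a rough illustration of the second concern, the sketch below applies the
same toptier:total scaling while the lower tier grows. The lowtier sizes of
450G and 1850G are hypothetical values I picked for illustration, and the
helper name is my own; only the 150G/50G case comes from the cover letter.

```python
def toptier_high_for(memory_high_g: int, toptier_g: int, lowtier_g: int) -> int:
    """Return memory.high scaled by toptier:total (hypothetical helper, whole G)."""
    total_g = toptier_g + lowtier_g
    return memory_high_g * toptier_g // total_g


MEMORY_HIGH_G = 40   # memory.high from the cover letter example
TOPTIER_G = 150      # toptier capacity from the cover letter example

# Lowtier capacities: 50G is from the cover letter, the others are made up.
results = {
    lowtier_g: toptier_high_for(MEMORY_HIGH_G, TOPTIER_G, lowtier_g)
    for lowtier_g in (50, 450, 1850)
}

for lowtier_g, high_g in results.items():
    print(f"lowtier={lowtier_g}G -> toptier_high={high_g}G")
```

With these numbers, the top-tier high threshold drops from 30G to 10G and
then to 3G for the same memory.high of 40G.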


>
> Let's say that there are 4 such cgroups on the host. Previously, it would
> be possible for 3 cgroups to completely take over all of DRAM, while one
> cgroup could only access the lowtier memory. From the perspective of a
> tier-agnostic memcg limit enforcement, the three cgroups are all
> well-behaved, consuming within their memory limits.
>
> This is not to say that the scenario above is incorrect. In fact,
> letting the hottest cgroups run in DRAM while pushing out colder cgroups
> to lowtier memory lets the system perform the most aggregate work.
>
> But for other scenarios, the target might not be maximizing aggregate
> work, but maximizing the minimum performance guarantee for each
> individual workload (think hosts shared across different users, such as
> VM hosting services).
>
> To reflect these two scenarios, introduce a sysctl tier_aware_memcg,
> which allows the host to toggle between enforcing and overlooking
> toptier memcg limit breaches.
>
> This work is inspired & based off of Kaiyang Zhao's work from 2024 [1],
> where he referred to this concept as "memory tiering fairness".
> The biggest difference in the implementations lies in how toptier memory
> is tracked; in his implementation, an lruvec stat aggregation is done on
> each usage check, while in this implementation, a new cacheline is
> introduced in page_counter to keep track of toptier usage (Kaiyang also
> introduces a new cacheline in page_counter, but only uses it to cache
> capacity and thresholds). This implementation also extends the memory
> limit enforcement to memory.high as well.
>
> [1] https://lore.kernel.org/linux-mm/20240920221202.1734227-1-kaiyang2@cs.cmu.edu/
>
> ---
> Joshua Hahn (6):
>    mm/memory-tiers: Introduce tier-aware memcg limit sysfs
>    mm/page_counter: Introduce tiered memory awareness to page_counter
>    mm/memory-tiers, memcontrol: Introduce toptier capacity updates
>    mm/memcontrol: Charge and uncharge from toptier
>    mm/memcontrol, page_counter: Make memory.low tier-aware
>    mm/memcontrol: Make memory.high tier-aware
>
>   include/linux/memcontrol.h   |  21 ++++-
>   include/linux/memory-tiers.h |  30 +++++++
>   include/linux/page_counter.h |  31 ++++++-
>   include/linux/swap.h         |   3 +-
>   kernel/cgroup/cpuset.c       |   2 +-
>   kernel/cgroup/dmem.c         |   2 +-
>   mm/memcontrol-v1.c           |   6 +-
>   mm/memcontrol.c              | 155 +++++++++++++++++++++++++++++++----
>   mm/memory-tiers.c            |  63 ++++++++++++++
>   mm/page_counter.c            |  77 ++++++++++++++++-
>   mm/vmscan.c                  |  24 ++++--
>   11 files changed, 376 insertions(+), 38 deletions(-)
>

