* [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg
@ 2026-04-03 14:08 Eric Chanudet
2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
2026-04-03 14:08 ` [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg Eric Chanudet
0 siblings, 2 replies; 7+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
Eric Chanudet
It was suggested previously[1] to introduce a knob for dmem regions to
double charge dmem and memcg, at the administrator's discretion.
This RFC tries to do that in the dmem controller through the cgroupfs
interface already available, and walks through the problems that creates.
[1] https://lore.kernel.org/all/a446b598-5041-450b-aaa9-3c39a09ff6a0@amd.com/
Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
Eric Chanudet (2):
mm/memcontrol: add page-level charge/uncharge functions
cgroup/dmem: add a node to double charge in memcg
include/linux/memcontrol.h | 4 +++
kernel/cgroup/dmem.c | 86 ++++++++++++++++++++++++++++++++++++++++++++--
mm/memcontrol.c | 24 +++++++++++++
3 files changed, 111 insertions(+), 3 deletions(-)
---
base-commit: 4b9c36c83b34f710da9573291404f6a2246251c1
change-id: 20260327-cgroup-dmem-memcg-double-charge-0f100a9ffbf2
Best regards,
--
Eric Chanudet <echanude@redhat.com>
^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions
  2026-04-03 14:08 [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg Eric Chanudet
@ 2026-04-03 14:08 ` Eric Chanudet
  2026-04-03 17:15   ` Johannes Weiner
  2026-04-03 14:08 ` [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg Eric Chanudet
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
	Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
	Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
	Eric Chanudet

Expose functions to charge/uncharge memcg with a number of pages instead
of a folio.

Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
 include/linux/memcontrol.h |  4 ++++
 mm/memcontrol.c            | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 70b685a85bf4cd0e830c9c0253e4d48f75957fe4..32f03890f13e06551fc910515eb478597c1235d8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -642,6 +642,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,

 int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
+int mem_cgroup_try_charge_pages(struct mem_cgroup *memcg, gfp_t gfp_mask,
+				unsigned int nr_pages);

 /**
  * mem_cgroup_charge - Charge a newly allocated folio to a cgroup.
  * @folio: Folio to charge.
@@ -692,6 +694,8 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 		__mem_cgroup_uncharge_folios(folios);
 }

+void mem_cgroup_uncharge_pages(struct mem_cgroup *memcg, unsigned int nr_pages);
+
 void mem_cgroup_replace_folio(struct folio *old, struct folio *new);

 void mem_cgroup_migrate(struct folio *old, struct folio *new);

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 772bac21d15584ce495cba6ad2eebfa7f693677f..49ed069a2dafd5d26d77e6737dffe7e64ba5118c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4764,6 +4764,24 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 	return ret;
 }

+/**
+ * mem_cgroup_try_charge_pages - charge pages to a memory cgroup
+ * @memcg: memory cgroup to charge
+ * @gfp_mask: reclaim mode
+ * @nr_pages: number of pages to charge
+ *
+ * Try to charge @nr_pages to @memcg through try_charge_memcg.
+ *
+ * Returns 0 on success, an error code on failure.
+ */
+int mem_cgroup_try_charge_pages(struct mem_cgroup *memcg, gfp_t gfp_mask,
+				unsigned int nr_pages)
+{
+	return try_charge(memcg, gfp_mask, nr_pages);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_try_charge_pages);
+
+
 /**
  * mem_cgroup_charge_hugetlb - charge the memcg for a hugetlb folio
  * @folio: folio being charged
@@ -4948,6 +4966,12 @@ void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 	uncharge_batch(&ug);
 }

+void mem_cgroup_uncharge_pages(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	memcg_uncharge(memcg, nr_pages);
+}
+EXPORT_SYMBOL_GPL(mem_cgroup_uncharge_pages);
+
 /**
  * mem_cgroup_replace_folio - Charge a folio's replacement.
  * @old: Currently circulating folio.
-- 
2.52.0
* Re: [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions
  2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
@ 2026-04-03 17:15   ` Johannes Weiner
  2026-04-07 14:31     ` Eric Chanudet
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2026-04-03 17:15 UTC (permalink / raw)
To: Eric Chanudet
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
	Tejun Heo, Michal Koutný, cgroups, linux-mm, linux-kernel,
	dri-devel, T.J. Mercier, Christian König, Maxime Ripard,
	Albert Esteve, Dave Airlie

On Fri, Apr 03, 2026 at 10:08:35AM -0400, Eric Chanudet wrote:
> Expose functions to charge/uncharge memcg with a number of pages instead
> of a folio.
> 
> Signed-off-by: Eric Chanudet <echanude@redhat.com>

No naked number accounting, please. The reason existing charge paths
require you to pass an object is because there are other memory
attributes we need to track (such as NUMA node location).
* Re: [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions
  2026-04-03 17:15 ` Johannes Weiner
@ 2026-04-07 14:31   ` Eric Chanudet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Chanudet @ 2026-04-07 14:31 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Maarten Lankhorst, Maxime Ripard, Natalie Vock,
	Tejun Heo, Michal Koutný, cgroups, linux-mm, linux-kernel,
	dri-devel, T.J. Mercier, Christian König, Maxime Ripard,
	Albert Esteve, Dave Airlie

On Fri, Apr 03, 2026 at 01:15:03PM -0400, Johannes Weiner wrote:
> On Fri, Apr 03, 2026 at 10:08:35AM -0400, Eric Chanudet wrote:
> > Expose functions to charge/uncharge memcg with a number of pages instead
> > of a folio.
> > 
> > Signed-off-by: Eric Chanudet <echanude@redhat.com>
> 
> No naked number accounting, please. The reason existing charge paths
> require you to pass an object is because there are other memory
> attributes we need to track (such as NUMA node location).

Understood, thank you. I'll change to using mem_cgroup_dmem_{,un}charge
functions and a memory.stat entry as well, similar to what is done for
mem_cgroup_sk_{,un}charge.

-- 
Eric Chanudet
* [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg
  2026-04-03 14:08 [PATCH RFC 0/2] cgroup/mem: add a node to double charge in memcg Eric Chanudet
  2026-04-03 14:08 ` [PATCH RFC 1/2] mm/memcontrol: add page-level charge/uncharge functions Eric Chanudet
@ 2026-04-03 14:08 ` Eric Chanudet
  2026-04-07 12:48   ` Michal Koutný
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Chanudet @ 2026-04-03 14:08 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
	Natalie Vock, Tejun Heo, Michal Koutný
Cc: cgroups, linux-mm, linux-kernel, dri-devel, T.J. Mercier,
	Christian König, Maxime Ripard, Albert Esteve, Dave Airlie,
	Eric Chanudet

Introduce /cgroupfs/<>/dmem.memcg to make allocations in a dmem
controlled region also be charged in memcg.

This is disabled by default and requires the administrator to configure
it through the cgroupfs before the first charge occurs. The memcg is
derived from the pool's cgroup, if it exists, since the pool holds a ref
to the dmem cgroup state, keeping the cgroup alive and stable.

The behavior is quirky. Since keeping track of each allocation would add
a fair amount of logic without solving the problem entirely, disable the
memcg switch once the first charge is issued. Having this as a dynamic
configuration doesn't seem relevant anyway.

Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
 kernel/cgroup/dmem.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 9d95824dc6fa09422274422313b63c25986596de..b65ae8cf0c302ce3773a7aa5f0d6d8223d2c10c9 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -17,6 +17,7 @@
 #include <linux/refcount.h>
 #include <linux/rculist.h>
 #include <linux/slab.h>
+#include <linux/memcontrol.h>

 struct dmem_cgroup_region {
 	/**
@@ -76,6 +77,9 @@ struct dmem_cgroup_pool_state {
 	refcount_t ref;

 	bool inited;
+
+	bool memcg;
+	bool memcg_locked;
 };

 /*
@@ -162,6 +166,14 @@ set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
 	page_counter_set_max(&pool->cnt, val);
 }

+static void
+set_resource_memcg(struct dmem_cgroup_pool_state *pool, u64 val)
+{
+	/* Cannot change once a charge happened. */
+	if (!pool->memcg_locked)
+		pool->memcg = !!val;
+}
+
 static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
 {
 	return pool ? READ_ONCE(pool->cnt.low) : 0;
@@ -182,11 +194,17 @@ static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
 	return pool ? page_counter_read(&pool->cnt) : 0;
 }

+static u64 get_resource_memcg(struct dmem_cgroup_pool_state *pool)
+{
+	return pool ? READ_ONCE(pool->memcg) : 0;
+}
+
 static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
 {
 	set_resource_min(rpool, 0);
 	set_resource_low(rpool, 0);
 	set_resource_max(rpool, PAGE_COUNTER_MAX);
+	set_resource_memcg(rpool, 0);
 }

 static void dmemcs_offline(struct cgroup_subsys_state *css)
@@ -609,6 +627,20 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
 	return pool;
 }

+static struct mem_cgroup *mem_cgroup_from_cgroup(struct cgroup *c)
+{
+	struct cgroup_subsys_state *css;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	rcu_read_lock();
+	css = cgroup_e_css(c, &memory_cgrp_subsys);
+	rcu_read_unlock();
+
+	return mem_cgroup_from_css(css);
+}
+
 /**
  * dmem_cgroup_uncharge() - Uncharge a pool.
  * @pool: Pool to uncharge.
@@ -624,6 +656,13 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
 		return;

 	page_counter_uncharge(&pool->cnt, size);
+
+	struct mem_cgroup *memcg = mem_cgroup_from_cgroup(pool->cs->css.cgroup);
+
+	if (pool->memcg && memcg)
+		mem_cgroup_uncharge_pages(memcg,
+					  PAGE_ALIGN(size) >> PAGE_SHIFT);
+
 	css_put(&pool->cs->css);
 	dmemcg_pool_put(pool);
 }
@@ -655,6 +694,8 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
 	struct dmemcg_state *cg;
 	struct dmem_cgroup_pool_state *pool;
 	struct page_counter *fail;
+	struct mem_cgroup *memcg;
+	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	int ret;

 	*ret_pool = NULL;
@@ -670,7 +711,22 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
 	pool = get_cg_pool_unlocked(cg, region);
 	if (IS_ERR(pool)) {
 		ret = PTR_ERR(pool);
-		goto err;
+		goto err_css_put;
+	}
+
+	pool->memcg_locked = true;
+	memcg = get_mem_cgroup_from_current();
+	if (pool->memcg && memcg) {
+		ret = mem_cgroup_try_charge_pages(memcg, GFP_KERNEL, nr_pages);
+		if (ret) {
+			/*
+			 * No dmem_cgroup_state_evict_valuable() could help,
+			 * there's no ret_limit_pool to return.
+			 */
+			ret = -ENOMEM;
+			dmemcg_pool_put(pool);
+			goto err_memcg_put;
+		}
 	}

 	if (!page_counter_try_charge(&pool->cnt, size, &fail)) {
@@ -681,14 +737,21 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
 		}
 		dmemcg_pool_put(pool);
 		ret = -EAGAIN;
-		goto err;
+		goto err_uncharge_memcg;
 	}

+	mem_cgroup_put(memcg);
+
 	/* On success, reference from get_current_dmemcs is transferred to *ret_pool */
 	*ret_pool = pool;
 	return 0;

-err:
+err_uncharge_memcg:
+	if (pool->memcg && memcg)
+		mem_cgroup_uncharge_pages(memcg, nr_pages);
+err_memcg_put:
+	mem_cgroup_put(memcg);
+err_css_put:
 	css_put(&cg->css);
 	return ret;
 }
@@ -846,6 +909,17 @@ static ssize_t dmem_cgroup_region_max_write(struct kernfs_open_file *of,
 	return dmemcg_limit_write(of, buf, nbytes, off, set_resource_max);
 }

+static int dmem_cgroup_memcg_show(struct seq_file *sf, void *v)
+{
+	return dmemcg_limit_show(sf, v, get_resource_memcg);
+}
+
+static ssize_t dmem_cgroup_memcg_write(struct kernfs_open_file *of, char *buf,
+				       size_t nbytes, loff_t off)
+{
+	return dmemcg_limit_write(of, buf, nbytes, off, set_resource_memcg);
+}
+
 static struct cftype files[] = {
 	{
 		.name = "capacity",
@@ -874,6 +948,12 @@ static struct cftype files[] = {
 		.seq_show = dmem_cgroup_region_max_show,
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
+	{
+		.name = "memcg",
+		.write = dmem_cgroup_memcg_write,
+		.seq_show = dmem_cgroup_memcg_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
 	{ } /* Zero entry terminates. */
 };
-- 
2.52.0
* Re: [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg
  2026-04-03 14:08 ` [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg Eric Chanudet
@ 2026-04-07 12:48   ` Michal Koutný
  2026-04-07 23:35     ` Eric Chanudet
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2026-04-07 12:48 UTC (permalink / raw)
To: Eric Chanudet
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
	Natalie Vock, Tejun Heo, cgroups, linux-mm, linux-kernel,
	dri-devel, T.J. Mercier, Christian König, Maxime Ripard,
	Albert Esteve, Dave Airlie

Hi.

On Fri, Apr 03, 2026 at 10:08:36AM -0400, Eric Chanudet <echanude@redhat.com> wrote:
> Introduce /cgroupfs/<>/dmem.memcg to make allocations in a dmem
> controlled region also be charged in memcg.
> 
> This is disabled by default and requires the administrator to configure
> it through the cgroupfs before the first charge occurs.

This somehow dropped the reason from [1] that this should be per-cgroup
controllable. Is that still valid? (Otherwise, I'd ask why not make this
a simple boot cmdline parameter like cgroup.memory=nokmem.)

> @@ -624,6 +656,13 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
> 		return;
> 
> 	page_counter_uncharge(&pool->cnt, size);
> +
> +	struct mem_cgroup *memcg = mem_cgroup_from_cgroup(pool->cs->css.cgroup);

This is not necessarily the same memcg as when the dmem was charged via
current (imagine the dmem controller enabled to depth N, but memcg only
to N-1; charge, then memcg is enabled up to N, so this would attempt to
uncharge from the new memcg at level N, possibly going negative).

There is a question whether dmem should enforce same-depth hierarchies
with `dmem_cgrp_subsys.depends_on = 1 << memory_cgrp_id` (see
io_cgrp_subsys for comparison).

And eventually, if a per-cgroup attribute is desired, it would make
greater sense to me if that attribute was on the parent level, so that
siblings competing among each other are always of the same composition
(i.e. all w/out dmem or all including dmem). This likely results in this
extra-charging attribute being properly hierarchical.

HTH,
Michal

[1] https://lore.kernel.org/all/a446b598-5041-450b-aaa9-3c39a09ff6a0@amd.com/
* Re: [PATCH RFC 2/2] cgroup/dmem: add a node to double charge in memcg
  2026-04-07 12:48 ` Michal Koutný
@ 2026-04-07 23:35   ` Eric Chanudet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Chanudet @ 2026-04-07 23:35 UTC (permalink / raw)
To: Michal Koutný
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Maarten Lankhorst, Maxime Ripard,
	Natalie Vock, Tejun Heo, cgroups, linux-mm, linux-kernel,
	dri-devel, T.J. Mercier, Christian König, Maxime Ripard,
	Albert Esteve, Dave Airlie

On Tue, Apr 07, 2026 at 02:48:05PM +0200, Michal Koutný wrote:
> Hi.
> 
> On Fri, Apr 03, 2026 at 10:08:36AM -0400, Eric Chanudet <echanude@redhat.com> wrote:
> > Introduce /cgroupfs/<>/dmem.memcg to make allocations in a dmem
> > controlled region also be charged in memcg.
> > 
> > This is disabled by default and requires the administrator to configure
> > it through the cgroupfs before the first charge occurs.
> 
> This somehow dropped the reason from [1] that this should be per-cgroup
> controllable. Is that still valid? (Otherwise, I'd ask why not make this
> a simple boot cmdline parameter like cgroup.memory=nokmem.)

[1] argued it should be controllable per dmem region more than per
cgroup. For example, a cgroup configured with +memory and +dmem, with
one region charging only dmem and the other double charging memcg and
dmem.

A cgroup.memory=nokmem-style karg would double charge all or none of
the regions for all cgroups, iiuc.

So maybe just making the memcg knob CFTYPE_ONLY_ON_ROOT and inheriting
that configuration in all children would do?

> > @@ -624,6 +656,13 @@ void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
> > 		return;
> > 
> > 	page_counter_uncharge(&pool->cnt, size);
> > +
> > +	struct mem_cgroup *memcg = mem_cgroup_from_cgroup(pool->cs->css.cgroup);
> 
> This is not necessarily same memcg as when the dmem was charged via
> current (imagine dmem controller to depth N, but memcg only to N-1;
> charge, then memcg is enabled up to N so this would attempt uncharge
> from new memcg at level N, possibly going negative).
> 
> There is a question whether dmem should enforce same-depth hierarchies
> with `dmem_cgrp_subsys.depends_on = 1 << memory_cgrp_id` (see
> io_cgrp_subsys for comparison).

Thanks! I'll look into depends_on.

> And eventually, if per-cgroup attribute is desired, it would make
> greater sense to me if that attribute was on the parent level, so that
> siblings competing among each other are always of the same composition
> (i.e. all w/out dmem or all including dmem). This likely results in this
> extra-charging attribute to be properly hierarchical.
> 
> HTH,
> Michal
> 
> [1] https://lore.kernel.org/all/a446b598-5041-450b-aaa9-3c39a09ff6a0@amd.com/

Thank you for the feedback,

-- 
Eric Chanudet