* [PATCH 0/3] Make memcg location more flexible
@ 2026-02-25 16:22 Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
Cc: Matthew Wilcox (Oracle)
Different memdescs should have the flexibility to place their memcg
wherever they need to. That means that instead of indirecting through
lruvec_stat_mod_folio() and extracting the memcg from the folio,
we need an interface which takes the memcg as a parameter. It turns
out we already need to do that for slabs, and this memcg_stat_mod()
interface also works for that use case.
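
The interface described above can be pictured with a small userspace model.
This is only a sketch: the struct names, their fields, and the _model suffix
are hypothetical stand-ins, and the real function also takes a node_stat_item
index and goes through the lruvec. It assumes that mod_lruvec_state() updates
the node counter as well as the memcg counter, so the node total changes on
both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; the fields are hypothetical. */
struct pgdat_model { long node_stat; };
struct memcg_model { long memcg_stat; };

/*
 * Model of the memcg_stat_mod() dispatch: the caller passes the memcg
 * explicitly instead of having it extracted from a folio.
 */
static void memcg_stat_mod_model(struct memcg_model *memcg,
				 struct pgdat_model *pgdat, long val)
{
	/* The node counter is updated on both paths. */
	pgdat->node_stat += val;

	/* Untracked memory has no memcg, so only the node is updated. */
	if (memcg)
		memcg->memcg_stat += val;
}
```

A caller that keeps its memcg somewhere other than page->memcg_data looks it
up itself and passes it in; passing NULL falls back to node-only accounting.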
Matthew Wilcox (Oracle) (3):
memcg: Add memcg_stat_mod()
memcg: Simplify mod_lruvec_kmem_state()
ptdesc: Account page tables to memcgs again
include/linux/mm.h | 15 +++++++++++++--
include/linux/mm_types.h | 6 +++---
include/linux/vmstat.h | 9 ++++++++-
mm/memcontrol.c | 40 ++++++++++++++--------------------------
4 files changed, 38 insertions(+), 32 deletions(-)
--
2.47.3
* [PATCH 1/3] memcg: Add memcg_stat_mod()
2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
2026-02-25 19:22 ` Johannes Weiner
2026-02-25 16:22 ` [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state() Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
2 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
Cc: Matthew Wilcox (Oracle)
This function lets the caller find the memcg somewhere other than
page->memcg_data.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
include/linux/vmstat.h | 9 ++++++++-
mm/memcontrol.c | 23 +++++++++++++----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 3c9c266cf782..0da38ea25c97 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -518,7 +518,8 @@ static inline const char *vm_event_name(enum vm_event_item item)
 void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
-
+void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val);
 void lruvec_stat_mod_folio(struct folio *folio,
 			     enum node_stat_item idx, int val);
@@ -536,6 +537,12 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 }
+static inline void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val)
+{
+	mod_node_page_state(pgdat, idx, val);
+}
+
 static inline void lruvec_stat_mod_folio(struct folio *folio,
 					 enum node_stat_item idx, int val)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a52da3a5e4fd..b356ef312bc2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -787,24 +787,27 @@ void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	mod_memcg_lruvec_state(lruvec, idx, val);
 }
+void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val)
+{
+	/* Untracked pages have no memcg, no lruvec. Update only the node */
+	if (!memcg) {
+		mod_node_page_state(pgdat, idx, val);
+	} else {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		mod_lruvec_state(lruvec, idx, val);
+	}
+}
+
 void lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 			     int val)
 {
 	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = folio_pgdat(folio);
-	struct lruvec *lruvec;
 	rcu_read_lock();
 	memcg = folio_memcg(folio);
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg) {
-		rcu_read_unlock();
-		mod_node_page_state(pgdat, idx, val);
-		return;
-	}
-
-	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	mod_lruvec_state(lruvec, idx, val);
+	memcg_stat_mod(memcg, pgdat, idx, val);
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(lruvec_stat_mod_folio);
--
2.47.3
* [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state()
2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
2 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
Cc: Matthew Wilcox (Oracle)
Use the new memcg_stat_mod() which does exactly what
mod_lruvec_kmem_state() needs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/memcontrol.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b356ef312bc2..8d9e4a42aecf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -815,24 +815,9 @@ EXPORT_SYMBOL(lruvec_stat_mod_folio);
 void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 {
 	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
 	rcu_read_lock();
-	memcg = mem_cgroup_from_virt(p);
-
-	/*
-	 * Untracked pages have no memcg, no lruvec. Update only the
-	 * node. If we reparent the slab objects to the root memcg,
-	 * when we free the slab object, we need to update the per-memcg
-	 * vmstats to keep it correct for the root memcg.
-	 */
-	if (!memcg) {
-		mod_node_page_state(pgdat, idx, val);
-	} else {
-		lruvec = mem_cgroup_lruvec(memcg, pgdat);
-		mod_lruvec_state(lruvec, idx, val);
-	}
+	memcg_stat_mod(mem_cgroup_from_virt(p), pgdat, idx, val);
 	rcu_read_unlock();
 }
--
2.47.3
* [PATCH 3/3] ptdesc: Account page tables to memcgs again
2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state() Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
2026-02-25 16:55 ` Shakeel Butt
` (2 more replies)
2 siblings, 3 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
Cc: Matthew Wilcox (Oracle), Axel Rasmussen
Commit f0c92726e89f removed the accounting of page tables to memcgs.
Reintroduce it.
Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
Reported-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
include/linux/mm.h | 15 +++++++++++++--
include/linux/mm_types.h | 6 +++---
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..34bc6f00ed7b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
 	return compound_nr(ptdesc_page(ptdesc));
 }
+static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
+{
+#ifdef CONFIG_MEMCG
+	return ptdesc->pt_memcg;
+#else
+	return NULL;
+#endif
+}
+
 static inline void __pagetable_ctor(struct ptdesc *ptdesc)
 {
 	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
+	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
 	__SetPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
+	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
 }
 static inline void pagetable_dtor(struct ptdesc *ptdesc)
 {
 	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
+	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
 	ptlock_free(ptdesc);
 	__ClearPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
+	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
 }
 static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc8ae722886..e9b1da04938a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
  * @ptl: Lock for the page table.
  * @__page_type: Same as page->page_type. Unused for page tables.
  * @__page_refcount: Same as page refcount.
- * @pt_memcg_data: Memcg data. Tracked for page tables here.
+ * @pt_memcg: Memcg that this page table belongs to.
  *
  * This struct overlays struct page for now. Do not modify without a good
  * understanding of the issues.
@@ -602,7 +602,7 @@ struct ptdesc {
 	unsigned int __page_type;
 	atomic_t __page_refcount;
 #ifdef CONFIG_MEMCG
-	unsigned long pt_memcg_data;
+	struct mem_cgroup *pt_memcg;
 #endif
 };
@@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
 TABLE_MATCH(page_type, __page_type);
 TABLE_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
-TABLE_MATCH(memcg_data, pt_memcg_data);
+TABLE_MATCH(memcg_data, pt_memcg);
 #endif
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
--
2.47.3
* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
@ 2026-02-25 16:55 ` Shakeel Butt
2026-02-25 21:01 ` Matthew Wilcox
2026-02-25 20:57 ` Matthew Wilcox
2026-02-25 21:48 ` Axel Rasmussen
2 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2026-02-25 16:55 UTC
To: Matthew Wilcox (Oracle)
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, cgroups, linux-mm,
Axel Rasmussen
On Wed, Feb 25, 2026 at 04:22:17PM +0000, Matthew Wilcox (Oracle) wrote:
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
>
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> include/linux/mm.h | 15 +++++++++++++--
> include/linux/mm_types.h | 6 +++---
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
> return compound_nr(ptdesc_page(ptdesc));
> }
>
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> + return ptdesc->pt_memcg;
> +#else
> + return NULL;
> +#endif
> +}
> +
> static inline void __pagetable_ctor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> __SetPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> }
>
> static inline void pagetable_dtor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> ptlock_free(ptdesc);
> __ClearPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> }
>
> static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
> * @ptl: Lock for the page table.
> * @__page_type: Same as page->page_type. Unused for page tables.
> * @__page_refcount: Same as page refcount.
> - * @pt_memcg_data: Memcg data. Tracked for page tables here.
> + * @pt_memcg: Memcg that this page table belongs to.
> *
> * This struct overlays struct page for now. Do not modify without a good
> * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
> unsigned int __page_type;
> atomic_t __page_refcount;
> #ifdef CONFIG_MEMCG
> - unsigned long pt_memcg_data;
> + struct mem_cgroup *pt_memcg;
This is kernel memory, so this would be struct obj_cgroup * instead of struct
mem_cgroup pointer. We will need something similar to __folio_objcg(), maybe
__ptdesc_objcg() and then call obj_cgroup_memcg() on it. Basically how
folio_memcg() handles the kernel memory.
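
The thread does not spell out the reasoning for the indirection. One likely
motivation, offered here as an assumption rather than something the reviewers
state, is that kernel allocations can outlive the cgroup they were charged to,
so they point at an intermediate object that is re-pointed at the parent when
the cgroup goes away. The following userspace model illustrates that idea; all
names ending in _model are hypothetical, and the real obj_cgroup carries
reference counting and RCU protection that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with a parent link. */
struct memcg_model {
	struct memcg_model *parent;
};

/* Indirection object: long-lived kernel allocations point here. */
struct objcg_model {
	struct memcg_model *memcg;
};

/* Follow the indirection to whichever memcg currently owns the charge. */
static struct memcg_model *objcg_memcg_model(struct objcg_model *objcg)
{
	return objcg ? objcg->memcg : NULL;
}

/* When a cgroup goes away, re-point the indirection at its parent. */
static void reparent_model(struct objcg_model *objcg)
{
	if (objcg->memcg && objcg->memcg->parent)
		objcg->memcg = objcg->memcg->parent;
}
```

In this model a descriptor that stored the memcg pointer directly would keep
referring to the departed cgroup, while one that stored the indirection object
resolves to the parent after reparenting.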
* Re: [PATCH 1/3] memcg: Add memcg_stat_mod()
2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
@ 2026-02-25 19:22 ` Johannes Weiner
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2026-02-25 19:22 UTC
To: Matthew Wilcox (Oracle)
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, cgroups, linux-mm
On Wed, Feb 25, 2026 at 04:22:15PM +0000, Matthew Wilcox (Oracle) wrote:
> This function lets the caller find the memcg somewhere other than
> page->memcg_data.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> @@ -787,24 +787,27 @@ void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
> mod_memcg_lruvec_state(lruvec, idx, val);
> }
>
> +void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
> + enum node_stat_item idx, long val)
> +{
> + /* Untracked pages have no memcg, no lruvec. Update only the node */
> + if (!memcg) {
> + mod_node_page_state(pgdat, idx, val);
> + } else {
> + struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
> + mod_lruvec_state(lruvec, idx, val);
> + }
> +}
The refactor (and the one in the next patch) looks good to me.
But we already have a mod_memcg_state(), which genuinely just updates
the memcg counters, and memcg_stat_mod() makes it a bit non-obvious
that this is a "core" stat accounting function (that happens to do
memcg when compiled in).
Can we go with this instead?
void mod_node_memcg_state(pg_data_t *, struct mem_cgroup *, ...)
* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
2026-02-25 16:55 ` Shakeel Butt
@ 2026-02-25 20:57 ` Matthew Wilcox
2026-02-25 21:48 ` Axel Rasmussen
2 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2026-02-25 20:57 UTC
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
Cc: Axel Rasmussen
On Wed, Feb 25, 2026 at 04:22:17PM +0000, Matthew Wilcox (Oracle) wrote:
> static inline void __pagetable_ctor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> __SetPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> }
It occurs to me that we're not holding the rcu_read_lock() here
(whereas we do for the other two callers). I'm not quite clear
on what the rcu read lock is protecting here -- can it be that the
memcg is rcu-freed while a page table belongs to it? Or does the task
existing prevent the memcg from being freed?
(is there documentation on this that I've been unable to find?)
* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
2026-02-25 16:55 ` Shakeel Butt
@ 2026-02-25 21:01 ` Matthew Wilcox
0 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2026-02-25 21:01 UTC
To: Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, cgroups, linux-mm,
Axel Rasmussen
On Wed, Feb 25, 2026 at 08:55:54AM -0800, Shakeel Butt wrote:
> > #ifdef CONFIG_MEMCG
> > - unsigned long pt_memcg_data;
> > + struct mem_cgroup *pt_memcg;
>
> This is kernel memory, so this would be struct obj_cgroup * instead of struct
> mem_cgroup pointer. We will need something similar to __folio_objcg(), maybe
> __ptdesc_objcg() and then call obj_cgroup_memcg() on it. Basically how
> folio_memcg() handles the kernel memory.
Why would we want to do that instead of just stashing a pointer to the
memcg in the ptdesc? I feel very stupid about the differences between
all of these things and would dearly love to read some documentation to
learn.
* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
2026-02-25 16:55 ` Shakeel Butt
2026-02-25 20:57 ` Matthew Wilcox
@ 2026-02-25 21:48 ` Axel Rasmussen
2 siblings, 0 replies; 9+ messages in thread
From: Axel Rasmussen @ 2026-02-25 21:48 UTC
To: Matthew Wilcox (Oracle)
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
cgroups, linux-mm
On Wed, Feb 25, 2026 at 8:23 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
>
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> include/linux/mm.h | 15 +++++++++++++--
> include/linux/mm_types.h | 6 +++---
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
> return compound_nr(ptdesc_page(ptdesc));
> }
>
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> + return ptdesc->pt_memcg;
I think this is buggy and we need to decode the "real" pointer from memcg_data?
I applied this series (cleanly) on top of torvalds/master
(7dff99b354601dd01829e1511711846e04340a69) and when I boot I get:
[ 3.315420] BUG: kernel NULL pointer dereference, address: 00000000000004e8
[ 3.316955] #PF: supervisor read access in kernel mode
[ 3.318100] #PF: error_code(0x0000) - not-present page
[ 3.319302] PGD 0 P4D 0
[ 3.319877] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.320829] CPU: 2 UID: 0 PID: 157 Comm: systemd Not tainted
7.0.0-smp-DEV #2 PREEMPTLAZY
[ 3.322665] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[ 3.324772] RIP: 0010:memcg_stat_mod+0x2c/0x90
[ 3.325784] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5
48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49
83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[ 3.329919] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[ 3.331110] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 3.332718] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[ 3.334322] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[ 3.335917] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[ 3.337522] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[ 3.339118] FS: 00007f2fffd3d400(0000) GS:ffff98d385556000(0000)
knlGS:0000000000000000
[ 3.340915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.342208] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[ 3.343804] Call Trace:
[ 3.344383] <TASK>
[ 3.344872] pgd_alloc+0x5d/0x1d0
[ 3.345643] mm_init+0x1df/0x3b0
[ 3.346395] alloc_bprm+0x10b/0x1c0
[ 3.347231] do_execveat_common+0x9b/0x300
[ 3.348162] __x64_sys_execve+0x41/0x60
[ 3.349020] do_syscall_64+0xe0/0x8a0
[ 3.349860] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3.351009] RIP: 0033:0x7f30004f423b
[ 3.351831] Code: 0f 1e fa 48 8b 05 85 1d 10 00 48 8b 10 e9 0d 00
00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 3b 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 1a 10 08
[ 3.356028] RSP: 002b:00007f2fff657e68 EFLAGS: 00000202 ORIG_RAX:
000000000000003b
[ 3.357707] RAX: ffffffffffffffda RBX: 00007ffc318c6b90 RCX: 00007f30004f423b
[ 3.359321] RDX: 00007ffc318c77d8 RSI: 00007ffc318c6e80 RDI: 00007ffc318c6e60
[ 3.360894] RBP: 00007f2fff657ff0 R08: 00007ffc318c68c0 R09: 0000000000000000
[ 3.362483] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc318c68c0
[ 3.364061] R13: 0000000000000040 R14: 0000000000000001 R15: 00007f2fff657f20
[ 3.365657] </TASK>
[ 3.366177] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci
ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p 9pnet netfs
[ 3.369780] CR2: 00000000000004e8
[ 3.370543] ---[ end trace 0000000000000000 ]---
[ 3.371578] RIP: 0010:memcg_stat_mod+0x2c/0x90
[ 3.372584] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5
48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49
83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[ 3.376675] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[ 3.377838] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 3.379437] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[ 3.380994] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[ 3.382586] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[ 3.384188] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[ 3.385761] FS: 00007f2fffd3d400(0000) GS:ffff98d385556000(0000)
knlGS:0000000000000000
[ 3.387554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.388836] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[ 3.390449] Kernel panic - not syncing: Fatal exception
[ 3.391806] Kernel Offset: 0x16200000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3.394178] Rebooting in 10 seconds..
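
One detail in the register dump fits the suggestion that the pointer needs
decoding: RDI, which holds the first argument (memcg) under the x86-64 calling
convention, is ffff98cc08b8d142, and its low bits are not clear as an aligned
pointer's would be. The sketch below assumes the two low bits of memcg_data
are flags (object-extension and kmem); that is an assumption about the
kernel's encoding and should be checked against the tree in question. The
macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout for the low bits of memcg_data (hypothetical names). */
#define MODEL_DATA_OBJEXTS	(UINT64_C(1) << 0)
#define MODEL_DATA_KMEM		(UINT64_C(1) << 1)
#define MODEL_DATA_FLAGS_MASK	(MODEL_DATA_OBJEXTS | MODEL_DATA_KMEM)

/* Strip the flag bits to recover the pointer value stored in the word. */
static uint64_t memcg_data_pointer(uint64_t memcg_data)
{
	return memcg_data & ~MODEL_DATA_FLAGS_MASK;
}

/* Report whether the word is tagged as kernel memory. */
static int memcg_data_is_kmem(uint64_t memcg_data)
{
	return (memcg_data & MODEL_DATA_KMEM) != 0;
}
```

Under that assumption the RDI value has the kmem bit set, and the untagged
pointer would refer to an obj_cgroup rather than a mem_cgroup, which is
consistent with dereferencing it as a mem_cgroup going wrong.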
> +#else
> + return NULL;
> +#endif
> +}
> +
> static inline void __pagetable_ctor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> __SetPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> }
>
> static inline void pagetable_dtor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> ptlock_free(ptdesc);
> __ClearPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
Re: the RCU read lock discussion, I spotted that too. I'm also not
100% clear on whether or not it's required. folio_memcg says:
"For a kmem folio a caller should hold an rcu read lock to protect
memcg associated with a kmem folio from being released."
But on the other hand get_mem_cgroup_from_folio seems to think it's
fine to unconditionally call folio_memcg without an RCU read lock, it
seems to think we only need one whilst acquiring a reference, and once
we have that we can unlock. (Not that that helps us greatly, I don't
think we want ptdescs to hold a reference for their entire lifetime.)
> }
>
> static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
> * @ptl: Lock for the page table.
> * @__page_type: Same as page->page_type. Unused for page tables.
> * @__page_refcount: Same as page refcount.
> - * @pt_memcg_data: Memcg data. Tracked for page tables here.
> + * @pt_memcg: Memcg that this page table belongs to.
> *
> * This struct overlays struct page for now. Do not modify without a good
> * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
> unsigned int __page_type;
> atomic_t __page_refcount;
> #ifdef CONFIG_MEMCG
> - unsigned long pt_memcg_data;
> + struct mem_cgroup *pt_memcg;
> #endif
> };
>
> @@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
> TABLE_MATCH(page_type, __page_type);
> TABLE_MATCH(_refcount, __page_refcount);
> #ifdef CONFIG_MEMCG
> -TABLE_MATCH(memcg_data, pt_memcg_data);
> +TABLE_MATCH(memcg_data, pt_memcg);
> #endif
> #undef TABLE_MATCH
> static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> --
> 2.47.3
>