linux-mm.kvack.org archive mirror
* [PATCH 0/3] Make memcg location more flexible
@ 2026-02-25 16:22 Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm
  Cc: Matthew Wilcox (Oracle)

Different memdescs should have the flexibility to place their memcg
wherever they need to.  That means that instead of indirecting through
lruvec_stat_mod_folio() and extracting the memcg from the folio,
we need an interface which takes the memcg as a parameter.  It turns
out we already need to do that for slabs, and this memcg_stat_mod()
interface also works for that use case.
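
As a rough illustration of the interface described above, here is a minimal
userspace sketch, not the kernel implementation. The struct and function names
(node_model, memcg_model, memcg_stat_mod_model) are hypothetical stand-ins for
pg_data_t, struct mem_cgroup and memcg_stat_mod(). It assumes a single counter
per object and ignores lruvecs, per-CPU batching and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
struct node_model {
	long nr_pagetable;	/* models the per-node NR_PAGETABLE counter */
};

struct memcg_model {
	long nr_pagetable;	/* models the per-memcg view of the same stat */
};

/*
 * Model of the proposed helper: the caller passes the memcg explicitly
 * instead of the helper digging it out of a folio.  The node counter is
 * always updated; the memcg counter only when a memcg is supplied.
 */
static void memcg_stat_mod_model(struct memcg_model *memcg,
				 struct node_model *node, long val)
{
	assert(node != NULL);

	node->nr_pagetable += val;
	if (memcg)
		memcg->nr_pagetable += val;
}
```

A caller that keeps its memcg somewhere other than page->memcg_data would look
it up itself and pass it in; an untracked allocation passes NULL and only the
node counter moves.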

Matthew Wilcox (Oracle) (3):
  memcg: Add memcg_stat_mod()
  memcg: Simplify mod_lruvec_kmem_state()
  ptdesc: Account page tables to memcgs again

 include/linux/mm.h       | 15 +++++++++++++--
 include/linux/mm_types.h |  6 +++---
 include/linux/vmstat.h   |  9 ++++++++-
 mm/memcontrol.c          | 40 ++++++++++++++--------------------------
 4 files changed, 38 insertions(+), 32 deletions(-)

-- 
2.47.3




* [PATCH 1/3] memcg: Add memcg_stat_mod()
  2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
  2026-02-25 19:22   ` Johannes Weiner
  2026-02-25 16:22 ` [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state() Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
  2 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm
  Cc: Matthew Wilcox (Oracle)

This function lets the caller find the memcg somewhere other than
page->memcg_data.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/vmstat.h |  9 ++++++++-
 mm/memcontrol.c        | 23 +++++++++++++----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 3c9c266cf782..0da38ea25c97 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -518,7 +518,8 @@ static inline const char *vm_event_name(enum vm_event_item item)
 
 void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
-
+void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val);
 void lruvec_stat_mod_folio(struct folio *folio,
 			     enum node_stat_item idx, int val);
 
@@ -536,6 +537,12 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 }
 
+static inline void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val)
+{
+	mod_node_page_state(pgdat, idx, val);
+}
+
 static inline void lruvec_stat_mod_folio(struct folio *folio,
 					 enum node_stat_item idx, int val)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a52da3a5e4fd..b356ef312bc2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -787,24 +787,27 @@ void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 		mod_memcg_lruvec_state(lruvec, idx, val);
 }
 
+void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
+		enum node_stat_item idx, long val)
+{
+	/* Untracked pages have no memcg, no lruvec. Update only the node */
+	if (!memcg) {
+		mod_node_page_state(pgdat, idx, val);
+	} else {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		mod_lruvec_state(lruvec, idx, val);
+	}
+}
+
 void lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 			     int val)
 {
 	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = folio_pgdat(folio);
-	struct lruvec *lruvec;
 
 	rcu_read_lock();
 	memcg = folio_memcg(folio);
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg) {
-		rcu_read_unlock();
-		mod_node_page_state(pgdat, idx, val);
-		return;
-	}
-
-	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	mod_lruvec_state(lruvec, idx, val);
+	memcg_stat_mod(memcg, pgdat, idx, val);
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(lruvec_stat_mod_folio);
-- 
2.47.3




* [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state()
  2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
  2 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm
  Cc: Matthew Wilcox (Oracle)

Use the new memcg_stat_mod() which does exactly what
mod_lruvec_kmem_state() needs.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memcontrol.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b356ef312bc2..8d9e4a42aecf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -815,24 +815,9 @@ EXPORT_SYMBOL(lruvec_stat_mod_folio);
 void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 {
 	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
 
 	rcu_read_lock();
-	memcg = mem_cgroup_from_virt(p);
-
-	/*
-	 * Untracked pages have no memcg, no lruvec. Update only the
-	 * node. If we reparent the slab objects to the root memcg,
-	 * when we free the slab object, we need to update the per-memcg
-	 * vmstats to keep it correct for the root memcg.
-	 */
-	if (!memcg) {
-		mod_node_page_state(pgdat, idx, val);
-	} else {
-		lruvec = mem_cgroup_lruvec(memcg, pgdat);
-		mod_lruvec_state(lruvec, idx, val);
-	}
+	memcg_stat_mod(mem_cgroup_from_virt(p), pgdat, idx, val);
 	rcu_read_unlock();
 }
 
-- 
2.47.3




* [PATCH 3/3] ptdesc: Account page tables to memcgs again
  2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
  2026-02-25 16:22 ` [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state() Matthew Wilcox (Oracle)
@ 2026-02-25 16:22 ` Matthew Wilcox (Oracle)
  2026-02-25 16:55   ` Shakeel Butt
                     ` (2 more replies)
  2 siblings, 3 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2026-02-25 16:22 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm
  Cc: Matthew Wilcox (Oracle), Axel Rasmussen

Commit f0c92726e89f removed the accounting of page tables to memcgs.
Reintroduce it.

Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
Reported-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h       | 15 +++++++++++++--
 include/linux/mm_types.h |  6 +++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5be3d8a8f806..34bc6f00ed7b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
 	return compound_nr(ptdesc_page(ptdesc));
 }
 
+static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
+{
+#ifdef CONFIG_MEMCG
+	return ptdesc->pt_memcg;
+#else
+	return NULL;
+#endif
+}
+
 static inline void __pagetable_ctor(struct ptdesc *ptdesc)
 {
 	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
+	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
 
 	__SetPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
+	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
 }
 
 static inline void pagetable_dtor(struct ptdesc *ptdesc)
 {
 	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
+	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
 
 	ptlock_free(ptdesc);
 	__ClearPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
+	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
 }
 
 static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc8ae722886..e9b1da04938a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
  * @ptl:              Lock for the page table.
  * @__page_type:      Same as page->page_type. Unused for page tables.
  * @__page_refcount:  Same as page refcount.
- * @pt_memcg_data:    Memcg data. Tracked for page tables here.
+ * @pt_memcg:         Memcg that this page table belongs to.
  *
  * This struct overlays struct page for now. Do not modify without a good
  * understanding of the issues.
@@ -602,7 +602,7 @@ struct ptdesc {
 	unsigned int __page_type;
 	atomic_t __page_refcount;
 #ifdef CONFIG_MEMCG
-	unsigned long pt_memcg_data;
+	struct mem_cgroup *pt_memcg;
 #endif
 };
 
@@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
 TABLE_MATCH(page_type, __page_type);
 TABLE_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
-TABLE_MATCH(memcg_data, pt_memcg_data);
+TABLE_MATCH(memcg_data, pt_memcg);
 #endif
 #undef TABLE_MATCH
 static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
-- 
2.47.3




* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
  2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
@ 2026-02-25 16:55   ` Shakeel Butt
  2026-02-25 21:01     ` Matthew Wilcox
  2026-02-25 20:57   ` Matthew Wilcox
  2026-02-25 21:48   ` Axel Rasmussen
  2 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2026-02-25 16:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, cgroups, linux-mm,
	Axel Rasmussen

On Wed, Feb 25, 2026 at 04:22:17PM +0000, Matthew Wilcox (Oracle) wrote:
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
> 
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h       | 15 +++++++++++++--
>  include/linux/mm_types.h |  6 +++---
>  2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
>  	return compound_nr(ptdesc_page(ptdesc));
>  }
>  
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> +	return ptdesc->pt_memcg;
> +#else
> +	return NULL;
> +#endif
> +}
> +
>  static inline void __pagetable_ctor(struct ptdesc *ptdesc)
>  {
>  	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>  
>  	__SetPageTable(ptdesc_page(ptdesc));
> -	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> +	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
>  }
>  
>  static inline void pagetable_dtor(struct ptdesc *ptdesc)
>  {
>  	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>  
>  	ptlock_free(ptdesc);
>  	__ClearPageTable(ptdesc_page(ptdesc));
> -	mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> +	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
>  }
>  
>  static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
>   * @ptl:              Lock for the page table.
>   * @__page_type:      Same as page->page_type. Unused for page tables.
>   * @__page_refcount:  Same as page refcount.
> - * @pt_memcg_data:    Memcg data. Tracked for page tables here.
> + * @pt_memcg:         Memcg that this page table belongs to.
>   *
>   * This struct overlays struct page for now. Do not modify without a good
>   * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
>  	unsigned int __page_type;
>  	atomic_t __page_refcount;
>  #ifdef CONFIG_MEMCG
> -	unsigned long pt_memcg_data;
> +	struct mem_cgroup *pt_memcg;

This is kernel memory, so this would be struct obj_cgroup * instead of struct
mem_cgroup pointer. We will need something similar to __folio_objcg(), maybe
__ptdesc_objcg() and then call obj_cgroup_memcg() on it. Basically how
folio_memcg() handles the kernel memory.
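
To make the suggested indirection concrete, here is a small userspace sketch.
It is only a model under stated assumptions: ptdesc_model, objcg_model,
memcg_model, ptdesc_memcg_model() and reparent_model() are hypothetical names,
and the real obj_cgroup handling involves RCU and reference counting that this
sketch leaves out. The point it tries to show is that the descriptor stores a
pointer to an intermediate object, and the memcg is resolved through it at the
time of use, so the answer can change if the intermediate object is re-pointed
at a parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel types. */
struct memcg_model {
	const char *name;
};

/* Models an obj_cgroup: a small object that points at its current memcg. */
struct objcg_model {
	struct memcg_model *memcg;
};

/* Models a ptdesc that stores the intermediate object, not the memcg. */
struct ptdesc_model {
	struct objcg_model *pt_objcg;
};

/* Resolve the memcg through the intermediate object at the time of use. */
static struct memcg_model *ptdesc_memcg_model(const struct ptdesc_model *pt)
{
	assert(pt != NULL);

	if (!pt->pt_objcg)
		return NULL;	/* untracked page table */
	return pt->pt_objcg->memcg;
}

/* Models reparenting: existing descriptors follow without being touched. */
static void reparent_model(struct objcg_model *objcg,
			   struct memcg_model *parent)
{
	objcg->memcg = parent;
}
```

With a direct memcg pointer in the descriptor, every descriptor would have to
be found and updated when its cgroup goes away; with the indirection, one
update to the intermediate object is enough in this model.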




* Re: [PATCH 1/3] memcg: Add memcg_stat_mod()
  2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
@ 2026-02-25 19:22   ` Johannes Weiner
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2026-02-25 19:22 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, cgroups, linux-mm

On Wed, Feb 25, 2026 at 04:22:15PM +0000, Matthew Wilcox (Oracle) wrote:
> This function lets the caller find the memcg somewhere other than
> page->memcg_data.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

> @@ -787,24 +787,27 @@ void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  		mod_memcg_lruvec_state(lruvec, idx, val);
>  }
>  
> +void memcg_stat_mod(struct mem_cgroup *memcg, pg_data_t *pgdat,
> +		enum node_stat_item idx, long val)
> +{
> +	/* Untracked pages have no memcg, no lruvec. Update only the node */
> +	if (!memcg) {
> +		mod_node_page_state(pgdat, idx, val);
> +	} else {
> +		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
> +		mod_lruvec_state(lruvec, idx, val);
> +	}
> +}

The refactor (and the one in the next patch) looks good to me.

But we already have a mod_memcg_state(), which genuinely just updates
the memcg counters, and memcg_stat_mod() makes it a bit non-obvious
that this is a "core" stat accounting function (that happens to do
memcg when compiled in).

Can we go with this instead?

	void mod_node_memcg_state(pg_data_t *, struct mem_cgroup *, ...)



* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
  2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
  2026-02-25 16:55   ` Shakeel Butt
@ 2026-02-25 20:57   ` Matthew Wilcox
  2026-02-25 21:48   ` Axel Rasmussen
  2 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2026-02-25 20:57 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm
  Cc: Axel Rasmussen

On Wed, Feb 25, 2026 at 04:22:17PM +0000, Matthew Wilcox (Oracle) wrote:
>  static inline void __pagetable_ctor(struct ptdesc *ptdesc)
>  {
>  	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +	struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>  
>  	__SetPageTable(ptdesc_page(ptdesc));
> -	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> +	memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
>  }

It occurs to me that we're not holding the rcu_read_lock() here
(whereas we do for the other two callers).  I'm not quite clear
on what the rcu read lock is protecting here -- can it be that the
memcg is rcu-freed while a page table belongs to it?  Or does the task
existing prevent the memcg from being freed?

(is there documentation on this that I've been unable to find?)




* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
  2026-02-25 16:55   ` Shakeel Butt
@ 2026-02-25 21:01     ` Matthew Wilcox
  0 siblings, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2026-02-25 21:01 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, cgroups, linux-mm,
	Axel Rasmussen

On Wed, Feb 25, 2026 at 08:55:54AM -0800, Shakeel Butt wrote:
> >  #ifdef CONFIG_MEMCG
> > -	unsigned long pt_memcg_data;
> > +	struct mem_cgroup *pt_memcg;
> 
> This is kernel memory, so this would be struct obj_cgroup * instead of struct
> mem_cgroup pointer. We will need something similar to __folio_objcg(), maybe
> __ptdesc_objcg() and then call obj_cgroup_memcg() on it. Basically how
> folio_memcg() handles the kernel memory.

Why would we want to do that instead of just stashing a pointer to the
memcg in the ptdesc?  I feel very stupid about the differences between
all of these things and would dearly love to read some documentation to
learn.



* Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
  2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
  2026-02-25 16:55   ` Shakeel Butt
  2026-02-25 20:57   ` Matthew Wilcox
@ 2026-02-25 21:48   ` Axel Rasmussen
  2 siblings, 0 replies; 9+ messages in thread
From: Axel Rasmussen @ 2026-02-25 21:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	cgroups, linux-mm

On Wed, Feb 25, 2026 at 8:23 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
>
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h       | 15 +++++++++++++--
>  include/linux/mm_types.h |  6 +++---
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
>         return compound_nr(ptdesc_page(ptdesc));
>  }
>
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> +       return ptdesc->pt_memcg;

I think this is buggy and we need to decode the "real" pointer from memcg_data?

I applied this series (cleanly) on top of torvalds/master
(7dff99b354601dd01829e1511711846e04340a69) and when I boot I get:

[    3.315420] BUG: kernel NULL pointer dereference, address: 00000000000004e8
[    3.316955] #PF: supervisor read access in kernel mode
[    3.318100] #PF: error_code(0x0000) - not-present page
[    3.319302] PGD 0 P4D 0
[    3.319877] Oops: Oops: 0000 [#1] SMP NOPTI
[    3.320829] CPU: 2 UID: 0 PID: 157 Comm: systemd Not tainted 7.0.0-smp-DEV #2 PREEMPTLAZY
[    3.322665] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[    3.324772] RIP: 0010:memcg_stat_mod+0x2c/0x90
[    3.325784] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5 48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49 83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[    3.329919] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[    3.331110] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[    3.332718] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[    3.334322] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[    3.335917] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[    3.337522] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[    3.339118] FS:  00007f2fffd3d400(0000) GS:ffff98d385556000(0000) knlGS:0000000000000000
[    3.340915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.342208] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[    3.343804] Call Trace:
[    3.344383]  <TASK>
[    3.344872]  pgd_alloc+0x5d/0x1d0
[    3.345643]  mm_init+0x1df/0x3b0
[    3.346395]  alloc_bprm+0x10b/0x1c0
[    3.347231]  do_execveat_common+0x9b/0x300
[    3.348162]  __x64_sys_execve+0x41/0x60
[    3.349020]  do_syscall_64+0xe0/0x8a0
[    3.349860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    3.351009] RIP: 0033:0x7f30004f423b
[    3.351831] Code: 0f 1e fa 48 8b 05 85 1d 10 00 48 8b 10 e9 0d 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 1a 10 08
[    3.356028] RSP: 002b:00007f2fff657e68 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[    3.357707] RAX: ffffffffffffffda RBX: 00007ffc318c6b90 RCX: 00007f30004f423b
[    3.359321] RDX: 00007ffc318c77d8 RSI: 00007ffc318c6e80 RDI: 00007ffc318c6e60
[    3.360894] RBP: 00007f2fff657ff0 R08: 00007ffc318c68c0 R09: 0000000000000000
[    3.362483] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc318c68c0
[    3.364061] R13: 0000000000000040 R14: 0000000000000001 R15: 00007f2fff657f20
[    3.365657]  </TASK>
[    3.366177] Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p 9pnet netfs
[    3.369780] CR2: 00000000000004e8
[    3.370543] ---[ end trace 0000000000000000 ]---
[    3.371578] RIP: 0010:memcg_stat_mod+0x2c/0x90
[    3.372584] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5 48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49 83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[    3.376675] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[    3.377838] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[    3.379437] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[    3.380994] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[    3.382586] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[    3.384188] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[    3.385761] FS:  00007f2fffd3d400(0000) GS:ffff98d385556000(0000) knlGS:0000000000000000
[    3.387554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.388836] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[    3.390449] Kernel panic - not syncing: Fatal exception
[    3.391806] Kernel Offset: 0x16200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    3.394178] Rebooting in 10 seconds..
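
One detail in the trace seems consistent with the "needs decoding" theory: RDI,
which carries the first argument on x86-64, is ffff98cc08b8d142, so bit 1 of
the would-be pointer is set. The sketch below is a userspace model of
stripping low flag bits from such a value. The flag layout (bit 0 and bit 1)
and all MODEL_* and model_* names are assumptions made for illustration and
should be checked against include/linux/memcontrol.h; note also that for kmem
the stripped value would be an obj_cgroup pointer, not a mem_cgroup pointer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed flag layout for illustration only: two low bits of the stored
 * word are flags, the rest is an aligned pointer.
 */
#define MODEL_DATA_OBJEXTS	(UINT64_C(1) << 0)
#define MODEL_DATA_KMEM		(UINT64_C(1) << 1)
#define MODEL_DATA_FLAGS_MASK	(MODEL_DATA_OBJEXTS | MODEL_DATA_KMEM)

/* Returns nonzero if the stored word carries the (assumed) kmem flag. */
static int model_is_kmem(uint64_t memcg_data)
{
	return (memcg_data & MODEL_DATA_KMEM) != 0;
}

/* Clears the flag bits, leaving the aligned pointer value. */
static uint64_t model_strip_flags(uint64_t memcg_data)
{
	return memcg_data & ~MODEL_DATA_FLAGS_MASK;
}

/* Decodes a kmem-tagged word; the caller must have checked the flag. */
static uint64_t model_objcg_from_data(uint64_t memcg_data)
{
	assert(model_is_kmem(memcg_data));
	return model_strip_flags(memcg_data);
}
```

Applied to the RDI value from the trace, the model reports the kmem bit as set
and yields an address ending in ...140 after masking.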

> +#else
> +       return NULL;
> +#endif
> +}
> +
>  static inline void __pagetable_ctor(struct ptdesc *ptdesc)
>  {
>         pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +       struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
>         __SetPageTable(ptdesc_page(ptdesc));
> -       mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> +       memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
>  }
>
>  static inline void pagetable_dtor(struct ptdesc *ptdesc)
>  {
>         pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +       struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
>         ptlock_free(ptdesc);
>         __ClearPageTable(ptdesc_page(ptdesc));
> -       mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> +       memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));

Re: the RCU read lock discussion, I spotted that too. I'm also not
100% clear on whether or not it's required. folio_memcg says:

"For a kmem folio a caller should hold an rcu read lock to protect
memcg associated with a kmem folio from being released."

But on the other hand get_mem_cgroup_from_folio seems to think it's
fine to unconditionally call folio_memcg without an RCU read lock, it
seems to think we only need one whilst acquiring a reference, and once
we have that we can unlock. (Not that that helps us greatly, I don't
think we want ptdescs to hold a reference for their entire lifetime.)


>  }
>
>  static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
>   * @ptl:              Lock for the page table.
>   * @__page_type:      Same as page->page_type. Unused for page tables.
>   * @__page_refcount:  Same as page refcount.
> - * @pt_memcg_data:    Memcg data. Tracked for page tables here.
> + * @pt_memcg:         Memcg that this page table belongs to.
>   *
>   * This struct overlays struct page for now. Do not modify without a good
>   * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
>         unsigned int __page_type;
>         atomic_t __page_refcount;
>  #ifdef CONFIG_MEMCG
> -       unsigned long pt_memcg_data;
> +       struct mem_cgroup *pt_memcg;


>  #endif
>  };
>
> @@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
>  TABLE_MATCH(page_type, __page_type);
>  TABLE_MATCH(_refcount, __page_refcount);
>  #ifdef CONFIG_MEMCG
> -TABLE_MATCH(memcg_data, pt_memcg_data);
> +TABLE_MATCH(memcg_data, pt_memcg);
>  #endif
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> --
> 2.47.3
>


