From: Axel Rasmussen <axelrasmussen@google.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
Date: Wed, 25 Feb 2026 13:48:32 -0800
Message-ID: <CAJHvVcgoCE_LSfQFk4W6mtFLeUrdc4JuvJr=5vv4mCsV5YFepw@mail.gmail.com>
In-Reply-To: <20260225162319.315281-4-willy@infradead.org>
On Wed, Feb 25, 2026 at 8:23 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
>
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> include/linux/mm.h | 15 +++++++++++++--
> include/linux/mm_types.h | 6 +++---
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
> return compound_nr(ptdesc_page(ptdesc));
> }
>
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> + return ptdesc->pt_memcg;

I think this is buggy: don't we need to decode the "real" pointer from
memcg_data here, rather than returning the raw field?

I applied this series (cleanly) on top of torvalds/master
(7dff99b354601dd01829e1511711846e04340a69), and when I boot I get:

[ 3.315420] BUG: kernel NULL pointer dereference, address: 00000000000004e8
[ 3.316955] #PF: supervisor read access in kernel mode
[ 3.318100] #PF: error_code(0x0000) - not-present page
[ 3.319302] PGD 0 P4D 0
[ 3.319877] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.320829] CPU: 2 UID: 0 PID: 157 Comm: systemd Not tainted
7.0.0-smp-DEV #2 PREEMPTLAZY
[ 3.322665] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[ 3.324772] RIP: 0010:memcg_stat_mod+0x2c/0x90
[ 3.325784] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5
48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49
83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[ 3.329919] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[ 3.331110] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 3.332718] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[ 3.334322] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[ 3.335917] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[ 3.337522] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[ 3.339118] FS: 00007f2fffd3d400(0000) GS:ffff98d385556000(0000)
knlGS:0000000000000000
[ 3.340915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.342208] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[ 3.343804] Call Trace:
[ 3.344383] <TASK>
[ 3.344872] pgd_alloc+0x5d/0x1d0
[ 3.345643] mm_init+0x1df/0x3b0
[ 3.346395] alloc_bprm+0x10b/0x1c0
[ 3.347231] do_execveat_common+0x9b/0x300
[ 3.348162] __x64_sys_execve+0x41/0x60
[ 3.349020] do_syscall_64+0xe0/0x8a0
[ 3.349860] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3.351009] RIP: 0033:0x7f30004f423b
[ 3.351831] Code: 0f 1e fa 48 8b 05 85 1d 10 00 48 8b 10 e9 0d 00
00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 3b 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 1a 10 08
[ 3.356028] RSP: 002b:00007f2fff657e68 EFLAGS: 00000202 ORIG_RAX:
000000000000003b
[ 3.357707] RAX: ffffffffffffffda RBX: 00007ffc318c6b90 RCX: 00007f30004f423b
[ 3.359321] RDX: 00007ffc318c77d8 RSI: 00007ffc318c6e80 RDI: 00007ffc318c6e60
[ 3.360894] RBP: 00007f2fff657ff0 R08: 00007ffc318c68c0 R09: 0000000000000000
[ 3.362483] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc318c68c0
[ 3.364061] R13: 0000000000000040 R14: 0000000000000001 R15: 00007f2fff657f20
[ 3.365657] </TASK>
[ 3.366177] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci
ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p 9pnet netfs
[ 3.369780] CR2: 00000000000004e8
[ 3.370543] ---[ end trace 0000000000000000 ]---
[ 3.371578] RIP: 0010:memcg_stat_mod+0x2c/0x90
[ 3.372584] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5
48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49
83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[ 3.376675] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[ 3.377838] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 3.379437] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[ 3.380994] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[ 3.382586] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[ 3.384188] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[ 3.385761] FS: 00007f2fffd3d400(0000) GS:ffff98d385556000(0000)
knlGS:0000000000000000
[ 3.387554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.388836] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[ 3.390449] Kernel panic - not syncing: Fatal exception
[ 3.391806] Kernel Offset: 0x16200000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3.394178] Rebooting in 10 seconds..
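
One detail in the trace above seems consistent with that theory: RDI, which
should carry the first argument (memcg) on x86-64, is ffff98cc08b8d142. Its
low bits are not clear, which is what a flag-tagged memcg_data value would
look like if it were used as a plain pointer. Below is a minimal user-space
sketch of the decode step I have in mind. The DEMO_* names, the demo_*
structs, and the exact bit layout are assumptions for illustration, modelled
loosely on how memcg_data is tagged; this is not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed flag layout (illustrative only): the low two bits of the
 * word carry flags, the remaining bits carry an aligned pointer.
 */
#define DEMO_MEMCG_DATA_OBJEXTS    ((uintptr_t)1 << 0)
#define DEMO_MEMCG_DATA_KMEM       ((uintptr_t)1 << 1)
#define DEMO_MEMCG_DATA_FLAGS_MASK (((uintptr_t)1 << 2) - 1)

/* Hypothetical stand-ins for struct mem_cgroup and struct obj_cgroup. */
struct demo_memcg { int id; };
struct demo_objcg { struct demo_memcg *memcg; };

/* Tag an object-cgroup pointer the way a kmem charge is assumed to. */
static inline uintptr_t demo_encode_kmem(struct demo_objcg *objcg)
{
	uintptr_t v = (uintptr_t)objcg;

	/* The pointer must be aligned so the flag bits are free. */
	assert((v & DEMO_MEMCG_DATA_FLAGS_MASK) == 0);
	return v | DEMO_MEMCG_DATA_KMEM;
}

/* Strip the flag bits, then follow the objcg indirection if tagged. */
static inline struct demo_memcg *demo_decode(uintptr_t memcg_data)
{
	uintptr_t ptr = memcg_data & ~DEMO_MEMCG_DATA_FLAGS_MASK;

	if (!ptr)
		return NULL;
	if (memcg_data & DEMO_MEMCG_DATA_KMEM)
		return ((struct demo_objcg *)ptr)->memcg;
	return (struct demo_memcg *)ptr;
}
```

Using the raw tagged word as a pointer, as pagetable_memcg() does in this
patch, skips both the mask and the indirection, so every field read lands at
the wrong offset in the wrong object.
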
> +#else
> + return NULL;
> +#endif
> +}
> +
> static inline void __pagetable_ctor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> __SetPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> }
>
> static inline void pagetable_dtor(struct ptdesc *ptdesc)
> {
> pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> + struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
> ptlock_free(ptdesc);
> __ClearPageTable(ptdesc_page(ptdesc));
> - mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> + memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));

Re: the RCU read lock discussion, I spotted that too. I'm also not
100% clear on whether it's required. The comment on folio_memcg() says:

"For a kmem folio a caller should hold an rcu read lock to protect
memcg associated with a kmem folio from being released."

On the other hand, get_mem_cgroup_from_folio() seems to think it's fine
to call folio_memcg() unconditionally without an RCU read lock; it
appears to need the lock only while acquiring a reference, and unlocks
once it has one. (Not that that helps us greatly: I don't think we want
ptdescs to hold a reference for their entire lifetime.)

> }
>
> static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
> * @ptl: Lock for the page table.
> * @__page_type: Same as page->page_type. Unused for page tables.
> * @__page_refcount: Same as page refcount.
> - * @pt_memcg_data: Memcg data. Tracked for page tables here.
> + * @pt_memcg: Memcg that this page table belongs to.
> *
> * This struct overlays struct page for now. Do not modify without a good
> * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
> unsigned int __page_type;
> atomic_t __page_refcount;
> #ifdef CONFIG_MEMCG
> - unsigned long pt_memcg_data;
> + struct mem_cgroup *pt_memcg;
> #endif
> };
>
> @@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
> TABLE_MATCH(page_type, __page_type);
> TABLE_MATCH(_refcount, __page_refcount);
> #ifdef CONFIG_MEMCG
> -TABLE_MATCH(memcg_data, pt_memcg_data);
> +TABLE_MATCH(memcg_data, pt_memcg);
> #endif
> #undef TABLE_MATCH
> static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> --
> 2.47.3
>