From: Axel Rasmussen <axelrasmussen@google.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 3/3] ptdesc: Account page tables to memcgs again
Date: Wed, 25 Feb 2026 13:48:32 -0800
Message-ID: <CAJHvVcgoCE_LSfQFk4W6mtFLeUrdc4JuvJr=5vv4mCsV5YFepw@mail.gmail.com>
In-Reply-To: <20260225162319.315281-4-willy@infradead.org>

On Wed, Feb 25, 2026 at 8:23 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Commit f0c92726e89f removed the accounting of page tables to memcgs.
> Reintroduce it.
>
> Fixes: f0c92726e89f (ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor())
> Reported-by: Axel Rasmussen <axelrasmussen@google.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h       | 15 +++++++++++++--
>  include/linux/mm_types.h |  6 +++---
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5be3d8a8f806..34bc6f00ed7b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3519,21 +3519,32 @@ static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
>         return compound_nr(ptdesc_page(ptdesc));
>  }
>
> +static inline struct mem_cgroup *pagetable_memcg(const struct ptdesc *ptdesc)
> +{
> +#ifdef CONFIG_MEMCG
> +       return ptdesc->pt_memcg;

I think this is buggy: pt_memcg overlays page->memcg_data, so don't we
need to decode the "real" pointer from memcg_data rather than returning
the raw value?

I applied this series (cleanly) on top of torvalds/master
(7dff99b354601dd01829e1511711846e04340a69) and when I boot I get:

[    3.315420] BUG: kernel NULL pointer dereference, address: 00000000000004e8
[    3.316955] #PF: supervisor read access in kernel mode
[    3.318100] #PF: error_code(0x0000) - not-present page
[    3.319302] PGD 0 P4D 0
[    3.319877] Oops: Oops: 0000 [#1] SMP NOPTI
[    3.320829] CPU: 2 UID: 0 PID: 157 Comm: systemd Not tainted 7.0.0-smp-DEV #2 PREEMPTLAZY
[    3.322665] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[    3.324772] RIP: 0010:memcg_stat_mod+0x2c/0x90
[    3.325784] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5 48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49 83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[    3.329919] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[    3.331110] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[    3.332718] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[    3.334322] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[    3.335917] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[    3.337522] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[    3.339118] FS:  00007f2fffd3d400(0000) GS:ffff98d385556000(0000) knlGS:0000000000000000
[    3.340915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.342208] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[    3.343804] Call Trace:
[    3.344383]  <TASK>
[    3.344872]  pgd_alloc+0x5d/0x1d0
[    3.345643]  mm_init+0x1df/0x3b0
[    3.346395]  alloc_bprm+0x10b/0x1c0
[    3.347231]  do_execveat_common+0x9b/0x300
[    3.348162]  __x64_sys_execve+0x41/0x60
[    3.349020]  do_syscall_64+0xe0/0x8a0
[    3.349860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    3.351009] RIP: 0033:0x7f30004f423b
[    3.351831] Code: 0f 1e fa 48 8b 05 85 1d 10 00 48 8b 10 e9 0d 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c5 1a 10 08
[    3.356028] RSP: 002b:00007f2fff657e68 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[    3.357707] RAX: ffffffffffffffda RBX: 00007ffc318c6b90 RCX: 00007f30004f423b
[    3.359321] RDX: 00007ffc318c77d8 RSI: 00007ffc318c6e80 RDI: 00007ffc318c6e60
[    3.360894] RBP: 00007f2fff657ff0 R08: 00007ffc318c68c0 R09: 0000000000000000
[    3.362483] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffc318c68c0
[    3.364061] R13: 0000000000000040 R14: 0000000000000001 R15: 00007f2fff657f20
[    3.365657]  </TASK>
[    3.366177] Modules linked in: xhci_pci xhci_hcd virtio_net net_failover failover virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_hcd evdev ehci_pci ehci_hcd 9pnet_virtio 9p 9pnet netfs
[    3.369780] CR2: 00000000000004e8
[    3.370543] ---[ end trace 0000000000000000 ]---
[    3.371578] RIP: 0010:memcg_stat_mod+0x2c/0x90
[    3.372584] Code: 40 d6 0f 1f 44 00 00 55 41 56 53 48 89 cb 89 d5 48 85 ff 74 3d 66 90 48 63 86 c0 19 00 00 4c 8b b4 c7 90 08 00 00 49 83 c6 48 <49> 8b be a0 04 00 00 48 39 f7 75 2d 48 63 d3 8f
[    3.376675] RSP: 0018:ffff9b62c0817de0 EFLAGS: 00010206
[    3.377838] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[    3.379437] RDX: 0000000000000025 RSI: ffff98d33fffdcc0 RDI: ffff98cc08b8d142
[    3.380994] RBP: 0000000000000025 R08: 0000000000007fff R09: ffffffff99079980
[    3.382586] R10: 0000000000017ffd R11: 00000000ffff7fff R12: ffff98cc0310c138
[    3.384188] R13: 00007ffc318c77d8 R14: 0000000000000048 R15: ffff98cc009e2280
[    3.385761] FS:  00007f2fffd3d400(0000) GS:ffff98d385556000(0000) knlGS:0000000000000000
[    3.387554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.388836] CR2: 00000000000004e8 CR3: 00000001089ad000 CR4: 0000000000350ef0
[    3.390449] Kernel panic - not syncing: Fatal exception
[    3.391806] Kernel Offset: 0x16200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    3.394178] Rebooting in 10 seconds..
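
To make the "decode the real pointer" point concrete, here is a minimal
userspace sketch of a flag-tagged pointer word. Everything prefixed
demo_/DEMO_ is hypothetical and only for illustration. The flag values
mirror my understanding of MEMCG_DATA_OBJEXTS and MEMCG_DATA_KMEM in
include/linux/memcontrol.h, which may differ between kernel versions,
and this is not a proposed fix. For what it's worth, RDI in the oops
above ends in ...142, which would be consistent with bit 1 being set on
an otherwise aligned pointer, but I have not confirmed that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits stored in the low bits of the data word.
 * Assumption: these mirror the kernel's memcg_data flags; check
 * include/linux/memcontrol.h for the real definitions. */
#define DEMO_DATA_OBJEXTS     (1UL << 0)
#define DEMO_DATA_KMEM        (1UL << 1)
#define DEMO_DATA_FLAGS_MASK  (DEMO_DATA_OBJEXTS | DEMO_DATA_KMEM)

/* Hypothetical stand-in for the object the data word points at. */
struct demo_objcg {
	int id;
};

/* Tag an (at least 4-byte aligned) pointer with the KMEM flag. */
static inline unsigned long demo_encode_kmem(struct demo_objcg *objcg)
{
	unsigned long raw = (unsigned long)(uintptr_t)objcg;

	/* The low bits must be free, otherwise tagging would corrupt it. */
	assert((raw & DEMO_DATA_FLAGS_MASK) == 0);
	return raw | DEMO_DATA_KMEM;
}

/* Report whether the data word carries the KMEM flag. */
static inline int demo_is_kmem(unsigned long data)
{
	return (data & DEMO_DATA_KMEM) != 0;
}

/* Strip the flag bits to recover the usable pointer. */
static inline struct demo_objcg *demo_decode(unsigned long data)
{
	return (struct demo_objcg *)(uintptr_t)(data & ~DEMO_DATA_FLAGS_MASK);
}
```

Dereferencing the raw word without the demo_decode() step is the kind
of misaligned access I suspect is happening here.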

> +#else
> +       return NULL;
> +#endif
> +}
> +
>  static inline void __pagetable_ctor(struct ptdesc *ptdesc)
>  {
>         pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +       struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
>         __SetPageTable(ptdesc_page(ptdesc));
> -       mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
> +       memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
>  }
>
>  static inline void pagetable_dtor(struct ptdesc *ptdesc)
>  {
>         pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> +       struct mem_cgroup *memcg = pagetable_memcg(ptdesc);
>
>         ptlock_free(ptdesc);
>         __ClearPageTable(ptdesc_page(ptdesc));
> -       mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
> +       memcg_stat_mod(memcg, pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));

Re: the RCU read lock discussion, I spotted that too. I'm also not
100% clear on whether it's required. The folio_memcg() comment says:

"For a kmem folio a caller should hold an rcu read lock to protect
memcg associated with a kmem folio from being released."

On the other hand, get_mem_cgroup_from_folio() calls folio_memcg()
unconditionally without holding the RCU read lock. It seems to assume
we only need the lock while acquiring a reference, and can drop it
once we hold one. (Not that that helps us greatly; I don't think we
want ptdescs to hold a reference for their entire lifetime.)
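
As a rough userspace sketch of the "only hold the lock while acquiring
a reference" pattern: the demo_ names below are hypothetical, the
lock/unlock helpers are no-op stand-ins for rcu_read_lock() and
rcu_read_unlock(), and the refcount is a plain C11 atomic rather than
the kernel's css refcounting. It is not the kernel implementation, and
unlike get_mem_cgroup_from_folio() it loads the pointer inside the read
section.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* No-op stand-ins for rcu_read_lock()/rcu_read_unlock() (userspace only). */
static inline void demo_rcu_read_lock(void) {}
static inline void demo_rcu_read_unlock(void) {}

/* Hypothetical refcounted object standing in for a memcg. */
struct demo_memcg {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live (refcnt > 0). */
static inline bool demo_tryget(struct demo_memcg *m)
{
	int old = atomic_load(&m->refcnt);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with demo_tryget(). */
static inline void demo_put(struct demo_memcg *m)
{
	atomic_fetch_sub(&m->refcnt, 1);
}

/* Look up the object and pin it; the read section only covers the
 * window between loading the pointer and taking the reference. */
static inline struct demo_memcg *demo_get_memcg(struct demo_memcg *const *slot)
{
	struct demo_memcg *m;

	demo_rcu_read_lock();
	m = *slot;
	if (m && !demo_tryget(m))
		m = NULL;
	demo_rcu_read_unlock();
	return m;	/* caller owns a reference if non-NULL */
}
```

A caller that gets a non-NULL result is expected to call demo_put()
when done. The stats update in the ctor/dtor would only need the
pointer to stay valid for the duration of the call, which is why a
lifetime reference looks like the wrong tool.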


>  }
>
>  static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..e9b1da04938a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -564,7 +564,7 @@ FOLIO_MATCH(compound_head, _head_3);
>   * @ptl:              Lock for the page table.
>   * @__page_type:      Same as page->page_type. Unused for page tables.
>   * @__page_refcount:  Same as page refcount.
> - * @pt_memcg_data:    Memcg data. Tracked for page tables here.
> + * @pt_memcg:         Memcg that this page table belongs to.
>   *
>   * This struct overlays struct page for now. Do not modify without a good
>   * understanding of the issues.
> @@ -602,7 +602,7 @@ struct ptdesc {
>         unsigned int __page_type;
>         atomic_t __page_refcount;
>  #ifdef CONFIG_MEMCG
> -       unsigned long pt_memcg_data;
> +       struct mem_cgroup *pt_memcg;


>  #endif
>  };
>
> @@ -617,7 +617,7 @@ TABLE_MATCH(rcu_head, pt_rcu_head);
>  TABLE_MATCH(page_type, __page_type);
>  TABLE_MATCH(_refcount, __page_refcount);
>  #ifdef CONFIG_MEMCG
> -TABLE_MATCH(memcg_data, pt_memcg_data);
> +TABLE_MATCH(memcg_data, pt_memcg);
>  #endif
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> --
> 2.47.3
>


Thread overview: 9+ messages
2026-02-25 16:22 [PATCH 0/3] Make memcg location more flexible Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 1/3] memcg: Add memcg_stat_mod() Matthew Wilcox (Oracle)
2026-02-25 19:22   ` Johannes Weiner
2026-02-25 16:22 ` [PATCH 2/3] memcg: Simplify mod_lruvec_kmem_state() Matthew Wilcox (Oracle)
2026-02-25 16:22 ` [PATCH 3/3] ptdesc: Account page tables to memcgs again Matthew Wilcox (Oracle)
2026-02-25 16:55   ` Shakeel Butt
2026-02-25 21:01     ` Matthew Wilcox
2026-02-25 20:57   ` Matthew Wilcox
2026-02-25 21:48   ` Axel Rasmussen [this message]
