linux-mm.kvack.org archive mirror
* Understanding profiled tagged allocator growth in a stock kernel that's just idling
@ 2025-11-17 23:16 Preble, Adam C
  2025-11-18 18:24 ` Vishal Moola (Oracle)
  0 siblings, 1 reply; 3+ messages in thread
From: Preble, Adam C @ 2025-11-17 23:16 UTC (permalink / raw)
  To: linux-mm

I had sent a note to the list a few weeks ago about a situation where my vmap_area allocations were continuing to grow and I couldn't account for any of it in a kmemleak report (it was completely empty). This led me (thanks, Lance!) to enabling CONFIG_MEM_ALLOC_PROFILING=y. I then saved /proc/allocinfo before and after my workloads and used a script to compare the two snapshots, work out which tagged allocations grew, and sort them by growth.
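
The script is nothing fancy; it's roughly the sketch below (it assumes the two header lines at the top of /proc/allocinfo and keys each entry on everything after the size and call-count columns):

  #!/bin/bash
  # rough sketch: diff two /proc/allocinfo snapshots by tag and print
  # the entries that grew, biggest first
  # usage: ./allocinfo-growth.sh before.txt after.txt
  awk '
      FNR <= 2 { next }      # skip the two allocinfo header lines
      {
          size = $1          # column 1: bytes currently allocated
          tag = $3           # columns 3+: "file:line func:name"
          for (i = 4; i <= NF; i++) tag = tag " " $i
          if (NR == FNR) before[tag] = size; else after[tag] = size
      }
      END {
          for (tag in after) {
              growth = after[tag] - before[tag]
              if (growth > 0)
                  printf "%10d %10d %s\n", growth, after[tag], tag
          }
      }
  ' "$1" "$2" | sort -rn | head -n 20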

I got a bunch of suspicious entries, but I couldn't trace any of them back to anything I was doing. For example, I tried to collect kstacks with bpftrace by kprobing whichever of these functions I could. I mostly had to probe nearby functions because many of them don't have dedicated tracepoints. I haven't tried with manually-inserted tracepoints yet.
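
One of those runs looked roughly like this, with function names swapped in from the tag list (the kprobes only attach if the symbol isn't inlined away):

  sudo bpftrace -e '
      kprobe:__filemap_get_folio,
      kprobe:alloc_buffer_head
      {
          @stacks[kstack] = count();
      }
      interval:s:300 { exit(); }
  '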

After multiple experiments, I took it back to a stock kernel (6.17, tag v6.17 commit e5f0a698b34ed76002dc5cff3804a61c80233a7a) and just let it sit idle for a few days. This still generated some growth:

    Growth    NewSize TagInfo
============================================================
 676085760  820760576 mm/slub.c:2492 func:alloc_slab_page
 591151104  672784384 mm/filemap.c:1981 func:__filemap_get_folio
 528635360  532319824 fs/ext4/super.c:1384 func:ext4_alloc_inode
 187793408  325976064 mm/readahead.c:186 func:ractl_alloc_folio
  95700288  103415040 fs/dcache.c:1690 func:__d_alloc
  29970432   63787008 mm/slub.c:2494 func:alloc_slab_page
  20316160   39485440 mm/percpu-vm.c:95 func:pcpu_alloc_pages
  15044120   16415048 fs/buffer.c:3025 func:alloc_buffer_head
   3116224    8994768 lib/xarray.c:378 func:xas_alloc
   2138208    4317472 mm/percpu.c:512 func:pcpu_mem_zalloc

I don't entirely know what I'm looking at here yet; kernel memory management beyond kmalloc and vmalloc wasn't really on my 2025 bingo card. I only listened to a talk about folios after I first ran into this and didn't know they were a thing before then. There are a few things I can still do to normalize my setup, like moving to a stock Linux image of some kind in QEMU, but I was hoping for some advice before I keep slashing away. If there's one quirky thing to disclose, it's that the file system is ext3 (mounted through the ext4 driver, which I assume is why fs/ext4/super.c shows up above).

First, is any of that information actually meaningful? I wonder if I'm just looking at records that are known not to have their corresponding frees credited back to the allocating tag. Second, would any of it actually be relevant to the progressive growth of vmap_area?


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Understanding profiled tagged allocator growth in a stock kernel that's just idling
  2025-11-17 23:16 Understanding profiled tagged allocator growth in a stock kernel that's just idling Preble, Adam C
@ 2025-11-18 18:24 ` Vishal Moola (Oracle)
  2025-11-19 23:29   ` Preble, Adam C
  0 siblings, 1 reply; 3+ messages in thread
From: Vishal Moola (Oracle) @ 2025-11-18 18:24 UTC (permalink / raw)
  To: Preble, Adam C; +Cc: linux-mm

On Mon, Nov 17, 2025 at 11:16:51PM +0000, Preble, Adam C wrote:
> I had sent a note to the list a few weeks ago about a situation where my vmap_area allocations were continuing to grow and I couldn't account for any of it in a kmemleak report (it was completely empty). This led me (thanks, Lance!) to enabling CONFIG_MEM_ALLOC_PROFILING=y. I then saved /proc/allocinfo before and after my workloads and used a script to compare the two snapshots, work out which tagged allocations grew, and sort them by growth.

When making references, please link them, especially in cases like this
where the subject line is different from your first post[1]. It makes it
easier to keep track of discussions :).

> I got a bunch of suspicious entries, but I couldn't trace any of them back to anything I was doing. For example, I tried to collect kstacks with bpftrace by kprobing whichever of these functions I could. I mostly had to probe nearby functions because many of them don't have dedicated tracepoints. I haven't tried with manually-inserted tracepoints yet.
> 
> After multiple experiments, I took it back to a stock kernel (6.17, tag v6.17 commit e5f0a698b34ed76002dc5cff3804a61c80233a7a) and just let it sit idle for a few days. This still generated some growth:
> 
>     Growth    NewSize TagInfo
> ============================================================
>  676085760  820760576 mm/slub.c:2492 func:alloc_slab_page
>  591151104  672784384 mm/filemap.c:1981 func:__filemap_get_folio
>  528635360  532319824 fs/ext4/super.c:1384 func:ext4_alloc_inode
>  187793408  325976064 mm/readahead.c:186 func:ractl_alloc_folio
>   95700288  103415040 fs/dcache.c:1690 func:__d_alloc
>   29970432   63787008 mm/slub.c:2494 func:alloc_slab_page
>   20316160   39485440 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>   15044120   16415048 fs/buffer.c:3025 func:alloc_buffer_head
>    3116224    8994768 lib/xarray.c:378 func:xas_alloc
>    2138208    4317472 mm/percpu.c:512 func:pcpu_mem_zalloc
> 
> I don't entirely know what I'm looking at here yet; kernel memory management beyond kmalloc and vmalloc wasn't really on my 2025 bingo card. I only listened to a talk about folios after I first ran into this and didn't know they were a thing before then. There are a few things I can still do to normalize my setup, like moving to a stock Linux image of some kind in QEMU, but I was hoping for some advice before I keep slashing away. If there's one quirky thing to disclose, it's that the file system is ext3 (mounted through the ext4 driver, which I assume is why fs/ext4/super.c shows up above).

The numbers here look pretty normal to me. There are a number of places
where we can reuse objects, so we don't proactively free them.

> First, is any of that information actually meaningful? I wonder if I'm just looking at records that are known not to have their corresponding frees credited back to the allocating tag. Second, would any of it actually be relevant to the progressive growth of vmap_area?

I'd recommend confirming that the external module isn't the source of
the leak first. Memory allocation profiling doesn't track statistics
once a module is unloaded, so I'd suggest inserting a 'while(1);' as
the last line of your module's exit path, then checking /proc/allocinfo.

Also, afaik vmap_area is only allocated from within mm/vmalloc.c, so if
this is a kernel-side leak, I'd start looking from there first.
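
Something like the following is usually enough to keep an eye on both;
'your_module' below is just a placeholder for whatever paths your
module's tags show up under in /proc/allocinfo:

  # are the module's tags still present once its exit path is pinned?
  sudo grep 'your_module' /proc/allocinfo | sort -rn | head

  # watch just the mm/vmalloc.c-tagged allocations over time
  sudo watch -n 60 "grep 'mm/vmalloc\.c' /proc/allocinfo | sort -rn | head"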

[1] https://lore.kernel.org/linux-mm/PH7PR11MB6523C5200943207E879FB5CAA9F8A@PH7PR11MB6523.namprd11.prod.outlook.com/


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Understanding profiled tagged allocator growth in a stock kernel that's just idling
  2025-11-18 18:24 ` Vishal Moola (Oracle)
@ 2025-11-19 23:29   ` Preble, Adam C
  0 siblings, 0 replies; 3+ messages in thread
From: Preble, Adam C @ 2025-11-19 23:29 UTC (permalink / raw)
  To: vishal.moola; +Cc: linux-mm

On Tue, 2025-11-18 at 10:24 -0800, Vishal Moola (Oracle) wrote:
> When making references, please link them, especially in cases like
> this where the subject line is different from your first post[1]. It
> makes it easier to keep track of discussions :).
> 

Fair enough! My last bit about it is here [1]. I also got a better e-
mail client setup going (hopefully).

> 
> 
> The numbers here look pretty normal to me. There are a number of
> places where we can reuse objects, so we don't proactively free them.

Okay, I'm going to assume those entries are just red herrings, then.

> I'd recommend confirming that the external module isn't the source of
> the leak first. Memory allocation profiling doesn't track statistics
> once a module is unloaded, so I'd suggest inserting a 'while(1);' as
> the last line of your module's exit path, then checking
> /proc/allocinfo.
> 
> Also, afaik vmap_area is only allocated from within mm/vmalloc.c, so
> if this is a kernel-side leak, I'd start looking from there first.

I threw in an infinite loop right at the end and ... initially didn't
hit it. It looks like deinitialization was blocked on something and
didn't continue until I echoed 3 to /proc/sys/vm/drop_caches; I only
reached the loop after that. That is alarming for my own code, and I
will be looking into it too. However, my normal test payload with the
growing vmap_area drops the caches anyway, so my measurements have been
taken with deinitialization completing.
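
For reference, each post-run capture is taken roughly like this:

  # drop pagecache plus dentries/inodes, then snapshot the profile
  echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
  sudo cat /proc/allocinfo > after.txt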

Here's the top 20 by growth when I compare against the capture taken
after all the caches were dropped:

    Growth    NewSize TagInfo
============================================================
   7176192   11071488 mm/shmem.c:1908 func:shmem_alloc_folio
    983040   19496960 mm/percpu-vm.c:95 func:pcpu_alloc_pages
    860160    1122304 arch/x86/mm/pat/set_memory.c:1239 func:split_large_page
    806912   33808384 mm/slub.c:2496 func:alloc_slab_page
    552960     806912 mm/memory.c:468 func:__pte_alloc_kernel
    164160    8224416 kernel/fork.c:311 func:alloc_thread_stack_node
     61440    1507328 kernel/dma/direct.c:142 func:__dma_direct_alloc_pages
     20480    1372160 arch/x86/mm/pgtable.c:18 func:pte_alloc_one
     12288      45056 lib/alloc_tag.c:430 func:vm_module_tags_populate
      8760     821104 lib/radix-tree.c:253 func:radix_tree_node_alloc
      8192     264704 mm/list_lru.c:410 func:memcg_init_list_lru_one
      6528      95904 mm/vmalloc.c:3176 func:__get_vm_area_node
      5760       9024 net/core/dst.c:89 func:dst_alloc
      4096     221184 arch/x86/mm/pgtable.c:314 func:_pgd_alloc
      4096       7168 drivers/tty/tty_buffer.c:180 func:tty_buffer_alloc
      3680    2297056 mm/shmem.c:5211 func:shmem_alloc_inode
      3328       3840 net/core/skbuff.c:283 func:napi_skb_cache_get
      2944      17664 kernel/fork.c:1487 func:dup_mm
      2560       2560 ipc/sem.c:517 func:sem_alloc
      2016       2800 fs/ext4/mballoc.c:5695 func:ext4_mb_pa_alloc

If vmap_area is my perpetually growing slab cache, then I suppose I
should look at __get_vm_area_node, which has an alloc_vmap_area call.
If I have to, I'll add my own tracepoint so I can collect kstacks from
it. I assume I would record the pointers returned by alloc_vmap_area
within __get_vm_area_node and then also attach to free_vmap_area to see
when they are reclaimed, along the lines of the sketch below.
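
Untested sketch of what I have in mind; it assumes alloc_vmap_area and
free_vmap_area are probeable (not inlined), and it will miss any free
path that bypasses free_vmap_area (e.g. the lazy purge path), so
whatever is left over is only a starting point:

  sudo bpftrace -e '
      kretprobe:alloc_vmap_area
      {
          /* remember the allocating stack for each vmap_area pointer */
          @live[retval] = kstack;
      }
      kprobe:free_vmap_area
      {
          /* forget entries freed through this path */
          delete(@live[arg0]);
      }
      END
      {
          /* anything left never hit free_vmap_area while tracing */
          print(@live);
          clear(@live);
      }
  '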

Also, thanks for the idea of just looping at the end of the deinit. I
wouldn't normally have the mind to do that kind of evil, despite
dropping int $3's all over the place.

[1] https://lore.kernel.org/linux-mm/PH7PR11MB65239FC7763E954D4AA828A5A9CDA@PH7PR11MB6523.namprd11.prod.outlook.com/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-11-19 23:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-17 23:16 Understanding profiled tagged allocator growth in a stock kernel that's just idling Preble, Adam C
2025-11-18 18:24 ` Vishal Moola (Oracle)
2025-11-19 23:29   ` Preble, Adam C
