* Memory allocation profiling warnings in memory bound systems
From: Usama Arif @ 2025-05-19 13:31 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Johannes Weiner, Shakeel Butt, Linux Memory Management List,
kent.overstreet
Hi,
We have started enabling memory allocation profiling (with kernel 6.13) in our fleet
and are seeing a large number of warnings (reported by Vlad Poenaru) due to failure
in allocation of slab object extensions on services that are memory bound. I have attached
one of the logs at the end.
Does it make sense to allocate the slabobj_ext vector via kvcalloc_node() and also change
the WARN to WARN_ONCE (or maybe even pr_debug?), as in the diff below? A large number of
these prints in a short time may mask real issues reported in dmesg while the system is
under memory pressure. I looked for changes to this code after 6.13 and didn't find any,
but thought I would check before sending the diff below as a patch.
diff --git a/mm/slub.c b/mm/slub.c
index c2151c9fee22..4595ca190cd9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1961,7 +1961,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	gfp &= ~OBJCGS_CLEAR_MASK;
 	/* Prevent recursive extension vector allocation */
 	gfp |= __GFP_NO_OBJ_EXT;
-	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+	vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
 			   slab_nid(slab));
 	if (!vec) {
 		/* Mark vectors which failed to allocate */
@@ -2069,7 +2069,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
 
 	slab = virt_to_slab(p);
 	if (!slab_obj_exts(slab) &&
-	    WARN(alloc_slab_obj_exts(slab, s, flags, false),
+	    WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
 		 "%s, %s: Failed to create slab extension vector!\n",
 		 __func__, s->name))
 		return NULL;
[ 1824.754108] prepare_slab_obj_exts_hook, zs_handle-zswap1: Failed to create slab extension vector!
[ 1824.771857] WARNING: CPU: 17 PID: 118473 at mm/slub.c:2074 kmem_cache_alloc_noprof+0x780/0x1620
...
[ 1824.967011] RIP: 0010:kmem_cache_alloc_noprof+0x780/0x1620
[ 1824.978004] Code: 48 8b 14 24 44 8b 4c 24 08 4c 8b 44 24 18 e9 fd f8 ff ff 49 8b 56 60 48 c7 c7 cc 76 5c 82 48 c7 c6 46 f6 6e 82 e8 10 14 63 ff <0f> 0b 44 8b 4c 24 08 e9 50 f9 ff ff f0 49 0f ba 2c 24 00 0f 82 cd
[ 1825.015516] RSP: 0000:ffffc9004a6cb228 EFLAGS: 00010286
[ 1825.025967] RAX: 0000000000000055 RBX: 0000000000000000 RCX: 0000000000000000
[ 1825.040248] RDX: ffff889fff6b0158 RSI: ffff889fff6a1c60 RDI: ffff889fff6a1c60
[ 1825.054534] RBP: ffffea009a9e6dc0 R08: ffffffff83268ec0 R09: 000000000002fffd
[ 1825.068817] R10: 0000000000000000 R11: ffffc9003de2f9d0 R12: ffffffff7fff0000
[ 1825.083086] R13: 0000777f80000000 R14: ffff8881caa20700 R15: ffff88a6a79b7000
[ 1825.097371] FS: 00007f545a9447c0(0000) GS:ffff889fff680000(0000) knlGS:0000000000000000
[ 1825.113572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1825.125063] CR2: 00007f4e5c3d0020 CR3: 0000001a2d9ad001 CR4: 00000000007726f0
[ 1825.139328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1825.153596] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1825.167858] PKRU: 55555554
[ 1825.173281] Call Trace:
[ 1825.178193] <TASK>
[ 1825.182402] ? __warn+0xa4/0x140
[ 1825.188882] ? kmem_cache_alloc_noprof+0x780/0x1620
[ 1825.198647] ? report_bug+0xe1/0x140
[ 1825.205821] ? kmem_cache_alloc_noprof+0x780/0x1620
[ 1825.215594] ? handle_bug+0x5e/0x90
[ 1825.222579] ? exc_invalid_op+0x16/0x40
[ 1825.230257] ? asm_exc_invalid_op+0x16/0x20
[ 1825.238633] ? kmem_cache_alloc_noprof+0x780/0x1620
[ 1825.248410] ? kmem_cache_alloc_noprof+0x780/0x1620
[ 1825.258212] ? zs_malloc+0x958/0x9d0
[ 1825.265393] ? ZSTD_compress2+0x73/0xb0
[ 1825.273099] zs_malloc+0x958/0x9d0
[ 1825.279930] ? zstd_scompress+0x4c/0x70
[ 1825.287625] ? scomp_acomp_comp_decomp.llvm.7924360292990082857+0x128/0x1d0
[ 1825.301568] zs_zpool_malloc+0xe/0x30
[ 1825.308915] zswap_store+0x4b7/0x8e0
[ 1825.316094] ? __lruvec_stat_mod_folio+0x11a/0x240
[ 1825.325724] swap_writepage+0x12f/0x370
[ 1825.333426] shrink_folio_list+0x809/0x13a0
[ 1825.341837] ? 0xffffffff81000000
[ 1825.348511] ? isolate_lru_folios+0x242/0x4d0
[ 1825.357263] ? sysvec_call_function+0xa/0x80
[ 1825.365840] ? asm_sysvec_call_function+0x16/0x20
[ 1825.375285] shrink_lruvec+0x50c/0xb50
[ 1825.382823] shrink_node+0x38e/0x8e0
[ 1825.390000] do_try_to_free_pages+0xc4/0x530
[ 1825.398560] try_to_free_pages+0x191/0x460
[ 1825.406805] __alloc_pages_noprof+0x2997/0x4f60
[ 1825.415906] vma_alloc_folio_noprof+0x132/0x530
[ 1825.425017] folio_prealloc+0xc8/0xf0
[ 1825.432383] do_pte_missing+0x605/0x1020
[ 1825.440268] handle_mm_fault+0x3f9/0x1190
[ 1825.448333] ? task_tick_fair.llvm.18033716799305157738+0x43/0x190
[ 1825.460729] do_user_addr_fault+0x196/0x5c0
[ 1825.469150] exc_page_fault+0x69/0x130
[ 1825.476687] asm_exc_page_fault+0x22/0x30
[ 1825.484737] RIP: 0033:0x346333
[ 1825.490879] Code: 5e 41 5f 5d c3 48 8b 41 08 48 8b 10 48 89 51 08 eb e4 49 8b 4e 08 4c 89 f6 48 2b 35 cf 8a 29 00 48 c1 ee 04 69 f6 ab aa aa aa <89> 71 20 c7 41 24 ff ff 00 00 48 8d b1 00 40 00 00 49 89 76 08 e9
[ 1825.528413] RSP: 002b:00007ffcabaf4b70 EFLAGS: 00010a07
[ 1825.538891] RAX: 000000000000000c RBX: 0000000000000018 RCX: 00007f4e5c3d0000
[ 1825.553175] RDX: 000052f00026ea40 RSI: 0000000000000221 RDI: 0000000000000018
[ 1825.567459] RBP: 00007ffcabaf4bc0 R08: 0000512000000040 R09: 0000000000000000
[ 1825.581746] R10: 00007f5428b0c6a0 R11: 0000000000000002 R12: 0000534000000800
[ 1825.596030] R13: 0000000000000190 R14: 000052f00026ea30 R15: 0000512000000040
[ 1825.610328] </TASK>
Thanks,
Usama
* Re: Memory allocation profiling warnings in memory bound systems
From: Usama Arif @ 2025-05-19 13:33 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Johannes Weiner, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
+cc Vlad
* Re: Memory allocation profiling warnings in memory bound systems
From: Suren Baghdasaryan @ 2025-05-19 15:50 UTC (permalink / raw)
To: Usama Arif
Cc: Johannes Weiner, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 6:33 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
> +cc Vlad
>
> On 19/05/2025 14:31, Usama Arif wrote:
> > Hi,
> >
> > We have started enabling memory allocation profiling (with kernel 6.13) in our fleet
> > and are seeing a large number of warnings (reported by Vlad Poenaru) due to failure
> > in allocation of slab object extensions on services that are memory bound. I have attached
> > one of the logs at the end.
> >
> > Does it make sense to change the slabobj_ext to be allocated via kvcalloc and also change
> > the WARN to WARN_ONCE (or maybe even pr_debug?) like the diff below? A large number of
> > prints for this in a short time may mask any real issues in the system during memory
> > pressure being reported in dmesg. I tried to see if there were any changes after 6.13
> > to this code but didn't find any, but thought will check before sending below as a patch.
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c2151c9fee22..4595ca190cd9 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1961,7 +1961,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > gfp &= ~OBJCGS_CLEAR_MASK;
> > /* Prevent recursive extension vector allocation */
> > gfp |= __GFP_NO_OBJ_EXT;
> > - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
Hi Usama,
Is the allocation larger than page size? IIUC, unless allocation size
is over PAGE_SIZE, kvcalloc_node() will not fall back to vmalloc (see:
https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L668). How
big is the allocation when it fails in your case?
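
The fallback rule described above can be modelled in user space. This is only a minimal sketch, not the kernel's code: `kv_model`, `enum kv_path`, and `MODEL_PAGE_SIZE` are hypothetical names, a 4 KiB page is assumed, and the GFP-flag handling that the real kvmalloc path performs is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical outcome labels for the model below. */
enum kv_path {
	KV_KMALLOC_OK,		/* the slab/page allocator satisfied the request */
	KV_FAIL_NO_FALLBACK,	/* kmalloc failed and the size is too small for vmalloc */
	KV_VMALLOC_FALLBACK	/* kmalloc failed, request is retried via vmalloc */
};

/*
 * Model of the size check discussed in the thread: a failed request of
 * at most one page is returned as a failure, and only a larger request
 * is retried through vmalloc.
 */
static enum kv_path kv_model(size_t size, bool kmalloc_succeeds)
{
	assert(size > 0);	/* zero-size requests are outside this model */

	if (kmalloc_succeeds)
		return KV_KMALLOC_OK;
	if (size <= MODEL_PAGE_SIZE)
		return KV_FAIL_NO_FALLBACK;
	return KV_VMALLOC_FALLBACK;
}
```

Under this model, switching to kvcalloc_node() changes the outcome only for vectors larger than one page; a failing sub-page request still fails.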
> > slab_nid(slab));
> > if (!vec) {
> > /* Mark vectors which failed to allocate */
> > @@ -2069,7 +2069,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >
> > slab = virt_to_slab(p);
> > if (!slab_obj_exts(slab) &&
> > - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> > + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
Makes sense if you see lots of these but I'm still wondering how big
these failing allocations are.
> > "%s, %s: Failed to create slab extension vector!\n",
> > __func__, s->name))
> > return NULL;
> >
> >
> >
> >
> > [ 1824.754108] prepare_slab_obj_exts_hook, zs_handle-zswap1: Failed to create slab extension vector!
> > [ 1824.771857] WARNING: CPU: 17 PID: 118473 at mm/slub.c:2074 kmem_cache_alloc_noprof+0x780/0x1620
> > ...
> > [ 1824.967011] RIP: 0010:kmem_cache_alloc_noprof+0x780/0x1620
> > [ 1824.978004] Code: 48 8b 14 24 44 8b 4c 24 08 4c 8b 44 24 18 e9 fd f8 ff ff 49 8b 56 60 48 c7 c7 cc 76 5c 82 48 c7 c6 46 f6 6e 82 e8 10 14 63 ff <0f> 0b 44 8b 4c 24 08 e9 50 f9 ff ff f0 49 0f ba 2c 24 00 0f 82 cd
> > [ 1825.015516] RSP: 0000:ffffc9004a6cb228 EFLAGS: 00010286
> > [ 1825.025967] RAX: 0000000000000055 RBX: 0000000000000000 RCX: 0000000000000000
> > [ 1825.040248] RDX: ffff889fff6b0158 RSI: ffff889fff6a1c60 RDI: ffff889fff6a1c60
> > [ 1825.054534] RBP: ffffea009a9e6dc0 R08: ffffffff83268ec0 R09: 000000000002fffd
> > [ 1825.068817] R10: 0000000000000000 R11: ffffc9003de2f9d0 R12: ffffffff7fff0000
> > [ 1825.083086] R13: 0000777f80000000 R14: ffff8881caa20700 R15: ffff88a6a79b7000
> > [ 1825.097371] FS: 00007f545a9447c0(0000) GS:ffff889fff680000(0000) knlGS:0000000000000000
> > [ 1825.113572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1825.125063] CR2: 00007f4e5c3d0020 CR3: 0000001a2d9ad001 CR4: 00000000007726f0
> > [ 1825.139328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 1825.153596] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 1825.167858] PKRU: 55555554
> > [ 1825.173281] Call Trace:
> > [ 1825.178193] <TASK>
> > [ 1825.182402] ? __warn+0xa4/0x140
> > [ 1825.188882] ? kmem_cache_alloc_noprof+0x780/0x1620
> > [ 1825.198647] ? report_bug+0xe1/0x140
> > [ 1825.205821] ? kmem_cache_alloc_noprof+0x780/0x1620
> > [ 1825.215594] ? handle_bug+0x5e/0x90
> > [ 1825.222579] ? exc_invalid_op+0x16/0x40
> > [ 1825.230257] ? asm_exc_invalid_op+0x16/0x20
> > [ 1825.238633] ? kmem_cache_alloc_noprof+0x780/0x1620
> > [ 1825.248410] ? kmem_cache_alloc_noprof+0x780/0x1620
> > [ 1825.258212] ? zs_malloc+0x958/0x9d0
> > [ 1825.265393] ? ZSTD_compress2+0x73/0xb0
> > [ 1825.273099] zs_malloc+0x958/0x9d0
> > [ 1825.279930] ? zstd_scompress+0x4c/0x70
> > [ 1825.287625] ? scomp_acomp_comp_decomp.llvm.7924360292990082857+0x128/0x1d0
> > [ 1825.301568] zs_zpool_malloc+0xe/0x30
> > [ 1825.308915] zswap_store+0x4b7/0x8e0
> > [ 1825.316094] ? __lruvec_stat_mod_folio+0x11a/0x240
> > [ 1825.325724] swap_writepage+0x12f/0x370
> > [ 1825.333426] shrink_folio_list+0x809/0x13a0
> > [ 1825.341837] ? 0xffffffff81000000
> > [ 1825.348511] ? isolate_lru_folios+0x242/0x4d0
> > [ 1825.357263] ? sysvec_call_function+0xa/0x80
> > [ 1825.365840] ? asm_sysvec_call_function+0x16/0x20
> > [ 1825.375285] shrink_lruvec+0x50c/0xb50
> > [ 1825.382823] shrink_node+0x38e/0x8e0
> > [ 1825.390000] do_try_to_free_pages+0xc4/0x530
> > [ 1825.398560] try_to_free_pages+0x191/0x460
> > [ 1825.406805] __alloc_pages_noprof+0x2997/0x4f60
> > [ 1825.415906] vma_alloc_folio_noprof+0x132/0x530
> > [ 1825.425017] folio_prealloc+0xc8/0xf0
> > [ 1825.432383] do_pte_missing+0x605/0x1020
> > [ 1825.440268] handle_mm_fault+0x3f9/0x1190
> > [ 1825.448333] ? task_tick_fair.llvm.18033716799305157738+0x43/0x190
> > [ 1825.460729] do_user_addr_fault+0x196/0x5c0
> > [ 1825.469150] exc_page_fault+0x69/0x130
> > [ 1825.476687] asm_exc_page_fault+0x22/0x30
> > [ 1825.484737] RIP: 0033:0x346333
> > [ 1825.490879] Code: 5e 41 5f 5d c3 48 8b 41 08 48 8b 10 48 89 51 08 eb e4 49 8b 4e 08 4c 89 f6 48 2b 35 cf 8a 29 00 48 c1 ee 04 69 f6 ab aa aa aa <89> 71 20 c7 41 24 ff ff 00 00 48 8d b1 00 40 00 00 49 89 76 08 e9
> > [ 1825.528413] RSP: 002b:00007ffcabaf4b70 EFLAGS: 00010a07
> > [ 1825.538891] RAX: 000000000000000c RBX: 0000000000000018 RCX: 00007f4e5c3d0000
> > [ 1825.553175] RDX: 000052f00026ea40 RSI: 0000000000000221 RDI: 0000000000000018
> > [ 1825.567459] RBP: 00007ffcabaf4bc0 R08: 0000512000000040 R09: 0000000000000000
> > [ 1825.581746] R10: 00007f5428b0c6a0 R11: 0000000000000002 R12: 0000534000000800
> > [ 1825.596030] R13: 0000000000000190 R14: 000052f00026ea30 R15: 0000512000000040
> > [ 1825.610328] </TASK>
> >
> >
> > Thanks,
> > Usama
>
* Re: Memory allocation profiling warnings in memory bound systems
From: Johannes Weiner @ 2025-05-19 16:08 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 08:50:28AM -0700, Suren Baghdasaryan wrote:
> On Mon, May 19, 2025 at 6:33 AM Usama Arif <usamaarif642@gmail.com> wrote:
> >
> >
> > +cc Vlad
> >
> > On 19/05/2025 14:31, Usama Arif wrote:
> > > Hi,
> > >
> > > We have started enabling memory allocation profiling (with kernel 6.13) in our fleet
> > > and are seeing a large number of warnings (reported by Vlad Poenaru) due to failure
> > > in allocation of slab object extensions on services that are memory bound. I have attached
> > > one of the logs at the end.
> > >
> > > Does it make sense to change the slabobj_ext to be allocated via kvcalloc and also change
> > > the WARN to WARN_ONCE (or maybe even pr_debug?) like the diff below? A large number of
> > > prints for this in a short time may mask any real issues in the system during memory
> > > pressure being reported in dmesg. I tried to see if there were any changes after 6.13
> > > to this code but didn't find any, but thought will check before sending below as a patch.
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index c2151c9fee22..4595ca190cd9 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1961,7 +1961,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > gfp &= ~OBJCGS_CLEAR_MASK;
> > > /* Prevent recursive extension vector allocation */
> > > gfp |= __GFP_NO_OBJ_EXT;
> > > - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>
> Hi Usama,
> Is the allocation larger than page size? IIUC, unless allocation size
> is over PAGE_SIZE, kvcalloc_node() will not fall back to vmalloc (see:
> https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L668). How
> big is the allocation when it fails in your case?
Digging through the reports, it appears we're encountering both. We've
seen a zswap slab where the slab is order-0 and slabext is
higher-order (8 byte objects, 512 objsperslab, 1 pageperslab), but
also biovec-max where it's the other way round (4k byte objects, 8
objsperslab, 8 pagesperslab).
In the first case, vmalloc would help. In the second it wouldn't.
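
The two cases can be checked with a little arithmetic. The sketch below assumes `sizeof(struct slabobj_ext)` is 16 bytes (two pointer-sized fields on a 64-bit build with both memcg and allocation profiling enabled) and a 4 KiB page; neither value is stated in this thread, and `ext_vec_bytes`, `order_for`, and the `MODEL_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define MODEL_EXT_SIZE  16u	/* assumption: sizeof(struct slabobj_ext) */

/* Size in bytes of the extension vector for one slab. */
static size_t ext_vec_bytes(size_t objs_per_slab)
{
	assert(objs_per_slab <= SIZE_MAX / MODEL_EXT_SIZE);	/* no overflow */
	return objs_per_slab * MODEL_EXT_SIZE;
}

/* Smallest page order whose span (PAGE_SIZE << order) covers `bytes`. */
static unsigned int order_for(size_t bytes)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With these assumptions, the zswap cache (512 objects per slab) needs 512 * 16 = 8192 bytes, which is two pages (order-1) and therefore eligible for the vmalloc fallback. The biovec-max cache (8 objects per slab) needs 8 * 16 = 128 bytes, which fits well within a single page, so kvcalloc_node() would not fall back.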
The second case is interesting. The higher-order slab succeeds because
bios use a mempool; but the system is so depleted that the order-0 for
the slabext fails.
I'm not sure there is much we can do about this tbh. It would seem
overkill to add a mempool or grant the tracking access to system-wide
emergency reserves.
A warn-once would probably make sense nonetheless.
It might also make sense to flag the line item for that callsite in
the reporting file, to make it obvious that the counter is compromised
and is missing allocations?
* Re: Memory allocation profiling warnings in memory bound systems
From: Suren Baghdasaryan @ 2025-05-19 16:42 UTC (permalink / raw)
To: Johannes Weiner
Cc: Usama Arif, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 9:08 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, May 19, 2025 at 08:50:28AM -0700, Suren Baghdasaryan wrote:
> > On Mon, May 19, 2025 at 6:33 AM Usama Arif <usamaarif642@gmail.com> wrote:
> > >
> > >
> > > +cc Vlad
> > >
> > > On 19/05/2025 14:31, Usama Arif wrote:
> > > > Hi,
> > > >
> > > > We have started enabling memory allocation profiling (with kernel 6.13) in our fleet
> > > > and are seeing a large number of warnings (reported by Vlad Poenaru) due to failure
> > > > in allocation of slab object extensions on services that are memory bound. I have attached
> > > > one of the logs at the end.
> > > >
> > > > Does it make sense to change the slabobj_ext to be allocated via kvcalloc and also change
> > > > the WARN to WARN_ONCE (or maybe even pr_debug?) like the diff below? A large number of
> > > > prints for this in a short time may mask any real issues in the system during memory
> > > > pressure being reported in dmesg. I tried to see if there were any changes after 6.13
> > > > to this code but didn't find any, but thought will check before sending below as a patch.
> > > >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index c2151c9fee22..4595ca190cd9 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -1961,7 +1961,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > gfp &= ~OBJCGS_CLEAR_MASK;
> > > > /* Prevent recursive extension vector allocation */
> > > > gfp |= __GFP_NO_OBJ_EXT;
> > > > - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> >
> > Hi Usama,
> > Is the allocation larger than page size? IIUC, unless allocation size
> > is over PAGE_SIZE, kvcalloc_node() will not fall back to vmalloc (see:
> > https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L668). How
> > big is the allocation when it fails in your case?
>
> Digging through the reports, it appears we're encountering both. We've
> seen a zswap slab where the slab is order-0 and slabext is
> higher-order (8 byte objects, 512 objsperslab, 1 pageperslab), but
> also biovec-max where it's the other way round (4k byte objects, 8
> objsperslab, 8 pagesperslab).
>
> In the first case, vmalloc would help. In the second it wouldn't.
Ok, then I don't see any downside to changing to kvcalloc_node() here.
Let's do it.
>
> The second case is interesting. The higher-order slab succeeds because
> bios use a mempool; but the system is so depleted that the order-0 for
> the slabext fails.
I see.
>
> I'm not sure there is much we can do about this tbh. It would seem
> overkill to add a mempool or grant the tracking access to system-wide
> emergency reserves.
Yeah, with the system under so much memory pressure we probably have
bigger issues than extension vector allocation failures.
>
> A warn-once would probably make sense nonetheless.
Agree.
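
To illustrate what the warn-once change does to log volume, here is a small user-space model. It is a sketch only: `struct warn_site`, `model_warn`, `model_warn_once`, and `reports_after` are hypothetical names and not kernel APIs. The point it tries to capture is that a once-only warning still evaluates to the condition on every call, so the caller keeps bailing out on each failure; only the repeated reports are suppressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-call-site state of a once-only warning. */
struct warn_site {
	bool already_warned;
	unsigned int reports;	/* number of reports actually printed */
};

/* Models WARN(): print a report every time cond is true; return cond. */
static bool model_warn(struct warn_site *site, bool cond, const char *msg)
{
	if (cond) {
		site->reports++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Models WARN_ONCE(): print only the first report; still return cond. */
static bool model_warn_once(struct warn_site *site, bool cond, const char *msg)
{
	if (cond && !site->already_warned) {
		site->already_warned = true;
		site->reports++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Simulate `failures` failed allocations and return how many reports were printed. */
static unsigned int reports_after(unsigned int failures, bool once)
{
	struct warn_site site = { false, 0 };
	const char *msg = "Failed to create slab extension vector!";
	unsigned int bailed_out = 0;

	for (unsigned int i = 0; i < failures; i++) {
		bool failed = once ? model_warn_once(&site, true, msg)
				   : model_warn(&site, true, msg);
		if (failed)
			bailed_out++;	/* the caller would return NULL here */
	}
	assert(bailed_out == failures);	/* every failure is still seen by the caller */
	return site.reports;
}
```

In this model, five failures produce five reports with the plain variant and one report with the once-only variant, while the caller sees all five failures in both cases.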
>
> It might also make sense to flag the line item for that callsite in
> the reporting file, to make it obvious that the counter is compromised
> and is missing allocations?
Good idea. We could output something like 'X' instead of the number if
the value is known to be invalid. I can look into it. Will also have
to raise the file version so that parsers can handle this change.
>
* Re: Memory allocation profiling warnings in memory bound systems
From: Usama Arif @ 2025-05-19 17:23 UTC (permalink / raw)
To: Suren Baghdasaryan, Johannes Weiner
Cc: Shakeel Butt, Linux Memory Management List, kent.overstreet, vlad.wing
On 19/05/2025 17:42, Suren Baghdasaryan wrote:
> On Mon, May 19, 2025 at 9:08 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Mon, May 19, 2025 at 08:50:28AM -0700, Suren Baghdasaryan wrote:
>>> On Mon, May 19, 2025 at 6:33 AM Usama Arif <usamaarif642@gmail.com> wrote:
>>>>
>>>>
>>>> +cc Vlad
>>>>
>>>> On 19/05/2025 14:31, Usama Arif wrote:
>>>>> Hi,
>>>>>
>>>>> We have started enabling memory allocation profiling (with kernel 6.13) in our fleet
>>>>> and are seeing a large number of warnings (reported by Vlad Poenaru) due to failure
>>>>> in allocation of slab object extensions on services that are memory bound. I have attached
>>>>> one of the logs at the end.
>>>>>
>>>>> Does it make sense to change the slabobj_ext to be allocated via kvcalloc and also change
>>>>> the WARN to WARN_ONCE (or maybe even pr_debug?) like the diff below? A large number of
>>>>> prints for this in a short time may mask any real issues in the system during memory
>>>>> pressure being reported in dmesg. I tried to see if there were any changes after 6.13
>>>>> to this code but didn't find any, but thought will check before sending below as a patch.
>>>>>
>>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>>> index c2151c9fee22..4595ca190cd9 100644
>>>>> --- a/mm/slub.c
>>>>> +++ b/mm/slub.c
>>>>> @@ -1961,7 +1961,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>>>>> gfp &= ~OBJCGS_CLEAR_MASK;
>>>>> /* Prevent recursive extension vector allocation */
>>>>> gfp |= __GFP_NO_OBJ_EXT;
>>>>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>>>>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>>>
>>> Hi Usama,
>>> Is the allocation larger than page size? IIUC, unless allocation size
>>> is over PAGE_SIZE, kvcalloc_node() will not fall back to vmalloc (see:
>>> https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L668). How
>>> big is the allocation when it fails in your case?
>>
>> Digging through the reports, it appears we're encountering both. We've
>> seen a zswap slab where the slab is order-0 and slabext is
>> higher-order (8 byte objects, 512 objsperslab, 1 pageperslab), but
>> also biovec-max where it's the other way round (4k byte objects, 8
>> objsperslab, 8 pagesperslab).
>>
>> In the first case, vmalloc would help. In the second it wouldn't.
>
> Ok, then I don't see any downside to changing to kvcalloc_node() here.
> Let's do it.
>
>>
>> The second case is interesting. The higher-order slab succeeds because
>> bios use a mempool; but the system is so depleted that the order-0 for
>> the slabext fails.
>
> I see.
>
>>
>> I'm not sure there is much we can do about this tbh. It would seem
>> overkill to add a mempool or grant the tracking access to system-wide
>> emergency reserves.
>
> Yeah, with the system under so much memory pressure we probably have
> bigger issues than extension vector allocation failures.
>
>>
>> A warn-once would probably make sense nonetheless.
>
> Agree.
>
>>
>> It might also make sense to flag the line item for that callsite in
>> the reporting file, to make it obvious that the counter is compromised
>> and is missing allocations?
>
> Good idea. We could output something like 'X' instead of the number if
> the value is known to be invalid. I can look into it. Will also have
> to raise the file version so that parsers can handle this change.
>
Thanks, I will send the above diff as patches.
For when the value is inaccurate, it might be better to show the number
with [X] next to it to indicate that it's inaccurate? Maybe an inaccurate number
is better than no number?
* Re: Memory allocation profiling warnings in memory bound systems
From: Johannes Weiner @ 2025-05-19 17:29 UTC (permalink / raw)
To: Usama Arif
Cc: Suren Baghdasaryan, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 06:23:59PM +0100, Usama Arif wrote:
> For when the value is inaccurate, it might be better to have the number
> and [X] next to it to reflect its inaccurate? Maybe an inaccurate number
> is better than no number?
Right, it could be a large consumer with only a handful of individual
objects missing from the tally due to these extreme corner cases.
* Re: Memory allocation profiling warnings in memory bound systems
From: Suren Baghdasaryan @ 2025-05-19 17:56 UTC (permalink / raw)
To: Johannes Weiner
Cc: Usama Arif, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 10:29 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, May 19, 2025 at 06:23:59PM +0100, Usama Arif wrote:
> > For when the value is inaccurate, it might be better to have the number
> > and [X] next to it to reflect its inaccurate? Maybe an inaccurate number
> > is better than no number?
>
> Right, it could be a large consumer with only a handful of individual
> objects missing from the tally due to these extreme corner cases.
Ok, then "<value>?" might make more sense.
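
For anyone maintaining a parser, a tolerant reader for such a field might look like the sketch below. It is purely illustrative: the "<value>?" notation is only a proposal in this thread, nothing here reflects the actual layout of the reporting file, and `struct counter` and `parse_counter` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parsed form of one counter field. */
struct counter {
	bool valid;			/* token was a well-formed counter */
	bool inaccurate;		/* token carried the proposed '?' suffix */
	unsigned long long value;
};

/*
 * Parse a single token of the form "<digits>" or "<digits>?".
 * Anything else (empty string, sign, trailing junk) is rejected.
 */
static struct counter parse_counter(const char *tok)
{
	const struct counter invalid = { false, false, 0 };
	struct counter c = { false, false, 0 };
	char *end;

	assert(tok != NULL);
	if (!isdigit((unsigned char)tok[0]))
		return invalid;

	errno = 0;
	c.value = strtoull(tok, &end, 10);
	if (errno != 0)
		return invalid;		/* out of range */

	if (*end == '?') {
		c.inaccurate = true;
		end++;
	}
	if (*end != '\0')
		return invalid;		/* trailing characters after the counter */

	c.valid = true;
	return c;
}
```

Keeping the number and adding a marker means a parser that understands the suffix can still rank call sites by size while flagging the affected ones as lower bounds.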
* Re: Memory allocation profiling warnings in memory bound systems
From: Usama Arif @ 2025-05-19 18:31 UTC (permalink / raw)
To: Suren Baghdasaryan, Johannes Weiner
Cc: Shakeel Butt, Linux Memory Management List, kent.overstreet, vlad.wing
On 19/05/2025 18:56, Suren Baghdasaryan wrote:
> On Mon, May 19, 2025 at 10:29 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Mon, May 19, 2025 at 06:23:59PM +0100, Usama Arif wrote:
>>> For when the value is inaccurate, it might be better to have the number
>>> and [X] next to it to reflect its inaccurate? Maybe an inaccurate number
>>> is better than no number?
>>
>> Right, it could be a large consumer with only a handful of individual
>> objects missing from the tally due to these extreme corner cases.
>
> Ok, then "<value>?" might make more sense.
Yeah, I think that would be good as well.
I guess as long as it's loud enough for anyone looking at the profile
to go and check the documentation to see what it means.
* Re: Memory allocation profiling warnings in memory bound systems
From: Suren Baghdasaryan @ 2025-05-19 18:39 UTC (permalink / raw)
To: Usama Arif
Cc: Johannes Weiner, Shakeel Butt, Linux Memory Management List,
kent.overstreet, vlad.wing
On Mon, May 19, 2025 at 11:31 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 19/05/2025 18:56, Suren Baghdasaryan wrote:
> > On Mon, May 19, 2025 at 10:29 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >>
> >> On Mon, May 19, 2025 at 06:23:59PM +0100, Usama Arif wrote:
> >>> For when the value is inaccurate, it might be better to have the number
> >>> and [X] next to it to reflect its inaccurate? Maybe an inaccurate number
> >>> is better than no number?
> >>
> >> Right, it could be a large consumer with only a handful of individual
> >> objects missing from the tally due to these extreme corner cases.
> >
> > Ok, then "<value>?" might make more sense.
>
> Yeah I think that would be good as well.
> I guess as long as its loud enough for anyone looking at the profile
> to go and have a look at the documentation to see what it would mean.
Ok, I'll work on that but probably not until June 2nd. I have an
upcoming vacation and won't be able to post the change before it
starts. Hope this timeline works for you.
>