From: Zhao Liu <zhao1.liu@intel.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Hao Li <haolee.swjtu@gmail.com>,
akpm@linux-foundation.org, harry.yoo@oracle.com, cl@gentwo.org,
rientjes@google.com, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
tim.c.chen@intel.com, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in __pcs_replace_empty_main()
Date: Fri, 16 Jan 2026 17:07:30 +0800 [thread overview]
Message-ID: <aWn/0mn93MmUvTPY@intel.com> (raw)
In-Reply-To: <6be60100-e94c-4c06-9542-29ac8bf8f013@suse.cz>
> > The following is the perf data comparing 2 tests w/o fix & with this fix:
> >
> > # Baseline Delta Abs Shared Object Symbol
> > # ........ ......... ....................... ....................................
> > #
> > 61.76% +4.78% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> > 0.93% -0.32% [kernel.vmlinux] [k] __slab_free
> > 0.39% -0.31% [kernel.vmlinux] [k] barn_get_empty_sheaf
> > 1.35% -0.30% [kernel.vmlinux] [k] mas_leaf_max_gap
> > 3.22% -0.30% [kernel.vmlinux] [k] __kmem_cache_alloc_bulk
> > 1.73% -0.20% [kernel.vmlinux] [k] __cond_resched
> > 0.52% -0.19% [kernel.vmlinux] [k] _raw_spin_lock_irqsave
> > 0.92% +0.18% [kernel.vmlinux] [k] _raw_spin_lock
> > 1.91% -0.15% [kernel.vmlinux] [k] zap_pmd_range.isra.0
> > 1.37% -0.13% [kernel.vmlinux] [k] mas_wr_node_store
> > 1.29% -0.12% [kernel.vmlinux] [k] free_pud_range
> > 0.92% -0.11% [kernel.vmlinux] [k] __mmap_region
> > 0.12% -0.11% [kernel.vmlinux] [k] barn_put_empty_sheaf
> > 0.20% -0.09% [kernel.vmlinux] [k] barn_replace_empty_sheaf
> > 0.31% +0.09% [kernel.vmlinux] [k] get_partial_node
> > 0.29% -0.07% [kernel.vmlinux] [k] __rcu_free_sheaf_prepare
> > 0.12% -0.07% [kernel.vmlinux] [k] intel_idle_xstate
> > 0.21% -0.07% [kernel.vmlinux] [k] __kfree_rcu_sheaf
> > 0.26% -0.07% [kernel.vmlinux] [k] down_write
> > 0.53% -0.06% libc.so.6 [.] __mmap
> > 0.66% -0.06% [kernel.vmlinux] [k] mas_walk
> > 0.48% -0.06% [kernel.vmlinux] [k] mas_prev_slot
> > 0.45% -0.06% [kernel.vmlinux] [k] mas_find
> > 0.38% -0.06% [kernel.vmlinux] [k] mas_wr_store_type
> > 0.23% -0.06% [kernel.vmlinux] [k] do_vmi_align_munmap
> > 0.21% -0.05% [kernel.vmlinux] [k] perf_event_mmap_event
> > 0.32% -0.05% [kernel.vmlinux] [k] entry_SYSRETQ_unsafe_stack
> > 0.19% -0.05% [kernel.vmlinux] [k] downgrade_write
> > 0.59% -0.05% [kernel.vmlinux] [k] mas_next_slot
> > 0.31% -0.05% [kernel.vmlinux] [k] __mmap_new_vma
> > 0.44% -0.05% [kernel.vmlinux] [k] kmem_cache_alloc_noprof
> > 0.28% -0.05% [kernel.vmlinux] [k] __vma_enter_locked
> > 0.41% -0.05% [kernel.vmlinux] [k] memcpy
> > 0.48% -0.04% [kernel.vmlinux] [k] mas_store_gfp
> > 0.14% +0.04% [kernel.vmlinux] [k] __put_partials
> > 0.19% -0.04% [kernel.vmlinux] [k] mas_empty_area_rev
> > 0.30% -0.04% [kernel.vmlinux] [k] do_syscall_64
> > 0.25% -0.04% [kernel.vmlinux] [k] mas_preallocate
> > 0.15% -0.04% [kernel.vmlinux] [k] rcu_free_sheaf
> > 0.22% -0.04% [kernel.vmlinux] [k] entry_SYSCALL_64
> > 0.49% -0.04% libc.so.6 [.] __munmap
> > 0.91% -0.04% [kernel.vmlinux] [k] rcu_all_qs
> > 0.21% -0.04% [kernel.vmlinux] [k] __vm_munmap
> > 0.24% -0.04% [kernel.vmlinux] [k] mas_store_prealloc
> > 0.19% -0.04% [kernel.vmlinux] [k] __kmalloc_cache_noprof
> > 0.34% -0.04% [kernel.vmlinux] [k] build_detached_freelist
> > 0.19% -0.03% [kernel.vmlinux] [k] vms_complete_munmap_vmas
> > 0.36% -0.03% [kernel.vmlinux] [k] mas_rev_awalk
> > 0.05% -0.03% [kernel.vmlinux] [k] shuffle_freelist
> > 0.19% -0.03% [kernel.vmlinux] [k] down_write_killable
> > 0.19% -0.03% [kernel.vmlinux] [k] kmem_cache_free
> > 0.27% -0.03% [kernel.vmlinux] [k] up_write
> > 0.13% -0.03% [kernel.vmlinux] [k] vm_area_alloc
> > 0.18% -0.03% [kernel.vmlinux] [k] arch_get_unmapped_area_topdown
> > 0.08% -0.03% [kernel.vmlinux] [k] userfaultfd_unmap_complete
> > 0.10% -0.03% [kernel.vmlinux] [k] tlb_gather_mmu
> > 0.30% -0.02% [kernel.vmlinux] [k] ___slab_alloc
> >
> > I think the interesting item is "get_partial_node". It seems this fix
> > makes "get_partial_node" slightly more frequent. Hmm, however, I still
> > can't figure out why this is happening. Do you have any thoughts on it?
>
> I'm not sure if it's statistically significant or just noise, +0.09% could
> be noise?
A small number doesn't always mean it's noise. When perf samples the spin
lock call chain under get_partial_node, the samples land in its subroutine
(the spin lock), which is where most of the time goes, so the subroutine's
proportion is high. get_partial_node itself (excluding subroutines) executes
very quickly, so its own proportion stays small even when it is responsible
for a lot of the lock contention.
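As a rough illustration (just a sketch in Python, reusing the 0.31% / +0.09%
figures from the diff table above; perf attributes the lock samples to the
callee rather than to get_partial_node itself):

# Illustration only: the "self" column of a cheap caller like
# get_partial_node stays tiny because the expensive work (the spin lock
# slow path) is accounted to the callee.
baseline_self = 0.31   # get_partial_node self time (%) in the diff above
delta_self = 0.09      # reported delta (%)

relative_change = delta_self / baseline_self * 100
print(f"relative change in get_partial_node self time: {relative_change:.0f}%")
# ~29% relative growth, which is why a +0.09% absolute delta on such a
# small base is not necessarily noise.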
I also expanded the perf data with call chains:
* w/o fix:
We can calculate the proportion of spin lock time introduced by get_partial_node
as: 31.05% / 49.91% = 62.21%
49.91% mmap2_processes [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
|
--49.91%--native_queued_spin_lock_slowpath
|
--49.91%--_raw_spin_lock_irqsave
|
|--31.05%--get_partial_node
| |
| |--23.66%--get_any_partial
| | ___slab_alloc
| |
| --7.40%--___slab_alloc
| __kmem_cache_alloc_bulk
|
|--10.84%--barn_get_empty_sheaf
| |
| |--6.18%--__kfree_rcu_sheaf
| | kvfree_call_rcu
| |
| --4.66%--__pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
|--5.10%--barn_put_empty_sheaf
| |
| --5.09%--__pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
|--2.01%--barn_replace_empty_sheaf
| __pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
--0.78%--__put_partials
|
--0.78%--__kmem_cache_free_bulk.part.0
rcu_free_sheaf
* with fix:
Similarly, the proportion of spin lock time introduced by get_partial_node
is: 39.91% / 42.82% = 93.20%
42.82% mmap2_processes [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
|
---native_queued_spin_lock_slowpath
|
--42.82%--_raw_spin_lock_irqsave
|
|--39.91%--get_partial_node
| |
| |--28.25%--get_any_partial
| | ___slab_alloc
| |
| --11.66%--___slab_alloc
| __kmem_cache_alloc_bulk
|
|--1.09%--barn_get_empty_sheaf
| |
| --0.90%--__kfree_rcu_sheaf
| kvfree_call_rcu
|
|--0.96%--barn_replace_empty_sheaf
| __pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
--0.77%--__put_partials
__kmem_cache_free_bulk.part.0
rcu_free_sheaf
So, the 62.21% -> 93.20% change could reflect that get_partial_node
contributes more of the overhead with this fix applied.
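(Just to make the arithmetic above explicit, in Python:

without_fix = 31.05 / 49.91   # ~0.6221
with_fix    = 39.91 / 42.82   # ~0.9320
print(f"w/o fix:  {without_fix:.2%}")
print(f"with fix: {with_fix:.2%}")

i.e. the share of spin lock samples whose call chain goes through
get_partial_node grows from roughly 62% to roughly 93%.)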
> > So, I'd like to know if you think dynamically or adaptively adjusting
> > capacity is a worthwhile idea.
>
> In the followup series, there will be automatically determined capacity to
> roughly match the current capacity of cpu partial slabs:
>
> https://lore.kernel.org/all/20260112-sheaves-for-all-v2-4-98225cfb50cf@suse.cz/
>
> We can use that as starting point for further tuning. But I suspect making
> it adjust dynamically would be complicated.
Thanks, will continue to evaluate this series.
Regards,
Zhao