From: Zhao Liu <zhao1.liu@intel.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Hao Li <haolee.swjtu@gmail.com>,
akpm@linux-foundation.org, harry.yoo@oracle.com, cl@gentwo.org,
rientjes@google.com, roman.gushchin@linux.dev,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
tim.c.chen@intel.com, yu.c.chen@intel.com, zhao1.liu@intel.com
Subject: Re: [PATCH v2] slub: keep empty main sheaf as spare in __pcs_replace_empty_main()
Date: Fri, 16 Jan 2026 17:07:30 +0800 [thread overview]
Message-ID: <aWn/0mn93MmUvTPY@intel.com> (raw)
In-Reply-To: <6be60100-e94c-4c06-9542-29ac8bf8f013@suse.cz>
> > The following is the perf data comparing 2 tests w/o fix & with this fix:
> >
> > # Baseline Delta Abs Shared Object Symbol
> > # ........ ......... ....................... ....................................
> > #
> > 61.76% +4.78% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> > 0.93% -0.32% [kernel.vmlinux] [k] __slab_free
> > 0.39% -0.31% [kernel.vmlinux] [k] barn_get_empty_sheaf
> > 1.35% -0.30% [kernel.vmlinux] [k] mas_leaf_max_gap
> > 3.22% -0.30% [kernel.vmlinux] [k] __kmem_cache_alloc_bulk
> > 1.73% -0.20% [kernel.vmlinux] [k] __cond_resched
> > 0.52% -0.19% [kernel.vmlinux] [k] _raw_spin_lock_irqsave
> > 0.92% +0.18% [kernel.vmlinux] [k] _raw_spin_lock
> > 1.91% -0.15% [kernel.vmlinux] [k] zap_pmd_range.isra.0
> > 1.37% -0.13% [kernel.vmlinux] [k] mas_wr_node_store
> > 1.29% -0.12% [kernel.vmlinux] [k] free_pud_range
> > 0.92% -0.11% [kernel.vmlinux] [k] __mmap_region
> > 0.12% -0.11% [kernel.vmlinux] [k] barn_put_empty_sheaf
> > 0.20% -0.09% [kernel.vmlinux] [k] barn_replace_empty_sheaf
> > 0.31% +0.09% [kernel.vmlinux] [k] get_partial_node
> > 0.29% -0.07% [kernel.vmlinux] [k] __rcu_free_sheaf_prepare
> > 0.12% -0.07% [kernel.vmlinux] [k] intel_idle_xstate
> > 0.21% -0.07% [kernel.vmlinux] [k] __kfree_rcu_sheaf
> > 0.26% -0.07% [kernel.vmlinux] [k] down_write
> > 0.53% -0.06% libc.so.6 [.] __mmap
> > 0.66% -0.06% [kernel.vmlinux] [k] mas_walk
> > 0.48% -0.06% [kernel.vmlinux] [k] mas_prev_slot
> > 0.45% -0.06% [kernel.vmlinux] [k] mas_find
> > 0.38% -0.06% [kernel.vmlinux] [k] mas_wr_store_type
> > 0.23% -0.06% [kernel.vmlinux] [k] do_vmi_align_munmap
> > 0.21% -0.05% [kernel.vmlinux] [k] perf_event_mmap_event
> > 0.32% -0.05% [kernel.vmlinux] [k] entry_SYSRETQ_unsafe_stack
> > 0.19% -0.05% [kernel.vmlinux] [k] downgrade_write
> > 0.59% -0.05% [kernel.vmlinux] [k] mas_next_slot
> > 0.31% -0.05% [kernel.vmlinux] [k] __mmap_new_vma
> > 0.44% -0.05% [kernel.vmlinux] [k] kmem_cache_alloc_noprof
> > 0.28% -0.05% [kernel.vmlinux] [k] __vma_enter_locked
> > 0.41% -0.05% [kernel.vmlinux] [k] memcpy
> > 0.48% -0.04% [kernel.vmlinux] [k] mas_store_gfp
> > 0.14% +0.04% [kernel.vmlinux] [k] __put_partials
> > 0.19% -0.04% [kernel.vmlinux] [k] mas_empty_area_rev
> > 0.30% -0.04% [kernel.vmlinux] [k] do_syscall_64
> > 0.25% -0.04% [kernel.vmlinux] [k] mas_preallocate
> > 0.15% -0.04% [kernel.vmlinux] [k] rcu_free_sheaf
> > 0.22% -0.04% [kernel.vmlinux] [k] entry_SYSCALL_64
> > 0.49% -0.04% libc.so.6 [.] __munmap
> > 0.91% -0.04% [kernel.vmlinux] [k] rcu_all_qs
> > 0.21% -0.04% [kernel.vmlinux] [k] __vm_munmap
> > 0.24% -0.04% [kernel.vmlinux] [k] mas_store_prealloc
> > 0.19% -0.04% [kernel.vmlinux] [k] __kmalloc_cache_noprof
> > 0.34% -0.04% [kernel.vmlinux] [k] build_detached_freelist
> > 0.19% -0.03% [kernel.vmlinux] [k] vms_complete_munmap_vmas
> > 0.36% -0.03% [kernel.vmlinux] [k] mas_rev_awalk
> > 0.05% -0.03% [kernel.vmlinux] [k] shuffle_freelist
> > 0.19% -0.03% [kernel.vmlinux] [k] down_write_killable
> > 0.19% -0.03% [kernel.vmlinux] [k] kmem_cache_free
> > 0.27% -0.03% [kernel.vmlinux] [k] up_write
> > 0.13% -0.03% [kernel.vmlinux] [k] vm_area_alloc
> > 0.18% -0.03% [kernel.vmlinux] [k] arch_get_unmapped_area_topdown
> > 0.08% -0.03% [kernel.vmlinux] [k] userfaultfd_unmap_complete
> > 0.10% -0.03% [kernel.vmlinux] [k] tlb_gather_mmu
> > 0.30% -0.02% [kernel.vmlinux] [k] ___slab_alloc
> >
> > I think the interesting item is "get_partial_node". It seems this fix
> > makes "get_partial_node" slightly more frequent. Hmm, however, I still
> > can't figure out why this is happening. Do you have any thoughts on it?
>
> I'm not sure if it's statistically significant or just noise, +0.09% could
> be noise?
A small number doesn't always mean it's noise. When perf samples the spin
lock call chain under get_partial_node, the samples land in its subroutine
(the spin lock), which is where most of the time goes, so the subroutine's
proportion is high. get_partial_node itself (excluding subroutines) executes
very quickly, so its own proportion stays small even when it is responsible
for a lot of the lock contention.
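As a rough illustration (just a sketch in Python, reusing the 0.31% / +0.09%
figures from the diff table above; perf attributes the lock samples to the
callee rather than to get_partial_node itself):

# Illustration only: the "self" column of a cheap caller like
# get_partial_node stays tiny because the expensive work (the spin lock
# slow path) is accounted to the callee.
baseline_self = 0.31   # get_partial_node self time (%) in the diff above
delta_self = 0.09      # reported delta (%)

relative_change = delta_self / baseline_self * 100
print(f"relative change in get_partial_node self time: {relative_change:.0f}%")
# ~29% relative growth, which is why a +0.09% absolute delta on such a
# small base is not necessarily noise.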
I also expanded the perf data with call chains:
* w/o fix:
We can calculate the proportion of spin lock time introduced by get_partial_node
as: 31.05% / 49.91% = 62.21%
49.91% mmap2_processes [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
|
--49.91%--native_queued_spin_lock_slowpath
|
--49.91%--_raw_spin_lock_irqsave
|
|--31.05%--get_partial_node
| |
| |--23.66%--get_any_partial
| | ___slab_alloc
| |
| --7.40%--___slab_alloc
| __kmem_cache_alloc_bulk
|
|--10.84%--barn_get_empty_sheaf
| |
| |--6.18%--__kfree_rcu_sheaf
| | kvfree_call_rcu
| |
| --4.66%--__pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
|--5.10%--barn_put_empty_sheaf
| |
| --5.09%--__pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
|--2.01%--barn_replace_empty_sheaf
| __pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
--0.78%--__put_partials
|
--0.78%--__kmem_cache_free_bulk.part.0
rcu_free_sheaf
* with fix:
Similarly, the proportion of spin lock time introduced by get_partial_node
is: 39.91% / 42.82% = 93.20%
42.82% mmap2_processes [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
|
---native_queued_spin_lock_slowpath
|
--42.82%--_raw_spin_lock_irqsave
|
|--39.91%--get_partial_node
| |
| |--28.25%--get_any_partial
| | ___slab_alloc
| |
| --11.66%--___slab_alloc
| __kmem_cache_alloc_bulk
|
|--1.09%--barn_get_empty_sheaf
| |
| --0.90%--__kfree_rcu_sheaf
| kvfree_call_rcu
|
|--0.96%--barn_replace_empty_sheaf
| __pcs_replace_empty_main
| kmem_cache_alloc_noprof
|
--0.77%--__put_partials
__kmem_cache_free_bulk.part.0
rcu_free_sheaf
So, the 62.21% -> 93.20% change could reflect that get_partial_node
contributes more of the overhead with this fix applied.
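(Just to make the arithmetic above explicit, in Python:

without_fix = 31.05 / 49.91   # ~0.6221
with_fix    = 39.91 / 42.82   # ~0.9320
print(f"w/o fix:  {without_fix:.2%}")
print(f"with fix: {with_fix:.2%}")

i.e. the share of spin lock samples whose call chain goes through
get_partial_node grows from roughly 62% to roughly 93%.)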
> > So, I'd like to know if you think dynamically or adaptively adjusting
> > capacity is a worthwhile idea.
>
> In the followup series, there will be automatically determined capacity to
> roughly match the current capacity of cpu partial slabs:
>
> https://lore.kernel.org/all/20260112-sheaves-for-all-v2-4-98225cfb50cf@suse.cz/
>
> We can use that as starting point for further tuning. But I suspect making
> it adjust dynamically would be complicated.
Thanks, will continue to evaluate this series.
Regards,
Zhao