* [PATCH v2 0/3] SLUB: improve filling cpu partial a bit in get_partial_node()
@ 2024-04-04 5:58 xiongwei.song
2024-04-04 5:58 ` [PATCH v2 1/3] mm/slub: remove the check of !kmem_cache_has_cpu_partial() xiongwei.song
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: xiongwei.song @ 2024-04-04 5:58 UTC (permalink / raw)
To: vbabka, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
roman.gushchin, 42.hyeyoo
Cc: linux-mm, linux-kernel, chengming.zhou
From: Xiongwei Song <xiongwei.song@windriver.com>
This series removes an unnecessary check when filling the cpu partial
list and improves readability.

It introduces slub_get_cpu_partial() and a dummy counterpart so the
#ifdef on CONFIG_SLUB_CPU_PARTIAL can be removed; a condensed sketch
follows below, and the actual diffs are in patches 2 and 3.

Patch 3 also includes a comparison of breaking out of the cpu partial
filling loop with ">" versus ">="; see that patch for details.

No functional change.
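The sketch of the helper pair, condensed from patch 2 (see that patch for
the real diff and surrounding context):

    #ifdef CONFIG_SLUB_CPU_PARTIAL
    static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
    {
            return s->cpu_partial_slabs;
    }
    #else
    /* dummy: callers read 0 and naturally take the "no cpu partial" path */
    static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
    {
            return 0;
    }
    #endif /* CONFIG_SLUB_CPU_PARTIAL */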
Changes in v2:
- Refine the commit message (per Vlastimil Babka's comments).
- Refine the check conditions for filling cpu partial (as Vlastimil
  Babka suggested).
- Drop patch 4 of v1.
v1:
- https://lore.kernel.org/lkml/20240331021926.2732572-4-xiongwei.song@windriver.com/T/
Before v1:
- The series is an improvement on the patch below:
https://lore.kernel.org/lkml/934f65c6-4d97-6c4d-b123-4937ede24a99@google.com/T/
Regards,
Xiongwei
Xiongwei Song (3):
mm/slub: remove the check of !kmem_cache_has_cpu_partial()
mm/slub: add slub_get_cpu_partial() helper
mm/slub: simplify get_partial_node()
mm/slub.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--
2.34.1

* [PATCH v2 1/3] mm/slub: remove the check of !kmem_cache_has_cpu_partial()
  2024-04-04 5:58 [PATCH v2 0/3] SLUB: improve filling cpu partial a bit in get_partial_node() xiongwei.song
@ 2024-04-04 5:58 ` xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 2/3] mm/slub: add slub_get_cpu_partial() helper xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 3/3] mm/slub: simplify get_partial_node() xiongwei.song
  2 siblings, 0 replies; 6+ messages in thread
From: xiongwei.song @ 2024-04-04 5:58 UTC (permalink / raw)
  To: vbabka, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
      roman.gushchin, 42.hyeyoo
  Cc: linux-mm, linux-kernel, chengming.zhou

From: Xiongwei Song <xiongwei.song@windriver.com>

With CONFIG_SLUB_CPU_PARTIAL enabled, the check of
!kmem_cache_has_cpu_partial(s) here is always false. We have already
checked kmem_cache_debug() earlier in the loop; if it was true, we
either continued or broke out, so this code cannot be reached in that
case and there is no need to check kmem_cache_debug() again as part of
kmem_cache_has_cpu_partial(). Remove the check.

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
---
 mm/slub.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1bb2a93cf7b6..059922044a4f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2610,8 +2610,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 			partial_slabs++;
 		}
 #ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (!kmem_cache_has_cpu_partial(s)
-			|| partial_slabs > s->cpu_partial_slabs / 2)
+		if (partial_slabs > s->cpu_partial_slabs / 2)
			break;
 #else
 		break;
--
2.34.1
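For context, the removed check sits after an earlier debug-cache branch in
the same loop. Below is a heavily abridged sketch of get_partial_node()
around the 6.9-rc2 base; the helper names (pfmemalloc_match(),
kmem_cache_debug()) are recalled from that era of mm/slub.c and the sketch
is illustrative only, not part of the patch:

    list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
            if (!pfmemalloc_match(slab, pc->flags))
                    continue;

            if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
                    /* debug caches are fully handled here and either
                     * continue or break; they never reach the code below */
                    continue;
            }

            /*
             * Only !kmem_cache_debug(s) caches get here. With
             * CONFIG_SLUB_CPU_PARTIAL enabled, kmem_cache_has_cpu_partial(s)
             * is effectively !kmem_cache_debug(s), so the removed
             * !kmem_cache_has_cpu_partial(s) condition could never be true.
             */
            remove_partial(n, slab);
            /* filling partial / cpu partial and the break check follow */
    }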

* [PATCH v2 2/3] mm/slub: add slub_get_cpu_partial() helper
  2024-04-04 5:58 [PATCH v2 0/3] SLUB: improve filling cpu partial a bit in get_partial_node() xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 1/3] mm/slub: remove the check of !kmem_cache_has_cpu_partial() xiongwei.song
@ 2024-04-04 5:58 ` xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 3/3] mm/slub: simplify get_partial_node() xiongwei.song
  2 siblings, 0 replies; 6+ messages in thread
From: xiongwei.song @ 2024-04-04 5:58 UTC (permalink / raw)
  To: vbabka, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
      roman.gushchin, 42.hyeyoo
  Cc: linux-mm, linux-kernel, chengming.zhou

From: Xiongwei Song <xiongwei.song@windriver.com>

Add slub_get_cpu_partial() and a dummy counterpart to help improve
get_partial_node(). They make it possible to remove the #ifdef on
CONFIG_SLUB_CPU_PARTIAL and clean up the cpu partial filling logic.

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
---
 mm/slub.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 059922044a4f..590cc953895d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -604,11 +604,21 @@ static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
 	nr_slabs = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
 	s->cpu_partial_slabs = nr_slabs;
 }
+
+static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
+{
+	return s->cpu_partial_slabs;
+}
 #else
 static inline void
 slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
 {
 }
+
+static inline unsigned int slub_get_cpu_partial(struct kmem_cache *s)
+{
+	return 0;
+}
 #endif /* CONFIG_SLUB_CPU_PARTIAL */

 /*
--
2.34.1
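To illustrate how the helper is meant to be used (this anticipates patch 3;
the snippet is a sketch rather than part of this patch), the dummy
returning 0 lets a caller fold the old #ifdef/#else pair into a single
condition:

    /* Old shape, with the preprocessor deciding whether to keep filling:
     *
     *   #ifdef CONFIG_SLUB_CPU_PARTIAL
     *           if (partial_slabs > s->cpu_partial_slabs / 2)
     *                   break;
     *   #else
     *           break;
     *   #endif
     */

    /* New shape: when CONFIG_SLUB_CPU_PARTIAL is disabled the dummy returns
     * 0, the first half of the condition is always true, and the loop still
     * breaks after taking the first slab. */
    if (slub_get_cpu_partial(s) == 0 ||
        partial_slabs > slub_get_cpu_partial(s) / 2)
            break;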

* [PATCH v2 3/3] mm/slub: simplify get_partial_node()
  2024-04-04 5:58 [PATCH v2 0/3] SLUB: improve filling cpu partial a bit in get_partial_node() xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 1/3] mm/slub: remove the check of !kmem_cache_has_cpu_partial() xiongwei.song
  2024-04-04 5:58 ` [PATCH v2 2/3] mm/slub: add slub_get_cpu_partial() helper xiongwei.song
@ 2024-04-04 5:58 ` xiongwei.song
  2024-04-04 9:26 ` Vlastimil Babka
  2 siblings, 1 reply; 6+ messages in thread
From: xiongwei.song @ 2024-04-04 5:58 UTC (permalink / raw)
  To: vbabka, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
      roman.gushchin, 42.hyeyoo
  Cc: linux-mm, linux-kernel, chengming.zhou

From: Xiongwei Song <xiongwei.song@windriver.com>

The break conditions for filling cpu partial can be made simpler and
more readable.

If slub_get_cpu_partial() returns 0, we can confirm that we don't need
to fill cpu partial, so we should break from the loop. On the other
hand, we should also break from the loop once we have added enough cpu
partial slabs.

Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
corner case: if we set cpu_partial_slabs to 0 from sysfs, we still put
at least one slab on the cpu partial list here.

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
---

The measurement below compares the performance effects of breaking from
the cpu partial filling loop with the following either-or conditions:

Condition 1:
When the count of added cpu slabs is greater than cpu_partial_slabs/2:
	(partial_slabs > slub_get_cpu_partial(s) / 2)

Condition 2:
When the count of added cpu slabs is greater than or equal to
cpu_partial_slabs/2:
	(partial_slabs >= slub_get_cpu_partial(s) / 2)

The change of breaking condition can affect how many cpu partial slabs
are put on the cpu partial list.

The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
16 cores. The OS is Ubuntu 22.04.

hackbench-process-pipes
                 6.9-rc2(with ">")     6.9.0-rc2(with ">=")
Amean     1      0.0373 (   0.00%)     0.0356 *   4.60%*
Amean     4      0.0984 (   0.00%)     0.1014 *  -3.05%*
Amean     7      0.1803 (   0.00%)     0.1851 *  -2.69%*
Amean     12     0.2947 (   0.00%)     0.3141 *  -6.59%*
Amean     21     0.4577 (   0.00%)     0.4927 *  -7.65%*
Amean     30     0.6326 (   0.00%)     0.6649 *  -5.10%*
Amean     48     0.9396 (   0.00%)     0.9884 *  -5.20%*
Amean     64     1.2321 (   0.00%)     1.3004 *  -5.54%*

hackbench-process-sockets
                 6.9-rc2(with ">")     6.9.0-rc2(with ">=")
Amean     1      0.0609 (   0.00%)     0.0623 *  -2.35%*
Amean     4      0.2107 (   0.00%)     0.2140 *  -1.56%*
Amean     7      0.3754 (   0.00%)     0.3966 *  -5.63%*
Amean     12     0.6456 (   0.00%)     0.6734 *  -4.32%*
Amean     21     1.1440 (   0.00%)     1.1769 *  -2.87%*
Amean     30     1.6629 (   0.00%)     1.7031 *  -2.42%*
Amean     48     2.7321 (   0.00%)     2.7897 *  -2.11%*
Amean     64     3.7397 (   0.00%)     3.7640 *  -0.65%*

It seems there is a slight performance penalty when using ">=" to break
out of the loop. Hence, we should still use ">" here.
---
 mm/slub.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 590cc953895d..6beff3b1e22c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 			stat(s, CPU_PARTIAL_NODE);
 			partial_slabs++;
 		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif

+		if ((slub_get_cpu_partial(s) == 0) ||
+		    (partial_slabs > slub_get_cpu_partial(s) / 2))
+			break;
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;
--
2.34.1
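A worked example of the two conditions above, using an illustrative value
of cpu_partial_slabs that is not taken from the patch:

    /*
     * Assume slub_get_cpu_partial(s) == 8, so the threshold is 8 / 2 == 4.
     *
     *   ">"  : break once partial_slabs > 4, i.e. after the 5th slab has
     *          been put on the cpu partial list.
     *   ">=" : break once partial_slabs >= 4, i.e. after the 4th slab.
     *
     * So ">=" refills the cpu partial list with one slab fewer per
     * get_partial_node() call, which is consistent with the small
     * hackbench regression above; ">" is therefore kept.
     */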

* Re: [PATCH v2 3/3] mm/slub: simplify get_partial_node()
  2024-04-04 5:58 ` [PATCH v2 3/3] mm/slub: simplify get_partial_node() xiongwei.song
@ 2024-04-04 9:26 ` Vlastimil Babka
  2024-04-07 1:47 ` Song, Xiongwei
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2024-04-04 9:26 UTC (permalink / raw)
  To: xiongwei.song, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
      roman.gushchin, 42.hyeyoo
  Cc: linux-mm, linux-kernel, chengming.zhou

On 4/4/24 7:58 AM, xiongwei.song@windriver.com wrote:
> From: Xiongwei Song <xiongwei.song@windriver.com>
>
> The break conditions for filling cpu partial can be made simpler and
> more readable.
>
> If slub_get_cpu_partial() returns 0, we can confirm that we don't need
> to fill cpu partial, so we should break from the loop. On the other
> hand, we should also break from the loop once we have added enough cpu
> partial slabs.
>
> Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
> corner case: if we set cpu_partial_slabs to 0 from sysfs, we still put
> at least one slab on the cpu partial list here.
>
> Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> ---
>
> The measurement below compares the performance effects of breaking from
> the cpu partial filling loop with the following either-or conditions:
>
> Condition 1:
> When the count of added cpu slabs is greater than cpu_partial_slabs/2:
> 	(partial_slabs > slub_get_cpu_partial(s) / 2)
>
> Condition 2:
> When the count of added cpu slabs is greater than or equal to
> cpu_partial_slabs/2:
> 	(partial_slabs >= slub_get_cpu_partial(s) / 2)
>
> The change of breaking condition can affect how many cpu partial slabs
> are put on the cpu partial list.
>
> The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
> 16 cores. The OS is Ubuntu 22.04.
>
> hackbench-process-pipes
>                  6.9-rc2(with ">")     6.9.0-rc2(with ">=")
> Amean     1      0.0373 (   0.00%)     0.0356 *   4.60%*
> Amean     4      0.0984 (   0.00%)     0.1014 *  -3.05%*
> Amean     7      0.1803 (   0.00%)     0.1851 *  -2.69%*
> Amean     12     0.2947 (   0.00%)     0.3141 *  -6.59%*
> Amean     21     0.4577 (   0.00%)     0.4927 *  -7.65%*
> Amean     30     0.6326 (   0.00%)     0.6649 *  -5.10%*
> Amean     48     0.9396 (   0.00%)     0.9884 *  -5.20%*
> Amean     64     1.2321 (   0.00%)     1.3004 *  -5.54%*
>
> hackbench-process-sockets
>                  6.9-rc2(with ">")     6.9.0-rc2(with ">=")
> Amean     1      0.0609 (   0.00%)     0.0623 *  -2.35%*
> Amean     4      0.2107 (   0.00%)     0.2140 *  -1.56%*
> Amean     7      0.3754 (   0.00%)     0.3966 *  -5.63%*
> Amean     12     0.6456 (   0.00%)     0.6734 *  -4.32%*
> Amean     21     1.1440 (   0.00%)     1.1769 *  -2.87%*
> Amean     30     1.6629 (   0.00%)     1.7031 *  -2.42%*
> Amean     48     2.7321 (   0.00%)     2.7897 *  -2.11%*
> Amean     64     3.7397 (   0.00%)     3.7640 *  -0.65%*
>
> It seems there is a slight performance penalty when using ">=" to break
> out of the loop. Hence, we should still use ">" here.

Thanks for evaluating that, I suspected that would be the case so we should
not change that performance aspect as part of a cleanup.

> ---
>  mm/slub.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 590cc953895d..6beff3b1e22c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  			stat(s, CPU_PARTIAL_NODE);
>  			partial_slabs++;
>  		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif
>
> +		if ((slub_get_cpu_partial(s) == 0) ||
> +		    (partial_slabs > slub_get_cpu_partial(s) / 2))
> +			break;
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;

After looking at the result and your v1 again, I arrived at this
modification that incorporates the core v1 idea without reintroducing
kmem_cache_has_cpu_partial(). The modified patch looks like below. Is it OK
with you? Pushed the whole series with this modification to slab/for-next
for now.

----8<-----
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 		if (!partial) {
 			partial = slab;
 			stat(s, ALLOC_FROM_PARTIAL);
+			if ((slub_get_cpu_partial(s) == 0)) {
+				break;
+			}
 		} else {
 			put_cpu_partial(s, slab, 0);
 			stat(s, CPU_PARTIAL_NODE);
-			partial_slabs++;
-		}
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		if (partial_slabs > s->cpu_partial_slabs / 2)
-			break;
-#else
-		break;
-#endif

+			if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
+				break;
+			}
+		}
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return partial;
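For clarity, a short trace of how the suggested version behaves; the
numeric value is illustrative and not taken from the thread:

    /*
     * slub_get_cpu_partial(s) == 0 (e.g. cpu_partial set to 0 via sysfs):
     *   the first usable slab is still taken as "partial" for the current
     *   allocation, then the loop breaks; nothing is stashed via
     *   put_cpu_partial(), which addresses the corner case patch 3 mentions.
     *
     * slub_get_cpu_partial(s) == 8:
     *   first slab -> partial, further slabs go to put_cpu_partial() until
     *   ++partial_slabs > 4, i.e. up to 5 stashed slabs, the same ">"
     *   behaviour as before the cleanup.
     */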

* RE: [PATCH v2 3/3] mm/slub: simplify get_partial_node()
  2024-04-04 9:26 ` Vlastimil Babka
@ 2024-04-07 1:47 ` Song, Xiongwei
  0 siblings, 0 replies; 6+ messages in thread
From: Song, Xiongwei @ 2024-04-07 1:47 UTC (permalink / raw)
  To: Vlastimil Babka, rientjes, cl, penberg, iamjoonsoo.kim, akpm,
      roman.gushchin, 42.hyeyoo
  Cc: linux-mm, linux-kernel, chengming.zhou

> On 4/4/24 7:58 AM, xiongwei.song@windriver.com wrote:
> > From: Xiongwei Song <xiongwei.song@windriver.com>
> >
> > The break conditions for filling cpu partial can be made simpler and
> > more readable.
> >
> > If slub_get_cpu_partial() returns 0, we can confirm that we don't need
> > to fill cpu partial, so we should break from the loop. On the other
> > hand, we should also break from the loop once we have added enough cpu
> > partial slabs.
> >
> > Meanwhile, the logic above gets rid of the #ifdef and also fixes a weird
> > corner case: if we set cpu_partial_slabs to 0 from sysfs, we still put
> > at least one slab on the cpu partial list here.
> >
> > Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
> > ---
> >
> > The measurement below compares the performance effects of breaking from
> > the cpu partial filling loop with the following either-or conditions:
> >
> > Condition 1:
> > When the count of added cpu slabs is greater than cpu_partial_slabs/2:
> > 	(partial_slabs > slub_get_cpu_partial(s) / 2)
> >
> > Condition 2:
> > When the count of added cpu slabs is greater than or equal to
> > cpu_partial_slabs/2:
> > 	(partial_slabs >= slub_get_cpu_partial(s) / 2)
> >
> > The change of breaking condition can affect how many cpu partial slabs
> > are put on the cpu partial list.
> >
> > The test was run on an "Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz" with
> > 16 cores. The OS is Ubuntu 22.04.
> >
> > hackbench-process-pipes
> >                  6.9-rc2(with ">")     6.9.0-rc2(with ">=")
> > Amean     1      0.0373 (   0.00%)     0.0356 *   4.60%*
> > Amean     4      0.0984 (   0.00%)     0.1014 *  -3.05%*
> > Amean     7      0.1803 (   0.00%)     0.1851 *  -2.69%*
> > Amean     12     0.2947 (   0.00%)     0.3141 *  -6.59%*
> > Amean     21     0.4577 (   0.00%)     0.4927 *  -7.65%*
> > Amean     30     0.6326 (   0.00%)     0.6649 *  -5.10%*
> > Amean     48     0.9396 (   0.00%)     0.9884 *  -5.20%*
> > Amean     64     1.2321 (   0.00%)     1.3004 *  -5.54%*
> >
> > hackbench-process-sockets
> >                  6.9-rc2(with ">")     6.9.0-rc2(with ">=")
> > Amean     1      0.0609 (   0.00%)     0.0623 *  -2.35%*
> > Amean     4      0.2107 (   0.00%)     0.2140 *  -1.56%*
> > Amean     7      0.3754 (   0.00%)     0.3966 *  -5.63%*
> > Amean     12     0.6456 (   0.00%)     0.6734 *  -4.32%*
> > Amean     21     1.1440 (   0.00%)     1.1769 *  -2.87%*
> > Amean     30     1.6629 (   0.00%)     1.7031 *  -2.42%*
> > Amean     48     2.7321 (   0.00%)     2.7897 *  -2.11%*
> > Amean     64     3.7397 (   0.00%)     3.7640 *  -0.65%*
> >
> > It seems there is a slight performance penalty when using ">=" to break
> > out of the loop. Hence, we should still use ">" here.
>
> Thanks for evaluating that, I suspected that would be the case so we should
> not change that performance aspect as part of a cleanup.
>
> > ---
> >  mm/slub.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 590cc953895d..6beff3b1e22c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2619,13 +2619,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> >  			stat(s, CPU_PARTIAL_NODE);
> >  			partial_slabs++;
> >  		}
> > -#ifdef CONFIG_SLUB_CPU_PARTIAL
> > -		if (partial_slabs > s->cpu_partial_slabs / 2)
> > -			break;
> > -#else
> > -		break;
> > -#endif
> >
> > +		if ((slub_get_cpu_partial(s) == 0) ||
> > +		    (partial_slabs > slub_get_cpu_partial(s) / 2))
> > +			break;
> >  	}
> >  	spin_unlock_irqrestore(&n->list_lock, flags);
> >  	return partial;
>
> After looking at the result and your v1 again, I arrived at this
> modification that incorporates the core v1 idea without reintroducing
> kmem_cache_has_cpu_partial(). The modified patch looks like below. Is it OK
> with you? Pushed the whole series with this modification to slab/for-next
> for now.

Sorry for the late response, I was on vacation. I'm ok with the patch
below.

Thanks,
Xiongwei

> ----8<-----
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2614,18 +2614,17 @@ static struct slab *get_partial_node(struct kmem_cache *s,
>  		if (!partial) {
>  			partial = slab;
>  			stat(s, ALLOC_FROM_PARTIAL);
> +			if ((slub_get_cpu_partial(s) == 0)) {
> +				break;
> +			}
>  		} else {
>  			put_cpu_partial(s, slab, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> -			partial_slabs++;
> -		}
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -		if (partial_slabs > s->cpu_partial_slabs / 2)
> -			break;
> -#else
> -		break;
> -#endif
>
> +			if (++partial_slabs > slub_get_cpu_partial(s) / 2) {
> +				break;
> +			}
> +		}
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  	return partial;