* [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
@ 2024-08-19 7:02 Yongqiang Liu
2024-08-19 9:33 ` Hyeonggon Yoo
From: Yongqiang Liu @ 2024-08-19 7:02 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, zhangxiaoxu5, cl, wangkefeng.wang, penberg,
rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo
commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
slab_alloc()") introduced prefetch_freepointer() for the fastpath
allocation. Using it at the first load of the freelist in
___slab_alloc() as well gives a small improvement in some workloads.
Here are hackbench results on an arm64 machine (about 3.8%):

Before:
average time cost of 'hackbench -g 100 -l 1000': 17.068

After:
average time cost of 'hackbench -g 100 -l 1000': 16.416

There is also about a 5% improvement on an x86_64 machine for hackbench.
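
For context, prefetch_freepointer() boils down to prefetching (for
write) the cache line that holds the next object's free pointer. It is
roughly the following; this is a paraphrase of mm/slub.c, not a
verbatim copy:

	static void prefetch_freepointer(const struct kmem_cache *s, void *object)
	{
		/* the free pointer is stored at offset s->offset inside the object */
		prefetchw(object + s->offset);
	}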
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
mm/slub.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/slub.c b/mm/slub.c
index c9d8a2497fd6..f9daaff10c6a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
VM_BUG_ON(!c->slab->frozen);
c->freelist = get_freepointer(s, freelist);
c->tid = next_tid(c->tid);
+ prefetch_freepointer(s, c->freelist);
local_unlock_irqrestore(&s->cpu_slab->lock, flags);
return freelist;
--
2.25.1
* Re: [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
2024-08-19 7:02 [PATCH] mm, slub: prefetch freelist in ___slab_alloc() Yongqiang Liu
@ 2024-08-19 9:33 ` Hyeonggon Yoo
2024-08-21 6:58 ` Yongqiang Liu
From: Hyeonggon Yoo @ 2024-08-19 9:33 UTC (permalink / raw)
To: Yongqiang Liu
Cc: linux-mm, linux-kernel, zhangxiaoxu5, cl, wangkefeng.wang,
penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin
On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
>
> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> slab_alloc()") introduced prefetch_freepointer() for the fastpath
> allocation. Using it at the first load of the freelist in
> ___slab_alloc() as well gives a small improvement in some workloads.
> Here are hackbench results on an arm64 machine (about 3.8%):
>
> Before:
> average time cost of 'hackbench -g 100 -l 1000': 17.068
>
> After:
> average time cost of 'hackbench -g 100 -l 1000': 16.416
>
> There is also about a 5% improvement on an x86_64 machine for hackbench.
I think adding more prefetch might not be a good idea unless we have
more real-world data supporting it because prefetch might help when slab
is frequently used, but it will end up unnecessarily using more cache
lines when slab is not frequently used.
Also I don't understand how adding prefetch in slowpath affects the performance
because most allocs/frees should be done in the fastpath. Could you
please explain?
> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> ---
> mm/slub.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..f9daaff10c6a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> VM_BUG_ON(!c->slab->frozen);
> c->freelist = get_freepointer(s, freelist);
> c->tid = next_tid(c->tid);
> + prefetch_freepointer(s, c->freelist);
> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> return freelist;
>
> --
> 2.25.1
>
* Re: [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
2024-08-19 9:33 ` Hyeonggon Yoo
@ 2024-08-21 6:58 ` Yongqiang Liu
2024-09-14 13:45 ` Hyeonggon Yoo
From: Yongqiang Liu @ 2024-08-21 6:58 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: linux-mm, linux-kernel, zhangxiaoxu5, cl, wangkefeng.wang,
penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin
On 2024/8/19 17:33, Hyeonggon Yoo wrote:
> On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
>> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
>> slab_alloc()") introduced prefetch_freepointer() for the fastpath
>> allocation. Using it at the first load of the freelist in
>> ___slab_alloc() as well gives a small improvement in some workloads.
>> Here are hackbench results on an arm64 machine (about 3.8%):
>>
>> Before:
>> average time cost of 'hackbench -g 100 -l 1000': 17.068
>>
>> After:
>> average time cost of 'hackbench -g 100 -l 1000': 16.416
>>
>> There is also about a 5% improvement on an x86_64 machine for hackbench.
> I think adding more prefetch might not be a good idea unless we have
> more real-world data supporting it because prefetch might help when slab
> is frequently used, but it will end up unnecessarily using more cache
> lines when slab is not frequently used.
Yes, prefetching unnecessary objects is a bad idea. But I think if the
allocation entered the slowpath, it is more likely that more objects
will be needed soon. I've tested the cases from commit 0ad9500e16fe
("slub: prefetch next freelist pointer in slab_alloc()"). Here is the
result:
Before:

 Performance counter stats for './hackbench 50 process 4000' (32 runs):

        2545.28 msec task-clock         #    6.938 CPUs utilized         ( +- 1.75% )
           6166      context-switches   #    0.002 M/sec                 ( +- 1.58% )
           1129      cpu-migrations     #    0.444 K/sec                 ( +- 2.16% )
          13298      page-faults        #    0.005 M/sec                 ( +- 0.38% )
     4435113150      cycles             #    1.742 GHz                   ( +- 1.22% )
     2259717630      instructions       #    0.51  insn per cycle        ( +- 0.05% )
      385847392      branches           #  151.593 M/sec                 ( +- 0.06% )
        6205369      branch-misses      #    1.61% of all branches       ( +- 0.56% )

        0.36688 +- 0.00595 seconds time elapsed  ( +- 1.62% )
After:

 Performance counter stats for './hackbench 50 process 4000' (32 runs):

        2277.61 msec task-clock         #    6.855 CPUs utilized         ( +- 0.98% )
           5653      context-switches   #    0.002 M/sec                 ( +- 1.62% )
           1081      cpu-migrations     #    0.475 K/sec                 ( +- 1.89% )
          13217      page-faults        #    0.006 M/sec                 ( +- 0.48% )
     3751509945      cycles             #    1.647 GHz                   ( +- 1.14% )
     2253177626      instructions       #    0.60  insn per cycle        ( +- 0.06% )
      384509166      branches           #  168.821 M/sec                 ( +- 0.07% )
        6045031      branch-misses      #    1.57% of all branches       ( +- 0.58% )

        0.33225 +- 0.00321 seconds time elapsed  ( +- 0.97% )
>
> Also I don't understand how adding prefetch in slowpath affects the performance
> because most allocs/frees should be done in the fastpath. Could you
> please explain?
By adding some debug info to count slowpath entries for hackbench:
'hackbench -g 100 -l 1000' did 80416886 slab allocations in total, of
which 7184236 went through the slowpath.

That is about 9% of all allocations taking the slowpath. The perf stats
on arm64 are as follows:
Before:

 Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):

    34766611220      branches                                            ( +- 0.01% )
      382593804      branch-misses          #  1.10% of all branches     ( +- 0.14% )
     1120091414      cache-misses                                        ( +- 0.08% )
    76810485402      L1-dcache-loads                                     ( +- 0.03% )
     1120091414      L1-dcache-load-misses  #  1.46% of all L1-dcache hits  ( +- 0.08% )

        23.8854 +- 0.0804 seconds time elapsed  ( +- 0.34% )

After:

 Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):

    34812735277      branches                                            ( +- 0.01% )
      393449644      branch-misses          #  1.13% of all branches     ( +- 0.15% )
     1095185949      cache-misses                                        ( +- 0.15% )
    76995789602      L1-dcache-loads                                     ( +- 0.03% )
     1095185949      L1-dcache-load-misses  #  1.42% of all L1-dcache hits  ( +- 0.15% )

        23.341 +- 0.104 seconds time elapsed  ( +- 0.45% )
It seems to have fewer L1-dcache-load-misses.
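
(For reference, the slowpath ratio above was collected with a throwaway
debug counter along the lines of the sketch below; the counter names
are made up purely for illustration and are not part of the posted
patch:)

	/* hypothetical debug counters, for illustration only */
	static atomic_long_t slub_dbg_alloc_total;
	static atomic_long_t slub_dbg_alloc_slowpath;

	/* incremented once per allocation in the common allocation path */
	atomic_long_inc(&slub_dbg_alloc_total);

	/* incremented at the start of ___slab_alloc(), i.e. only on the slowpath */
	atomic_long_inc(&slub_dbg_alloc_slowpath);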
>
>> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>> ---
>> mm/slub.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index c9d8a2497fd6..f9daaff10c6a 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> VM_BUG_ON(!c->slab->frozen);
>> c->freelist = get_freepointer(s, freelist);
>> c->tid = next_tid(c->tid);
>> + prefetch_freepointer(s, c->freelist);
>> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
>> return freelist;
>>
>> --
>> 2.25.1
>>
* Re: [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
2024-08-21 6:58 ` Yongqiang Liu
@ 2024-09-14 13:45 ` Hyeonggon Yoo
2024-10-02 11:57 ` Vlastimil Babka
From: Hyeonggon Yoo @ 2024-09-14 13:45 UTC (permalink / raw)
To: Yongqiang Liu
Cc: linux-mm, linux-kernel, zhangxiaoxu5, cl, wangkefeng.wang,
penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin
On Wed, Aug 21, 2024 at 3:58 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
>
>
> On 2024/8/19 17:33, Hyeonggon Yoo wrote:
> > On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
> >> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> >> slab_alloc()") introduced prefetch_freepointer() for the fastpath
> >> allocation. Using it at the first load of the freelist in
> >> ___slab_alloc() as well gives a small improvement in some workloads.
> >> Here are hackbench results on an arm64 machine (about 3.8%):
> >>
> >> Before:
> >> average time cost of 'hackbench -g 100 -l 1000': 17.068
> >>
> >> After:
> >> average time cost of 'hackbench -g 100 -l 1000': 16.416
> >>
> >> There is also about a 5% improvement on an x86_64 machine for hackbench.
> > I think adding more prefetch might not be a good idea unless we have
> > more real-world data supporting it because prefetch might help when slab
> > is frequently used, but it will end up unnecessarily using more cache
> > lines when slab is not frequently used.
>
Hi,
sorry for the late reply.
Thanks for explaining how it impacts hackbench, even when prefetch is
added in the slow path.
However, I still think the main issue is that hackbench is too
synthetic to make a strong argument that
prefetch in the slow path would help in most real-world scenarios.
> Yes, prefetching unnecessary objects is a bad idea. But I think if the
> allocation entered the slowpath, it is more likely that more objects
> will be needed soon.
The fast path is hit when an object can be allocated from the CPU slab
without much work,
and the slow path is hit when it can’t. This doesn't give any
indication about future allocations.
To be honest, I'm not even sure if prefetching in the fast path really
helps if slab is not frequently called.
Just because it hits the fast path or slow path doesn’t necessarily
mean more objects will be needed in the future.
Also, I don't think "prefetch some data because we might need it in the
future" is a good argument, because if we don't need it, it just wastes
a cache line. If it helps in some cases but hurts in others, that is
not a net gain.
I might be wrong. If I am, please prove me wrong and convince me and others.
Best,
Hyeonggon
> I've tested the cases from commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> slab_alloc()"). Here is the result:
> Before:
>
> Performance counter stats for './hackbench 50 process 4000' (32 runs):
>
>        2545.28 msec task-clock         #    6.938 CPUs utilized         ( +- 1.75% )
>           6166      context-switches   #    0.002 M/sec                 ( +- 1.58% )
>           1129      cpu-migrations     #    0.444 K/sec                 ( +- 2.16% )
>          13298      page-faults        #    0.005 M/sec                 ( +- 0.38% )
>     4435113150      cycles             #    1.742 GHz                   ( +- 1.22% )
>     2259717630      instructions       #    0.51  insn per cycle        ( +- 0.05% )
>      385847392      branches           #  151.593 M/sec                 ( +- 0.06% )
>        6205369      branch-misses      #    1.61% of all branches       ( +- 0.56% )
>
>        0.36688 +- 0.00595 seconds time elapsed  ( +- 1.62% )
>
> After:
>
> Performance counter stats for './hackbench 50 process 4000' (32 runs):
>
>        2277.61 msec task-clock         #    6.855 CPUs utilized         ( +- 0.98% )
>           5653      context-switches   #    0.002 M/sec                 ( +- 1.62% )
>           1081      cpu-migrations     #    0.475 K/sec                 ( +- 1.89% )
>          13217      page-faults        #    0.006 M/sec                 ( +- 0.48% )
>     3751509945      cycles             #    1.647 GHz                   ( +- 1.14% )
>     2253177626      instructions       #    0.60  insn per cycle        ( +- 0.06% )
>      384509166      branches           #  168.821 M/sec                 ( +- 0.07% )
>        6045031      branch-misses      #    1.57% of all branches       ( +- 0.58% )
>
>        0.33225 +- 0.00321 seconds time elapsed  ( +- 0.97% )
>
> >
> > Also I don't understand how adding prefetch in slowpath affects the performance
> > because most allocs/frees should be done in the fastpath. Could you
> > please explain?
>
> By adding some debug info to count slowpath entries for hackbench:
>
> 'hackbench -g 100 -l 1000' did 80416886 slab allocations in total, of
> which 7184236 went through the slowpath.
>
> That is about 9% of all allocations taking the slowpath. The perf stats
> on arm64 are as follows:
>
> Before:
>
> Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>
>    34766611220      branches                                            ( +- 0.01% )
>      382593804      branch-misses          #  1.10% of all branches     ( +- 0.14% )
>     1120091414      cache-misses                                        ( +- 0.08% )
>    76810485402      L1-dcache-loads                                     ( +- 0.03% )
>     1120091414      L1-dcache-load-misses  #  1.46% of all L1-dcache hits  ( +- 0.08% )
>
>        23.8854 +- 0.0804 seconds time elapsed  ( +- 0.34% )
>
> After:
>
> Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>
>    34812735277      branches                                            ( +- 0.01% )
>      393449644      branch-misses          #  1.13% of all branches     ( +- 0.15% )
>     1095185949      cache-misses                                        ( +- 0.15% )
>    76995789602      L1-dcache-loads                                     ( +- 0.03% )
>     1095185949      L1-dcache-load-misses  #  1.42% of all L1-dcache hits  ( +- 0.15% )
>
>        23.341 +- 0.104 seconds time elapsed  ( +- 0.45% )
>
> It seems to have fewer L1-dcache-load-misses.
>
> >
> >> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
> >> ---
> >> mm/slub.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index c9d8a2497fd6..f9daaff10c6a 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >> VM_BUG_ON(!c->slab->frozen);
> >> c->freelist = get_freepointer(s, freelist);
> >> c->tid = next_tid(c->tid);
> >> + prefetch_freepointer(s, c->freelist);
> >> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> >> return freelist;
> >>
> >> --
> >> 2.25.1
> >>
* Re: [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
2024-09-14 13:45 ` Hyeonggon Yoo
@ 2024-10-02 11:57 ` Vlastimil Babka
From: Vlastimil Babka @ 2024-10-02 11:57 UTC (permalink / raw)
To: Hyeonggon Yoo, Yongqiang Liu
Cc: linux-mm, linux-kernel, zhangxiaoxu5, cl, wangkefeng.wang,
penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin
On 9/14/24 15:45, Hyeonggon Yoo wrote:
> On Wed, Aug 21, 2024 at 3:58 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
>>
>>
>> On 2024/8/19 17:33, Hyeonggon Yoo wrote:
>> > On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu <liuyongqiang13@huawei.com> wrote:
>> >> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
>> >> slab_alloc()") introduced prefetch_freepointer() for the fastpath
>> >> allocation. Using it at the first load of the freelist in
>> >> ___slab_alloc() as well gives a small improvement in some workloads.
>> >> Here are hackbench results on an arm64 machine (about 3.8%):
>> >>
>> >> Before:
>> >> average time cost of 'hackbench -g 100 -l 1000': 17.068
>> >>
>> >> After:
>> >> average time cost of 'hackbench -g 100 -l 1000': 16.416
>> >>
>> >> There is also about a 5% improvement on an x86_64 machine for hackbench.
>> > I think adding more prefetch might not be a good idea unless we have
>> > more real-world data supporting it because prefetch might help when slab
>> > is frequently used, but it will end up unnecessarily using more cache
>> > lines when slab is not frequently used.
>>
>
> Hi,
>
> sorry for the late reply.
> Thanks for explaining how it impacts hackbench, even when prefetch is
> added in the slow path.
>
> However, I still think the main issue is that hackbench is too
> synthetic to make a strong argument that
> prefetch in the slow path would help in most real-world scenarios.
>
>> Yes, prefetching unnecessary objects is a bad idea. But I think if the
>> allocation entered the slowpath, it is more likely that more objects
>> will be needed soon.
>
> The fast path is hit when an object can be allocated from the CPU slab
> without much work,
> and the slow path is hit when it can’t. This doesn't give any
> indication about future allocations.
>
> To be honest, I'm not even sure if prefetching in the fast path really
> helps if slab is not frequently called.
> Just because it hits the fast path or slow path doesn’t necessarily
> mean more objects will be needed in the future.
Yeah, but in some sense, if we keep the fastpath prefetch then it makes
sense to have the analogous slowpath one too.
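
For reference, the existing fastpath prefetch sits roughly here; this
is a heavily simplified schematic of slab_alloc_node(), not verbatim
mm/slub.c:

	/* schematic fastpath, simplified */
	object = c->freelist;
	if (unlikely(!object || !slab))
		goto slowpath;	/* __slab_alloc() ends up in ___slab_alloc() */

	next_object = get_freepointer_safe(s, object);
	/*
	 * A lockless cmpxchg switches c->freelist from object to
	 * next_object; on success the new head of the freelist is
	 * prefetched for the next allocation.
	 */
	prefetch_freepointer(s, next_object);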
> Also, I don't think "prefetch some data because we might need it in the
> future" is a good argument, because if we don't need it, it just wastes
> a cache line. If it helps in some cases but hurts in others, that is
> not a net gain.
>
> I might be wrong. If I am, please prove me wrong and convince me and others.
>
> Best,
> Hyeonggon
>
>> I've tested the cases from commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
>> slab_alloc()"). Here is the result:
>> Before:
>>
>> Performance counter stats for './hackbench 50 process 4000' (32 runs):
>>
>>        2545.28 msec task-clock         #    6.938 CPUs utilized         ( +- 1.75% )
>>           6166      context-switches   #    0.002 M/sec                 ( +- 1.58% )
>>           1129      cpu-migrations     #    0.444 K/sec                 ( +- 2.16% )
>>          13298      page-faults        #    0.005 M/sec                 ( +- 0.38% )
>>     4435113150      cycles             #    1.742 GHz                   ( +- 1.22% )
>>     2259717630      instructions       #    0.51  insn per cycle        ( +- 0.05% )
>>      385847392      branches           #  151.593 M/sec                 ( +- 0.06% )
>>        6205369      branch-misses      #    1.61% of all branches       ( +- 0.56% )
>>
>>        0.36688 +- 0.00595 seconds time elapsed  ( +- 1.62% )
>>
>> After:
>>
>> Performance counter stats for './hackbench 50 process 4000' (32 runs):
>>
>>        2277.61 msec task-clock         #    6.855 CPUs utilized         ( +- 0.98% )
>>           5653      context-switches   #    0.002 M/sec                 ( +- 1.62% )
>>           1081      cpu-migrations     #    0.475 K/sec                 ( +- 1.89% )
>>          13217      page-faults        #    0.006 M/sec                 ( +- 0.48% )
>>     3751509945      cycles             #    1.647 GHz                   ( +- 1.14% )
>>     2253177626      instructions       #    0.60  insn per cycle        ( +- 0.06% )
>>      384509166      branches           #  168.821 M/sec                 ( +- 0.07% )
>>        6045031      branch-misses      #    1.57% of all branches       ( +- 0.58% )
>>
>>        0.33225 +- 0.00321 seconds time elapsed  ( +- 0.97% )
>>
>> >
>> > Also I don't understand how adding prefetch in slowpath affects the performance
>> > because most allocs/frees should be done in the fastpath. Could you
>> > please explain?
>>
>> By adding some debug info to count slowpath entries for hackbench:

You could just try enabling CONFIG_SLUB_STATS to get this count?

>> 'hackbench -g 100 -l 1000' did 80416886 slab allocations in total, of
>> which 7184236 went through the slowpath.
>>
>> That is about 9% of all allocations taking the slowpath. The perf stats
>> on arm64 are as follows:
Now that's somewhat weird. We improve a case hit in 9% of allocations
and yet the overall improvement is 5%? Sounds too good to make sense.
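
Back-of-the-envelope: even if slab allocation were, say, 20% of the
total runtime and the slowpath were half of that, a 5% total win would
require the prefetch to cut the cost of the slowpath roughly in half,
which seems implausible for a single prefetch. (The 20% and 50% figures
here are assumptions purely for illustration.)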
>> Before:
>>
>> Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>>
>>    34766611220      branches                                            ( +- 0.01% )
>>      382593804      branch-misses          #  1.10% of all branches     ( +- 0.14% )
>>     1120091414      cache-misses                                        ( +- 0.08% )
>>    76810485402      L1-dcache-loads                                     ( +- 0.03% )
>>     1120091414      L1-dcache-load-misses  #  1.46% of all L1-dcache hits  ( +- 0.08% )
>>
>>        23.8854 +- 0.0804 seconds time elapsed  ( +- 0.34% )
>>
>> After:
>>
>> Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>>
>>    34812735277      branches                                            ( +- 0.01% )
>>      393449644      branch-misses          #  1.13% of all branches     ( +- 0.15% )
>>     1095185949      cache-misses                                        ( +- 0.15% )
>>    76995789602      L1-dcache-loads                                     ( +- 0.03% )
>>     1095185949      L1-dcache-load-misses  #  1.42% of all L1-dcache hits  ( +- 0.15% )
>>
>>        23.341 +- 0.104 seconds time elapsed  ( +- 0.45% )
>>
>> It seems to have fewer L1-dcache-load-misses.
>>
>> >
>> >> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
>> >> ---
>> >> mm/slub.c | 1 +
>> >> 1 file changed, 1 insertion(+)
>> >>
>> >> diff --git a/mm/slub.c b/mm/slub.c
>> >> index c9d8a2497fd6..f9daaff10c6a 100644
>> >> --- a/mm/slub.c
>> >> +++ b/mm/slub.c
>> >> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> >> VM_BUG_ON(!c->slab->frozen);
>> >> c->freelist = get_freepointer(s, freelist);
>> >> c->tid = next_tid(c->tid);
>> >> + prefetch_freepointer(s, c->freelist);
>> >> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
Ideally you'd store c->freelist in a local variable and move the prefetch
below local_unlock_irqrestore() so we don't make the locked section larger.
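
I.e., something along these lines (untested sketch, using a new local
variable for the next freelist pointer):

	void *next_freelist = get_freepointer(s, freelist);

	c->freelist = next_freelist;
	c->tid = next_tid(c->tid);
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
	/* prefetch outside the locked section, using the cached value */
	prefetch_freepointer(s, next_freelist);
	return freelist;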
>> >> return freelist;
>> >>
>> >> --
>> >> 2.25.1
>> >>