From: Marcelo Tosatti <mtosatti@redhat.com>
To: Vlastimil Babka <vbabka@suse.com>
Cc: Michal Hocko <mhocko@suse.com>,
Leonardo Bras <leobras.c@gmail.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Frederic Weisbecker <fweisbecker@suse.de>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Thu, 26 Feb 2026 15:24:29 -0300
Message-ID: <aaCP3V64INRZiZUH@tpad>
In-Reply-To: <1fd2efef-888b-4d3c-9c72-bdb2d594336f@suse.com>
On Mon, Feb 23, 2026 at 07:09:47PM +0100, Vlastimil Babka wrote:
> On 2/20/26 17:55, Marcelo Tosatti wrote:
> >
> > #include <linux/module.h>
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > #include <linux/timex.h>
> > #include <linux/preempt.h>
> > #include <linux/irqflags.h>
> > #include <linux/vmalloc.h>
> >
> > MODULE_LICENSE("GPL");
> > MODULE_AUTHOR("Gemini AI");
> > MODULE_DESCRIPTION("A simple kmalloc performance benchmark");
> >
> > static int size = 64; // Default allocation size in bytes
> > module_param(size, int, 0644);
> >
> > static int iterations = 1000000; // Default number of iterations
> > module_param(iterations, int, 0644);
> >
> > static int __init kmalloc_bench_init(void) {
> > void **ptrs;
> > cycles_t start, end;
> > uint64_t total_cycles;
> > int i;
> > pr_info("kmalloc_bench: Starting test (size=%d, iterations=%d)\n", size, iterations);
> >
> > // Allocate an array to store pointers to avoid immediate kfree-reuse optimization
> > ptrs = vmalloc(sizeof(void *) * iterations);
> > if (!ptrs) {
> > pr_err("kmalloc_bench: Failed to allocate pointer array\n");
> > return -ENOMEM;
> > }
> >
> > preempt_disable();
> > start = get_cycles();
> >
> > for (i = 0; i < iterations; i++) {
> > ptrs[i] = kmalloc(size, GFP_ATOMIC);
> > }
> >
> > end = get_cycles();
> >
> > total_cycles = end - start;
> > preempt_enable();
>
> While preempt_disable() simplifies things, it can misrepresent the cost of
> the preempt_disable() that is part of the locking - that one becomes nested,
> and a nested preempt_disable() is typically cheaper, etc.
>
> Also, the way it kmallocs all iterations and then kfrees all iterations may
> skew the probabilities of hitting the fastpaths, cache hotness, etc.
>
> When introducing sheaves I had a similar microbenchmark, but there was
> different amounts of inner-loop iterations, no outer preempt_disable(), and
> linear vs randomized array. See:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/commit/?h=slub-percpu-sheaves-v6-benchmarking&id=04028eeffba18a4f821a7194bc9d14f7488bd7d9
>
> (at this point the SLUB_HAS_SHEAVES parts should be removed and the
> kmem_cache_print_stats() stuff also shouldn't be interesting for QPW
> evaluation).
Hi Vlastimil,

There is a problem: the numbers vary significantly across runs
(on the same kernel, with the system idle and the CPU isolated).

SLUB_HAS_SHEAVES is not defined in my build. I just copied slub_kunit.c
from slub-percpu-sheaves-v6-benchmarking to current tip
(and dropped the call to kmem_cache_print_stats).
1st run:
[ 635.059928] average (excl. iter 0): 56571797
[ 635.235206] average (excl. iter 0): 58329901
[ 635.409957] average (excl. iter 0): 57459678
[ 635.585128] average (excl. iter 0): 58268333
[ 635.767325] average (excl. iter 0): 60063837
[ 635.944534] average (excl. iter 0): 58912817
[ 636.154503] average (excl. iter 0): 68992131
[ 636.362533] average (excl. iter 0): 69030629
[ 636.536737] average (excl. iter 0): 56545622
[ 636.704314] average (excl. iter 0): 55536407
[ 636.879097] average (excl. iter 0): 57397803
[ 637.051157] average (excl. iter 0): 57021907
[ 637.296352] average (excl. iter 0): 81582815
[ 637.539810] average (excl. iter 0): 81126686
2nd run:
[ 662.824688] average (excl. iter 0): 56833529
[ 662.996742] average (excl. iter 0): 57145388
[ 663.167063] average (excl. iter 0): 55828870
[ 663.339814] average (excl. iter 0): 57505312
[ 663.514563] average (excl. iter 0): 57374528
[ 663.690328] average (excl. iter 0): 57282062
[ 663.896128] average (excl. iter 0): 68097440
[ 664.103029] average (excl. iter 0): 69263914
[ 664.276497] average (excl. iter 0): 57073271
[ 664.442210] average (excl. iter 0): 54895879
[ 664.617186] average (excl. iter 0): 56972700
[ 664.787353] average (excl. iter 0): 56457173
[ 665.028944] average (excl. iter 0): 80339269
[ 665.268597] average (excl. iter 0): 80371907
3rd run:
[ 716.278750] average (excl. iter 0): 54191777
[ 716.442014] average (excl. iter 0): 54151132
[ 716.605254] average (excl. iter 0): 53148722
[ 716.766461] average (excl. iter 0): 53204894
[ 716.933339] average (excl. iter 0): 54719251
[ 717.098761] average (excl. iter 0): 54922923
[ 717.296178] average (excl. iter 0): 65351864
[ 717.491440] average (excl. iter 0): 65264027
[ 717.660778] average (excl. iter 0): 54370768
[ 717.823625] average (excl. iter 0): 54137410
[ 717.988983] average (excl. iter 0): 54222488
[ 718.152716] average (excl. iter 0): 54339019
[ 718.387978] average (excl. iter 0): 78249026
[ 718.619598] average (excl. iter 0): 77746198
Increasing the total parameter from 10^6 to 10^7 does
not help:
1st run:
[ 1074.601686] average (excl. iter 0): 650711901
[ 1076.450880] average (excl. iter 0): 633014260
[ 1078.363300] average (excl. iter 0): 660440649
[ 1080.266134] average (excl. iter 0): 652695083
[ 1082.117007] average (excl. iter 0): 635632144
[ 1084.009277] average (excl. iter 0): 654270513
[ 1086.286343] average (excl. iter 0): 790520038
[ 1088.512516] average (excl. iter 0): 768071705
[ 1090.448161] average (excl. iter 0): 664564330
[ 1092.349683] average (excl. iter 0): 659016349
[ 1094.274099] average (excl. iter 0): 662388982
[ 1096.172362] average (excl. iter 0): 647972747
[ 1098.753304] average (excl. iter 0): 887576313
[ 1101.339897] average (excl. iter 0): 885102019
2nd run:
[ 1120.186284] average (excl. iter 0): 615756734
[ 1122.019323] average (excl. iter 0): 623846524
[ 1123.885801] average (excl. iter 0): 639124895
[ 1125.693617] average (excl. iter 0): 623667563
[ 1127.588515] average (excl. iter 0): 646441510
[ 1129.410285] average (excl. iter 0): 628291996
[ 1131.542157] average (excl. iter 0): 728497604
[ 1133.698744] average (excl. iter 0): 743717953
[ 1135.514112] average (excl. iter 0): 616621660
[ 1137.306874] average (excl. iter 0): 615863807
[ 1139.110637] average (excl. iter 0): 616425899
[ 1140.948769] average (excl. iter 0): 638115570
[ 1143.426557] average (excl. iter 0): 847799304
[ 1145.914827] average (excl. iter 0): 861180802
Will switch back to the simple test (and it's pretty obvious
from the patch itself that if qpw=0 the overhead should
be zero, and it is). Its numbers are more
stable across runs.