From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kairui Song <ryncsn@gmail.com>
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Wed, 12 Feb 2025 15:35:59 +0000
Message-ID: <Z6y_3xS_8pmZ2bCz@google.com>
In-Reply-To: <droaoze6w4atf7guiv6t4imhcmkpteyvoaigdnw5p3vdg75ebx@m56xi2y527i4>

On Wed, Feb 12, 2025 at 02:00:26PM +0900, Sergey Senozhatsky wrote:
> On (25/02/07 21:09), Yosry Ahmed wrote:
> > Can we do some perf testing to make sure this custom locking is not
> > regressing performance (selfishly I'd like some zswap testing too)?
>
> So for zsmalloc I (usually) write some simple testing code which is
> triggered via sysfs (device attr) and that is completely reproducible,
> so that I compare apples to apples. In this particular case I just
> have a loop that creates objects (we don't need to compress or decompress
> anything, zsmalloc doesn't really care)
>
> - echo 1 > /sys/ ... / test_prepare
>
> 	for (sz = 32; sz < PAGE_SIZE; sz += 64) {
> 		for (i = 0; i < 4096; i++) {
> 			ent->handle = zs_malloc(zram->mem_pool, sz);
> 			list_add(ent);
> 		}
> 	}
>
>
> And now I just `perf stat` writes:
>
> - perf stat echo 1 > /sys/ ... / test_exec_old
>
> 	list_for_each_entry
> 		zs_map_object(ent->handle, ZS_MM_RO);
> 		zs_unmap_object(ent->handle);
>
> 	list_for_each_entry
> 		dst = zs_map_object(ent->handle, ZS_MM_WO);
> 		memcpy(dst, tmpbuf, ent->sz);
> 		zs_unmap_object(ent->handle);
>
>
>
> - perf stat echo 1 > /sys/ ... / test_exec_new
>
> 	list_for_each_entry
> 		dst = zs_obj_read_begin(ent->handle, loc);
> 		zs_obj_read_end(ent->handle, dst);
>
> 	list_for_each_entry
> 		zs_obj_write(ent->handle, tmpbuf, ent->sz);
>
>
> - echo 1 > /sys/ ... / test_finish
>
> free all handles and ent-s
>
>
> The nice part is that we don't depend on any of the upper layers, we
> don't even need to compress/decompress anything; we allocate objects
> of required sizes and memcpy static data there (zsmalloc doesn't have
> any opinion on that) and that's pretty much it.
>
>
> OLD API
> =======
>
> 10 runs
>
> 369,205,778 instructions # 0.80 insn per cycle
> 40,467,926 branches # 113.732 M/sec
>
> 369,002,122 instructions # 0.62 insn per cycle
> 40,426,145 branches # 189.361 M/sec
>
> 369,051,170 instructions # 0.45 insn per cycle
> 40,434,677 branches # 157.574 M/sec
>
> 369,014,522 instructions # 0.63 insn per cycle
> 40,427,754 branches # 201.464 M/sec
>
> 369,019,179 instructions # 0.64 insn per cycle
> 40,429,327 branches # 198.321 M/sec
>
> 368,973,095 instructions # 0.64 insn per cycle
> 40,419,245 branches # 234.210 M/sec
>
> 368,950,705 instructions # 0.64 insn per cycle
> 40,414,305 branches # 231.460 M/sec
>
> 369,041,288 instructions # 0.46 insn per cycle
> 40,432,599 branches # 155.576 M/sec
>
> 368,964,080 instructions # 0.67 insn per cycle
> 40,417,025 branches # 245.665 M/sec
>
> 369,036,706 instructions # 0.63 insn per cycle
> 40,430,860 branches # 204.105 M/sec
>
>
> NEW API
> =======
>
> 10 runs
>
> 265,799,293 instructions # 0.51 insn per cycle
> 29,834,567 branches # 170.281 M/sec
>
> 265,765,970 instructions # 0.55 insn per cycle
> 29,829,019 branches # 161.602 M/sec
>
> 265,764,702 instructions # 0.51 insn per cycle
> 29,828,015 branches # 189.677 M/sec
>
> 265,836,506 instructions # 0.38 insn per cycle
> 29,840,650 branches # 124.237 M/sec
>
> 265,836,061 instructions # 0.36 insn per cycle
> 29,842,285 branches # 137.670 M/sec
>
> 265,887,080 instructions # 0.37 insn per cycle
> 29,852,881 branches # 126.060 M/sec
>
> 265,769,869 instructions # 0.57 insn per cycle
> 29,829,873 branches # 210.157 M/sec
>
> 265,803,732 instructions # 0.58 insn per cycle
> 29,835,391 branches # 186.940 M/sec
>
> 265,766,624 instructions # 0.58 insn per cycle
> 29,827,537 branches # 212.609 M/sec
>
> 265,843,597 instructions # 0.57 insn per cycle
> 29,843,650 branches # 171.877 M/sec
>
>
> x old-api-insn
> + new-api-insn
> [ministat plot: all 10 new-api samples (+) stack at the left edge and
>  all 10 old-api samples (x) stack at the right edge, with no overlap]
>
>     N           Min           Max        Median           Avg        Stddev
> x  10  3.689507e+08 3.6920578e+08 3.6901918e+08 3.6902586e+08     71765.519
> +  10  2.657647e+08 2.6588708e+08 2.6580373e+08 2.6580734e+08     42187.024
> Difference at 95.0% confidence
> -1.03219e+08 +/- 55308.7
> -27.9705% +/- 0.0149878%
> (Student's t, pooled s = 58864.4)
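
As a sanity check on the ministat summary, the averages and the relative
difference can be recomputed directly from the twenty instruction counts
quoted above. This is only a minimal sketch in plain user-space C; the
array and function names (old_api_insn, new_api_insn, mean_insn,
relative_change_pct) are made up for illustration and are not part of the
test module:

```c
#include <stddef.h>

/* Instruction counts copied from the ten OLD API perf stat runs above. */
static const double old_api_insn[] = {
	369205778, 369002122, 369051170, 369014522, 369019179,
	368973095, 368950705, 369041288, 368964080, 369036706,
};

/* Instruction counts copied from the ten NEW API perf stat runs above. */
static const double new_api_insn[] = {
	265799293, 265765970, 265764702, 265836506, 265836061,
	265887080, 265769869, 265803732, 265766624, 265843597,
};

/* Arithmetic mean of n samples (illustrative helper). */
static double mean_insn(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Change of the new mean relative to the old mean, in percent. */
static double relative_change_pct(const double *old_v, const double *new_v,
				  size_t n)
{
	double old_mean = mean_insn(old_v, n);
	double new_mean = mean_insn(new_v, n);

	return (new_mean - old_mean) / old_mean * 100.0;
}
```

relative_change_pct(old_api_insn, new_api_insn, 10) comes out at roughly
-27.97, which agrees with the "Difference at 95.0% confidence" line.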

Thanks for sharing these results, but I wonder if this will capture
regressions from locking changes (e.g. a lock being preemptible)? IIUC
this is counting the instructions executed in these paths, and that
won't change if the task gets preempted. Lock contention may be captured
as extra instructions, but I am not sure we'll directly see its effect
in terms of serialization and delays.

I think we also need some higher-level testing (e.g. concurrent
swapins/swapouts) to find that out. I think that's what Kairui's testing
covers.
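
To illustrate the concern, here is a rough user-space sketch (not zsmalloc
code) of why an instruction count or CPU-time number stays flat while a
task is off-CPU. As an assumption, nanosleep() stands in for time spent
waiting on a contended lock or on a preempted lock holder; struct
wait_sample and measure_blocked_wait() are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

struct wait_sample {
	int64_t wall_ns;	/* elapsed real time (CLOCK_MONOTONIC) */
	int64_t cpu_ns;		/* time this thread actually ran on a CPU */
};

/* Read the given clock and return its value in nanoseconds. */
static int64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Block for block_ms milliseconds and report both clocks. The sleep is a
 * stand-in (assumption) for waiting on a lock; it executes almost no
 * instructions, yet the caller still observes the full delay.
 */
static struct wait_sample measure_blocked_wait(long block_ms)
{
	struct timespec req = {
		.tv_sec = block_ms / 1000,
		.tv_nsec = (block_ms % 1000) * 1000000L,
	};
	struct wait_sample s;
	int64_t wall0 = clock_ns(CLOCK_MONOTONIC);
	int64_t cpu0 = clock_ns(CLOCK_THREAD_CPUTIME_ID);

	nanosleep(&req, NULL);	/* off-CPU for the whole wait */

	s.cpu_ns = clock_ns(CLOCK_THREAD_CPUTIME_ID) - cpu0;
	s.wall_ns = clock_ns(CLOCK_MONOTONIC) - wall0;
	return s;
}
```

For a 50 ms wait, wall_ns covers the whole delay while cpu_ns stays a
small fraction of it, so a regression of this kind shows up in elapsed
time (or throughput under concurrency) but not in the instruction count.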