linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kairui Song <ryncsn@gmail.com>
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Wed, 12 Feb 2025 15:35:59 +0000	[thread overview]
Message-ID: <Z6y_3xS_8pmZ2bCz@google.com> (raw)
In-Reply-To: <droaoze6w4atf7guiv6t4imhcmkpteyvoaigdnw5p3vdg75ebx@m56xi2y527i4>

On Wed, Feb 12, 2025 at 02:00:26PM +0900, Sergey Senozhatsky wrote:
> On (25/02/07 21:09), Yosry Ahmed wrote:
> > Can we do some perf testing to make sure this custom locking is not
> > regressing performance (selfishly I'd like some zswap testing too)?
> 
> So for zsmalloc I (usually) write some simple testing code which is
> triggered via sysfs (device attr) and is completely reproducible,
> so that I compare apples to apples.  In this particular case I just
> have a loop that creates objects (we don't need to compress or decompress
> anything, zsmalloc doesn't really care):
> 
> -	echo 1 > /sys/ ... / test_prepare
> 
> 	for (sz = 32; sz < PAGE_SIZE; sz += 64) {
> 		for (i = 0; i < 4096; i++) {
> 			ent->handle = zs_malloc(zram->mem_pool, sz)
> 			list_add(ent)
> 		}
> 	}
> 
> 
> And now I just `perf stat` writes:
> 
> -	perf stat echo 1 > /sys/ ... / test_exec_old
> 
> 	list_for_each_entry
> 		zs_map_object(ent->handle, ZS_MM_RO);
> 		zs_unmap_object(ent->handle)
> 
> 	list_for_each_entry
> 		dst = zs_map_object(ent->handle, ZS_MM_WO);
> 		memcpy(dst, tmpbuf, ent->sz)
> 		zs_unmap_object(ent->handle)
> 
> 
> 
> -	perf stat echo 1 > /sys/ ... / test_exec_new
> 
> 	list_for_each_entry
> 		dst = zs_obj_read_begin(ent->handle, loc);
> 		zs_obj_read_end(ent->handle, dst);
> 
> 	list_for_each_entry
> 		zs_obj_write(ent->handle, tmpbuf, ent->sz);
> 
> 
> -	echo 1 > /sys/ ... / test_finish
> 
> 	free all handles and ent-s
> 
> 
> The nice part is that we don't depend on any of the upper layers and we
> don't even need to compress/decompress anything; we allocate objects
> of the required sizes and memcpy static data there (zsmalloc doesn't
> have any opinion on that), and that's pretty much it.
> 
> 
> OLD API
> =======
> 
> 10 runs
> 
>        369,205,778      instructions                     #    0.80  insn per cycle            
>         40,467,926      branches                         #  113.732 M/sec                     
> 
>        369,002,122      instructions                     #    0.62  insn per cycle            
>         40,426,145      branches                         #  189.361 M/sec                     
> 
>        369,051,170      instructions                     #    0.45  insn per cycle            
>         40,434,677      branches                         #  157.574 M/sec                     
> 
>        369,014,522      instructions                     #    0.63  insn per cycle            
>         40,427,754      branches                         #  201.464 M/sec                     
> 
>        369,019,179      instructions                     #    0.64  insn per cycle            
>         40,429,327      branches                         #  198.321 M/sec                     
> 
>        368,973,095      instructions                     #    0.64  insn per cycle            
>         40,419,245      branches                         #  234.210 M/sec                     
> 
>        368,950,705      instructions                     #    0.64  insn per cycle            
>         40,414,305      branches                         #  231.460 M/sec                     
> 
>        369,041,288      instructions                     #    0.46  insn per cycle            
>         40,432,599      branches                         #  155.576 M/sec                     
> 
>        368,964,080      instructions                     #    0.67  insn per cycle            
>         40,417,025      branches                         #  245.665 M/sec                     
> 
>        369,036,706      instructions                     #    0.63  insn per cycle            
>         40,430,860      branches                         #  204.105 M/sec                     
> 
> 
> NEW API
> =======
> 
> 10 runs
> 
>        265,799,293      instructions                     #    0.51  insn per cycle            
>         29,834,567      branches                         #  170.281 M/sec                     
> 
>        265,765,970      instructions                     #    0.55  insn per cycle            
>         29,829,019      branches                         #  161.602 M/sec                     
> 
>        265,764,702      instructions                     #    0.51  insn per cycle            
>         29,828,015      branches                         #  189.677 M/sec                     
> 
>        265,836,506      instructions                     #    0.38  insn per cycle            
>         29,840,650      branches                         #  124.237 M/sec                     
> 
>        265,836,061      instructions                     #    0.36  insn per cycle            
>         29,842,285      branches                         #  137.670 M/sec                     
> 
>        265,887,080      instructions                     #    0.37  insn per cycle            
>         29,852,881      branches                         #  126.060 M/sec                     
> 
>        265,769,869      instructions                     #    0.57  insn per cycle            
>         29,829,873      branches                         #  210.157 M/sec                     
> 
>        265,803,732      instructions                     #    0.58  insn per cycle            
>         29,835,391      branches                         #  186.940 M/sec                     
> 
>        265,766,624      instructions                     #    0.58  insn per cycle            
>         29,827,537      branches                         #  212.609 M/sec                     
> 
>        265,843,597      instructions                     #    0.57  insn per cycle            
>         29,843,650      branches                         #  171.877 M/sec                     
> 
> 
> x old-api-insn
> + new-api-insn
> +-------------------------------------------------------------------------------------+
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |+                                                                                   x|
> |A                                                                                   A|
> +-------------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x  10  3.689507e+08 3.6920578e+08 3.6901918e+08 3.6902586e+08     71765.519
> +  10  2.657647e+08 2.6588708e+08 2.6580373e+08 2.6580734e+08     42187.024
> Difference at 95.0% confidence
> 	-1.03219e+08 +/- 55308.7
> 	-27.9705% +/- 0.0149878%
> 	(Student's t, pooled s = 58864.4)

Thanks for sharing these results, but I wonder if this will capture
regressions from the locking changes (e.g. a lock becoming preemptible)?
IIUC this is counting the instructions executed on these paths, and that
count won't change if the task gets preempted. Lock contention may show
up as extra instructions, but I am not sure we'll directly see its
effect in terms of serialization and delays.

I think we also need some high-level testing (e.g. concurrent
swapins/swapouts) to find that out. I believe that's what Kairui's
testing covers.

