* [RFC] Analyzing zpool allocators / Removing zbud and z3fold
@ 2024-02-09 3:27 Yosry Ahmed
2024-02-22 3:54 ` Chengming Zhou
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Yosry Ahmed @ 2024-02-09 3:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Vitaly Wool, Miaohe Lin, Johannes Weiner, Nhat Pham, Linux-MM,
Linux Kernel Mailing List, Christoph Hellwig, Sergey Senozhatsky,
Minchan Kim, Chris Down, Seth Jennings, Dan Streetman, Chris Li
Hey folks,
This is a follow-up to my previously sent RFC patch to deprecate
z3fold [1]. This is an RFC without code; I thought I could get some
discussion going before writing (or rather deleting) more code. I went
back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
and z3fold.
[1] https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@google.com/
In this analysis, for each of the allocators I ran a kernel build test
on tmpfs in a memory-limited cgroup 5 times and captured:
(a) The build times.
(b) zswap_load() and zswap_store() latencies using bpftrace (see the
sketch below).
(c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
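For reference, here is a minimal bpftrace sketch that produces the
@load_ns/@store_ns histograms shown in (b); the @lstart/@sstart scratch
map names are illustrative:

// Time each zswap_load()/zswap_store() call, nanosecond histograms.
kprobe:zswap_load { @lstart[tid] = nsecs; }
kretprobe:zswap_load /@lstart[tid]/ {
        @load_ns = hist(nsecs - @lstart[tid]);
        delete(@lstart[tid]);
}
kprobe:zswap_store { @sstart[tid] = nsecs; }
kretprobe:zswap_store /@sstart[tid]/ {
        @store_ns = hist(nsecs - @sstart[tid]);
        delete(@sstart[tid]);
}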
Here are the results I have. I am using zsmalloc as the base for all
comparisons.
-------------------------------- <Results> --------------------------------
(a) Build times
*** zsmalloc ***
──────────────────────────────────────────────────────────────
LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
──────────┼──────────┼──────────┼──────────┼──────────┼────────
real │ 108.890 │ 116.160 │ 111.304 │ 110.310 │ 2.719
sys │ 6838.860 │ 7137.830 │ 6936.414 │ 6862.160 │ 114.860
user │ 2838.270 │ 2859.050 │ 2850.116 │ 2852.590 │ 7.388
──────────────────────────────────────────────────────────────
*** zbud ***
──────────────────────────────────────────────────────────────
LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
──────────┼──────────┼──────────┼──────────┼──────────┼────────
real │ 105.540 │ 114.430 │ 108.738 │ 108.140 │ 3.027
sys │ 6553.680 │ 6794.330 │ 6688.184 │ 6661.840 │ 86.471
user │ 2836.390 │ 2847.850 │ 2842.952 │ 2843.450 │ 3.721
──────────────────────────────────────────────────────────────
*** z3fold ***
──────────────────────────────────────────────────────────────
LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
──────────┼──────────┼──────────┼──────────┼──────────┼────────
real │ 113.020 │ 118.110 │ 114.642 │ 114.010 │ 1.803
sys │ 7168.860 │ 7284.900 │ 7243.930 │ 7265.290 │ 42.254
user │ 2865.630 │ 2869.840 │ 2868.208 │ 2868.710 │ 1.625
──────────────────────────────────────────────────────────────
Comparing the means, zbud is 2.3% faster, and z3fold is 3% slower.
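(Derived from the mean real times: (111.304 - 108.738) / 111.304 is
~2.3%, and (114.642 - 111.304) / 111.304 is ~3.0%.)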
(b) zswap_load() and zswap_store() latencies
*** zsmalloc ***
@load_ns:
[128, 256) 377 | |
[256, 512) 772 | |
[512, 1K) 923 | |
[1K, 2K) 22141 | |
[2K, 4K) 88297 | |
[4K, 8K) 1685833 |@@@@@ |
[8K, 16K) 17087712 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K) 10875077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 777656 |@@ |
[64K, 128K) 127239 | |
[128K, 256K) 50301 | |
[256K, 512K) 1669 | |
[512K, 1M) 37 | |
[1M, 2M) 3 | |
@store_ns:
[512, 1K) 279 | |
[1K, 2K) 15969 | |
[2K, 4K) 193446 | |
[4K, 8K) 823283 | |
[8K, 16K) 14209844 |@@@@@@@@@@@ |
[16K, 32K) 62040863 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K) 9737713 |@@@@@@@@ |
[64K, 128K) 1278302 |@ |
[128K, 256K) 487285 | |
[256K, 512K) 4406 | |
[512K, 1M) 117 | |
[1M, 2M) 24 | |
*** zbud ***
@load_ns:
[128, 256) 452 | |
[256, 512) 834 | |
[512, 1K) 998 | |
[1K, 2K) 22708 | |
[2K, 4K) 171247 | |
[4K, 8K) 2853227 |@@@@@@@@ |
[8K, 16K) 17727445 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K) 9523050 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 752423 |@@ |
[64K, 128K) 135560 | |
[128K, 256K) 52360 | |
[256K, 512K) 4071 | |
[512K, 1M) 57 | |
@store_ns:
[512, 1K) 518 | |
[1K, 2K) 13337 | |
[2K, 4K) 193043 | |
[4K, 8K) 846118 | |
[8K, 16K) 15240682 |@@@@@@@@@@@@@ |
[16K, 32K) 60945786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K) 10230719 |@@@@@@@@ |
[64K, 128K) 1612647 |@ |
[128K, 256K) 498344 | |
[256K, 512K) 8550 | |
[512K, 1M) 199 | |
[1M, 2M) 1 | |
*** z3fold ***
@load_ns:
[128, 256) 344 | |
[256, 512) 999 | |
[512, 1K) 859 | |
[1K, 2K) 21069 | |
[2K, 4K) 53704 | |
[4K, 8K) 1351571 |@@@@ |
[8K, 16K) 14142680 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16K, 32K) 11788684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32K, 64K) 1133377 |@@@@ |
[64K, 128K) 121670 | |
[128K, 256K) 68663 | |
[256K, 512K) 120 | |
[512K, 1M) 21 | |
@store_ns:
[512, 1K) 257 | |
[1K, 2K) 10162 | |
[2K, 4K) 149599 | |
[4K, 8K) 648121 | |
[8K, 16K) 9115497 |@@@@@@@@ |
[16K, 32K) 56467456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K) 16235236 |@@@@@@@@@@@@@@ |
[64K, 128K) 1397437 |@ |
[128K, 256K) 705916 | |
[256K, 512K) 3087 | |
[512K, 1M) 62 | |
[1M, 2M) 1 | |
I did not perform any sophisticated analysis on these histograms, but
eyeballing them makes it clear that all allocators have somewhat
similar latencies. zbud is slightly better than zsmalloc, and z3fold
is slightly worse than zsmalloc. This corresponds naturally to the
build times in (a).
(c) Maximum size of the zswap pool
*** zsmalloc ***
1,137,659,904 bytes = ~1.13G
*** zbud ***
1,535,741,952 bytes = ~1.5G
*** z3fold ***
1,151,303,680 bytes = ~1.15G
zbud consumes ~35% more memory, and z3fold consumes ~1.2% more
memory. This makes sense because zbud only stores a maximum of two
compressed pages on each order-0 page, regardless of the compression
ratio, so it is bound to consume more memory.
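As a worked example: if pages compress to a third of their size, an
ideal packer needs about N/3 order-0 pages to hold N compressed pages,
while zbud still needs N/2 (roughly 50% more), no matter how well the
data compresses.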
-------------------------------- </Results> --------------------------------
According to those results, it seems like zsmalloc is superior to
z3fold in both efficiency and latency. Zbud has a small latency
advantage, but that comes with a huge cost in terms of memory
consumption. Moreover, most known users of zswap are currently using
zsmalloc. Perhaps some folks are using zbud because it was the default
allocator up until recently. The only known disadvantage of zsmalloc
is the dependency on MMU.
Based on that, I think it doesn't make sense to keep all 3 allocators
going forward. I believe we should start with removing either zbud or
z3fold, leaving only one allocator supporting !MMU. Once zsmalloc
supports !MMU (if possible), we can keep zsmalloc as the only
allocator.
Thoughts and feedback are highly appreciated. I tried to CC all the
interested folks, but others feel free to chime in.
* Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold
2024-02-09 3:27 [RFC] Analyzing zpool allocators / Removing zbud and z3fold Yosry Ahmed
@ 2024-02-22 3:54 ` Chengming Zhou
2024-02-22 5:56 ` Yosry Ahmed
2024-02-22 6:23 ` Nhat Pham
2024-02-22 6:46 ` [External] " Zhongkun He
2 siblings, 1 reply; 6+ messages in thread
From: Chengming Zhou @ 2024-02-22 3:54 UTC (permalink / raw)
To: Yosry Ahmed, Andrew Morton
Cc: Vitaly Wool, Miaohe Lin, Johannes Weiner, Nhat Pham, Linux-MM,
Linux Kernel Mailing List, Christoph Hellwig, Sergey Senozhatsky,
Minchan Kim, Chris Down, Seth Jennings, Dan Streetman, Chris Li
On 2024/2/9 11:27, Yosry Ahmed wrote:
> Hey folks,
>
> This is a follow-up to my previously sent RFC patch to deprecate
> z3fold [1]. This is an RFC without code; I thought I could get some
> discussion going before writing (or rather deleting) more code. I went
> back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> and z3fold.
This is a great analysis! Sorry for being late to see it.
I want to vote for this direction: zram has been using zsmalloc directly,
and zswap can do the same, which is simpler and lets us maintain and
optimize only one allocator. The only evident downside is the dependence
on MMU, right?
And I'm trying to optimize zsmalloc's scalability now, which is currently
bad enough that zswap has to use 32 pools to work around it. (zram only
uses one pool, so it should hit the same scalability problem on big
servers, and may have to use many zram block devices to work around it
too.)
But too many pools cause more memory waste and more fragmentation, so
the resulting compression ratio is not good enough.
As for the MMU dependence, can we actually avoid it? Maybe I missed
something, but we could get an object's memory vecs from zsmalloc and
send them to the decompressor, which should support length(memory vecs) > 1?
>
> [1] https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@google.com/
>
> In this analysis, for each of the allocators I ran a kernel build test
> on tmpfs in a memory-limited cgroup 5 times and captured:
> (a) The build times.
> (b) zswap_load() and zswap_store() latencies using bpftrace.
> (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
This should be /proc/meminfo::Zswap, right?
Zswap is the total size of the pool pages, while Zswapped is the size of
the swapped-out (uncompressed) pages.
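For example, with hypothetical numbers and a 2:1 compression ratio, the
two fields would look like:

Zswap:            524288 kB
Zswapped:        1048576 kB

i.e. Zswap is what the pool actually costs, while Zswapped is how much
uncompressed data it holds.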
Thanks!
>
> Here are the results I have. I am using zsmalloc as the base for all
> comparisons.
>
> -------------------------------- <Results> --------------------------------
>
> (a) Build times
>
> *** zsmalloc ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 108.890 │ 116.160 │ 111.304 │ 110.310 │ 2.719
> sys │ 6838.860 │ 7137.830 │ 6936.414 │ 6862.160 │ 114.860
> user │ 2838.270 │ 2859.050 │ 2850.116 │ 2852.590 │ 7.388
> ──────────────────────────────────────────────────────────────
>
> *** zbud ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 105.540 │ 114.430 │ 108.738 │ 108.140 │ 3.027
> sys │ 6553.680 │ 6794.330 │ 6688.184 │ 6661.840 │ 86.471
> user │ 2836.390 │ 2847.850 │ 2842.952 │ 2843.450 │ 3.721
> ──────────────────────────────────────────────────────────────
>
> *** z3fold ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 113.020 │ 118.110 │ 114.642 │ 114.010 │ 1.803
> sys │ 7168.860 │ 7284.900 │ 7243.930 │ 7265.290 │ 42.254
> user │ 2865.630 │ 2869.840 │ 2868.208 │ 2868.710 │ 1.625
> ──────────────────────────────────────────────────────────────
>
> Comparing the means, zbud is 2.3% faster, and z3fold is 3% slower.
>
> (b) zswap_load() and zswap_store() latencies
>
> *** zsmalloc ***
>
> @load_ns:
> [128, 256) 377 | |
> [256, 512) 772 | |
> [512, 1K) 923 | |
> [1K, 2K) 22141 | |
> [2K, 4K) 88297 | |
> [4K, 8K) 1685833 |@@@@@ |
> [8K, 16K) 17087712 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 10875077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 777656 |@@ |
> [64K, 128K) 127239 | |
> [128K, 256K) 50301 | |
> [256K, 512K) 1669 | |
> [512K, 1M) 37 | |
> [1M, 2M) 3 | |
>
> @store_ns:
> [512, 1K) 279 | |
> [1K, 2K) 15969 | |
> [2K, 4K) 193446 | |
> [4K, 8K) 823283 | |
> [8K, 16K) 14209844 |@@@@@@@@@@@ |
> [16K, 32K) 62040863 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 9737713 |@@@@@@@@ |
> [64K, 128K) 1278302 |@ |
> [128K, 256K) 487285 | |
> [256K, 512K) 4406 | |
> [512K, 1M) 117 | |
> [1M, 2M) 24 | |
>
> *** zbud ***
>
> @load_ns:
> [128, 256) 452 | |
> [256, 512) 834 | |
> [512, 1K) 998 | |
> [1K, 2K) 22708 | |
> [2K, 4K) 171247 | |
> [4K, 8K) 2853227 |@@@@@@@@ |
> [8K, 16K) 17727445 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 9523050 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 752423 |@@ |
> [64K, 128K) 135560 | |
> [128K, 256K) 52360 | |
> [256K, 512K) 4071 | |
> [512K, 1M) 57 | |
>
> @store_ns:
> [512, 1K) 518 | |
> [1K, 2K) 13337 | |
> [2K, 4K) 193043 | |
> [4K, 8K) 846118 | |
> [8K, 16K) 15240682 |@@@@@@@@@@@@@ |
> [16K, 32K) 60945786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 10230719 |@@@@@@@@ |
> [64K, 128K) 1612647 |@ |
> [128K, 256K) 498344 | |
> [256K, 512K) 8550 | |
> [512K, 1M) 199 | |
> [1M, 2M) 1 | |
>
> *** z3fold ***
>
> @load_ns:
> [128, 256) 344 | |
> [256, 512) 999 | |
> [512, 1K) 859 | |
> [1K, 2K) 21069 | |
> [2K, 4K) 53704 | |
> [4K, 8K) 1351571 |@@@@ |
> [8K, 16K) 14142680 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 11788684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 1133377 |@@@@ |
> [64K, 128K) 121670 | |
> [128K, 256K) 68663 | |
> [256K, 512K) 120 | |
> [512K, 1M) 21 | |
>
> @store_ns:
> [512, 1K) 257 | |
> [1K, 2K) 10162 | |
> [2K, 4K) 149599 | |
> [4K, 8K) 648121 | |
> [8K, 16K) 9115497 |@@@@@@@@ |
> [16K, 32K) 56467456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 16235236 |@@@@@@@@@@@@@@ |
> [64K, 128K) 1397437 |@ |
> [128K, 256K) 705916 | |
> [256K, 512K) 3087 | |
> [512K, 1M) 62 | |
> [1M, 2M) 1 | |
>
> I did not perform any sophisticated analysis on these histograms, but
> eyeballing them makes it clear that all allocators have somewhat
> similar latencies. zbud is slightly better than zsmalloc, and z3fold
> is slightly worse than zsmalloc. This corresponds naturally to the
> build times in (a).
>
> (c) Maximum size of the zswap pool
>
> *** zsmalloc ***
> 1,137,659,904 bytes = ~1.13G
>
> *** zbud ***
> 1,535,741,952 bytes = ~1.5G
>
> *** z3fold ***
> 1,151,303,680 bytes = ~1.15G
>
> zbud consumes ~35% more memory, and z3fold consumes ~1.2% more
> memory. This makes sense because zbud only stores a maximum of two
> compressed pages on each order-0 page, regardless of the compression
> ratio, so it is bound to consume more memory.
>
> -------------------------------- </Results> --------------------------------
>
> According to those results, it seems like zsmalloc is superior to
> z3fold in both efficiency and latency. Zbud has a small latency
> advantage, but that comes with a huge cost in terms of memory
> consumption. Moreover, most known users of zswap are currently using
> zsmalloc. Perhaps some folks are using zbud because it was the default
> allocator up until recently. The only known disadvantage of zsmalloc
> is the dependency on MMU.
>
> Based on that, I think it doesn't make sense to keep all 3 allocators
> going forward. I believe we should start with removing either zbud or
> z3fold, leaving only one allocator supporting !MMU. Once zsmalloc
> supports !MMU (if possible), we can keep zsmalloc as the only
> allocator.
>
> Thoughts and feedback are highly appreciated. I tried to CC all the
> interested folks, but others feel free to chime in.
>
* Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold
2024-02-22 3:54 ` Chengming Zhou
@ 2024-02-22 5:56 ` Yosry Ahmed
0 siblings, 0 replies; 6+ messages in thread
From: Yosry Ahmed @ 2024-02-22 5:56 UTC (permalink / raw)
To: Chengming Zhou
Cc: Andrew Morton, Vitaly Wool, Miaohe Lin, Johannes Weiner,
Nhat Pham, Linux-MM, Linux Kernel Mailing List,
Christoph Hellwig, Sergey Senozhatsky, Minchan Kim, Chris Down,
Seth Jennings, Dan Streetman, Chris Li
On Thu, Feb 22, 2024 at 11:54:44AM +0800, Chengming Zhou wrote:
> On 2024/2/9 11:27, Yosry Ahmed wrote:
> > Hey folks,
> >
> > This is a follow-up to my previously sent RFC patch to deprecate
> > z3fold [1]. This is an RFC without code; I thought I could get some
> > discussion going before writing (or rather deleting) more code. I went
> > back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> > and z3fold.
>
> This is a great analysis! Sorry for being late to see it.
>
> I want to vote for this direction: zram has been using zsmalloc directly,
> and zswap can do the same, which is simpler and lets us maintain and
> optimize only one allocator. The only evident downside is the dependence
> on MMU, right?
AFAICT, yes. I saw a lot of positive responses when I sent an RFC to
mark z3fold as deprecated, but there were some opposing opinions as
well, which is why I did this simple analysis. I was hoping we could
make forward progress with it, but was disappointed that it didn't get
as much attention as the deprecation RFC :)
>
> And I'm trying to optimize zsmalloc's scalability now, which is currently
> bad enough that zswap has to use 32 pools to work around it. (zram only
> uses one pool, so it should hit the same scalability problem on big
> servers, and may have to use many zram block devices to work around it
> too.)
That's slightly orthogonal. Zsmalloc is not really showing worse
performance than other allocators, so this should be a separate effort.
>
> But too many pools cause more memory waste and more fragmentation, so
> the resulting compression ratio is not good enough.
>
> As for the MMU dependence, can we actually avoid it? Maybe I missed
> something, but we could get an object's memory vecs from zsmalloc and
> send them to the decompressor, which should support length(memory vecs) > 1?
IIUC the dependency on MMU is due to the use of kmap() APIs and the
fact that we may be using highmem pages. I think we may be able to work
around that dependency but I didn't look closely. Hopefully Minchan or
Sergey could shed more light on this.
>
> >
> > [1] https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@google.com/
> >
> > In this analysis, for each of the allocators I ran a kernel build test
> > on tmpfs in a memory-limited cgroup 5 times and captured:
> > (a) The build times.
> > (b) zswap_load() and zswap_store() latencies using bpftrace.
> > (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
>
> This should be /proc/meminfo::Zswap, right?
> Zswap is the total size of the pool pages, while Zswapped is the size of
> the swapped-out (uncompressed) pages.
Oh yes, it is /proc/meminfo::Zswap actually. I miswrote it in my email.
Thanks!
* Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold
2024-02-09 3:27 [RFC] Analyzing zpool allocators / Removing zbud and z3fold Yosry Ahmed
2024-02-22 3:54 ` Chengming Zhou
@ 2024-02-22 6:23 ` Nhat Pham
2024-02-22 19:20 ` Yosry Ahmed
2024-02-22 6:46 ` [External] " Zhongkun He
2 siblings, 1 reply; 6+ messages in thread
From: Nhat Pham @ 2024-02-22 6:23 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Andrew Morton, Vitaly Wool, Miaohe Lin, Johannes Weiner,
Linux-MM, Linux Kernel Mailing List, Christoph Hellwig,
Sergey Senozhatsky, Minchan Kim, Chris Down, Seth Jennings,
Dan Streetman, Chris Li
On Fri, Feb 9, 2024 at 10:27 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> I did not perform any sophisticated analysis on these histograms, but
> eyeballing them makes it clear that all allocators have somewhat
> similar latencies. zbud is slightly better than zsmalloc, and z3fold
> is slightly worse than zsmalloc. This corresponds naturally to the
> build times in (a).
>
> (c) Maximum size of the zswap pool
>
> *** zsmalloc ***
> 1,137,659,904 bytes = ~1.13G
>
> *** zbud ***
> 1,535,741,952 bytes = ~1.5G
>
> *** z3fold ***
> 1,151,303,680 bytes = ~1.15G
>
> zbud consumes ~35% more memory, and z3fold consumes ~1.2% more
> memory. This makes sense because zbud only stores a maximum of two
> compressed pages on each order-0 page, regardless of the compression
> ratio, so it is bound to consume more memory.
>
> -------------------------------- </Results> --------------------------------
>
> According to those results, it seems like zsmalloc is superior to
> z3fold in both efficiency and latency. Zbud has a small latency
> advantage, but that comes with a huge cost in terms of memory
> consumption. Moreover, most known users of zswap are currently using
> zsmalloc. Perhaps some folks are using zbud because it was the default
> allocator up until recently. The only known disadvantage of zsmalloc
> is the dependency on MMU.
>
> Based on that, I think it doesn't make sense to keep all 3 allocators
> going forward. I believe we should start with removing either zbud or
> z3fold, leaving only one allocator supporting !MMU. Once zsmalloc
> supports !MMU (if possible), we can keep zsmalloc as the only
> allocator.
>
> Thoughts and feedback are highly appreciated. I tried to CC all the
> interested folks, but others feel free to chime in.
I already voiced my opinion on the other thread, but to reiterate, my
vote is towards deprecating/removing z3fold :)
That is, unless someone can present a convincing argument/use
case/workload where z3fold outshines both zbud and zsmalloc, or is at
least another point on the Pareto front of (latency x memory saving).
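(By the numbers above, zbud arguably is such a point: ~2.3% faster at
the cost of ~35% more memory. z3fold, on the other hand, looks
dominated by zsmalloc on both axes.)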
* Re: [External] [RFC] Analyzing zpool allocators / Removing zbud and z3fold
2024-02-09 3:27 [RFC] Analyzing zpool allocators / Removing zbud and z3fold Yosry Ahmed
2024-02-22 3:54 ` Chengming Zhou
2024-02-22 6:23 ` Nhat Pham
@ 2024-02-22 6:46 ` Zhongkun He
2 siblings, 0 replies; 6+ messages in thread
From: Zhongkun He @ 2024-02-22 6:46 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Andrew Morton, Vitaly Wool, Miaohe Lin, Johannes Weiner,
Nhat Pham, Linux-MM, Linux Kernel Mailing List,
Christoph Hellwig, Sergey Senozhatsky, Minchan Kim, Chris Down,
Seth Jennings, Dan Streetman, Chris Li
On Fri, Feb 9, 2024 at 11:28 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Hey folks,
>
> This is a follow-up to my previously sent RFC patch to deprecate
> z3fold [1]. This is an RFC without code; I thought I could get some
> discussion going before writing (or rather deleting) more code. I went
> back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> and z3fold.
>
> [1] https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@google.com/
>
> In this analysis, for each of the allocators I ran a kernel build test
> on tmpfs in a memory-limited cgroup 5 times and captured:
> (a) The build times.
> (b) zswap_load() and zswap_store() latencies using bpftrace.
> (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
>
> Here are the results I have. I am using zsmalloc as the base for all
> comparisons.
>
> -------------------------------- <Results> --------------------------------
>
> (a) Build times
>
> *** zsmalloc ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 108.890 │ 116.160 │ 111.304 │ 110.310 │ 2.719
> sys │ 6838.860 │ 7137.830 │ 6936.414 │ 6862.160 │ 114.860
> user │ 2838.270 │ 2859.050 │ 2850.116 │ 2852.590 │ 7.388
> ──────────────────────────────────────────────────────────────
>
> *** zbud ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 105.540 │ 114.430 │ 108.738 │ 108.140 │ 3.027
> sys │ 6553.680 │ 6794.330 │ 6688.184 │ 6661.840 │ 86.471
> user │ 2836.390 │ 2847.850 │ 2842.952 │ 2843.450 │ 3.721
> ──────────────────────────────────────────────────────────────
>
> *** z3fold ***
> ──────────────────────────────────────────────────────────────
> LABEL │ MIN │ MAX │ MEAN │ MEDIAN │ STDDEV
> ──────────┼──────────┼──────────┼──────────┼──────────┼────────
> real │ 113.020 │ 118.110 │ 114.642 │ 114.010 │ 1.803
> sys │ 7168.860 │ 7284.900 │ 7243.930 │ 7265.290 │ 42.254
> user │ 2865.630 │ 2869.840 │ 2868.208 │ 2868.710 │ 1.625
> ──────────────────────────────────────────────────────────────
>
> Comparing the means, zbud is 2.3% faster, and z3fold is 3% slower.
>
> (b) zswap_load() and zswap_store() latencies
>
> *** zsmalloc ***
>
> @load_ns:
> [128, 256) 377 | |
> [256, 512) 772 | |
> [512, 1K) 923 | |
> [1K, 2K) 22141 | |
> [2K, 4K) 88297 | |
> [4K, 8K) 1685833 |@@@@@ |
> [8K, 16K) 17087712 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 10875077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 777656 |@@ |
> [64K, 128K) 127239 | |
> [128K, 256K) 50301 | |
> [256K, 512K) 1669 | |
> [512K, 1M) 37 | |
> [1M, 2M) 3 | |
>
> @store_ns:
> [512, 1K) 279 | |
> [1K, 2K) 15969 | |
> [2K, 4K) 193446 | |
> [4K, 8K) 823283 | |
> [8K, 16K) 14209844 |@@@@@@@@@@@ |
> [16K, 32K) 62040863 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 9737713 |@@@@@@@@ |
> [64K, 128K) 1278302 |@ |
> [128K, 256K) 487285 | |
> [256K, 512K) 4406 | |
> [512K, 1M) 117 | |
> [1M, 2M) 24 | |
>
> *** zbud ***
>
> @load_ns:
> [128, 256) 452 | |
> [256, 512) 834 | |
> [512, 1K) 998 | |
> [1K, 2K) 22708 | |
> [2K, 4K) 171247 | |
> [4K, 8K) 2853227 |@@@@@@@@ |
> [8K, 16K) 17727445 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 9523050 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 752423 |@@ |
> [64K, 128K) 135560 | |
> [128K, 256K) 52360 | |
> [256K, 512K) 4071 | |
> [512K, 1M) 57 | |
>
> @store_ns:
> [512, 1K) 518 | |
> [1K, 2K) 13337 | |
> [2K, 4K) 193043 | |
> [4K, 8K) 846118 | |
> [8K, 16K) 15240682 |@@@@@@@@@@@@@ |
> [16K, 32K) 60945786 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 10230719 |@@@@@@@@ |
> [64K, 128K) 1612647 |@ |
> [128K, 256K) 498344 | |
> [256K, 512K) 8550 | |
> [512K, 1M) 199 | |
> [1M, 2M) 1 | |
>
> *** z3fold ***
>
> @load_ns:
> [128, 256) 344 | |
> [256, 512) 999 | |
> [512, 1K) 859 | |
> [1K, 2K) 21069 | |
> [2K, 4K) 53704 | |
> [4K, 8K) 1351571 |@@@@ |
> [8K, 16K) 14142680 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16K, 32K) 11788684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32K, 64K) 1133377 |@@@@ |
> [64K, 128K) 121670 | |
> [128K, 256K) 68663 | |
> [256K, 512K) 120 | |
> [512K, 1M) 21 | |
>
> @store_ns:
> [512, 1K) 257 | |
> [1K, 2K) 10162 | |
> [2K, 4K) 149599 | |
> [4K, 8K) 648121 | |
> [8K, 16K) 9115497 |@@@@@@@@ |
> [16K, 32K) 56467456 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32K, 64K) 16235236 |@@@@@@@@@@@@@@ |
> [64K, 128K) 1397437 |@ |
> [128K, 256K) 705916 | |
> [256K, 512K) 3087 | |
> [512K, 1M) 62 | |
> [1M, 2M) 1 | |
>
> I did not perform any sophisticated analysis on these histograms, but
> eyeballing them makes it clear that all allocators have somewhat
> similar latencies. zbud is slightly better than zsmalloc, and z3fold
> is slightly worse than zsmalloc. This corresponds naturally to the
> build times in (a).
>
> (c) Maximum size of the zswap pool
>
> *** zsmalloc ***
> 1,137,659,904 bytes = ~1.13G
>
> *** zbud ***
> 1,535,741,952 bytes = ~1.5G
>
> *** z3fold ***
> 1,151,303,680 bytes = ~1.15G
>
> zbud consumes ~35% more memory, and z3fold consumes ~1.2% more
> memory. This makes sense because zbud only stores a maximum of two
> compressed pages on each order-0 page, regardless of the compression
> ratio, so it is bound to consume more memory.
>
> -------------------------------- </Results> --------------------------------
>
> According to those results, it seems like zsmalloc is superior to
> z3fold in both efficiency and latency. Zbud has a small latency
> advantage, but that comes with a huge cost in terms of memory
> consumption. Moreover, most known users of zswap are currently using
> zsmalloc. Perhaps some folks are using zbud because it was the default
> allocator up until recently. The only known disadvantage of zsmalloc
> is the dependency on MMU.
>
> Based on that, I think it doesn't make sense to keep all 3 allocators
> going forward. I believe we should start with removing either zbud or
> z3fold, leaving only one allocator supporting !MMU. Once zsmalloc
> supports !MMU (if possible), we can keep zsmalloc as the only
> allocator.
Hi Yosry, that sounds great to me.
I was reviewing the allocator code recently and couldn't find any
advantage of z3fold, even without doing performance testing.
It would be better if there were only one allocator, which would
simplify the code and the interface.
>
> Thoughts and feedback are highly appreciated. I tried to CC all the
> interested folks, but others feel free to chime in.
>
* Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold
2024-02-22 6:23 ` Nhat Pham
@ 2024-02-22 19:20 ` Yosry Ahmed
0 siblings, 0 replies; 6+ messages in thread
From: Yosry Ahmed @ 2024-02-22 19:20 UTC (permalink / raw)
To: Nhat Pham
Cc: Andrew Morton, Vitaly Wool, Miaohe Lin, Johannes Weiner,
Linux-MM, Linux Kernel Mailing List, Christoph Hellwig,
Sergey Senozhatsky, Minchan Kim, Chris Down, Seth Jennings,
Dan Streetman, Chris Li
On Thu, Feb 22, 2024 at 01:23:43PM +0700, Nhat Pham wrote:
> On Fri, Feb 9, 2024 at 10:27 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > I did not perform any sophisticated analysis on these histograms, but
> > eyeballing them makes it clear that all allocators have somewhat
> > similar latencies. zbud is slightly better than zsmalloc, and z3fold
> > is slightly worse than zsmalloc. This corresponds naturally to the
> > build times in (a).
> >
> > (c) Maximum size of the zswap pool
> >
> > *** zsmalloc ***
> > 1,137,659,904 bytes = ~1.13G
> >
> > *** zbud ***
> > 1,535,741,952 bytes = ~1.5G
> >
> > *** z3fold ***
> > 1,151,303,680 bytes = ~1.15G
> >
> > zbud consumes ~35% more memory, and z3fold consumes ~1.2% more
> > memory. This makes sense because zbud only stores a maximum of two
> > compressed pages on each order-0 page, regardless of the compression
> > ratio, so it is bound to consume more memory.
> >
> > -------------------------------- </Results> --------------------------------
> >
> > According to those results, it seems like zsmalloc is superior to
> > z3fold in both efficiency and latency. Zbud has a small latency
> > advantage, but that comes with a huge cost in terms of memory
> > consumption. Moreover, most known users of zswap are currently using
> > zsmalloc. Perhaps some folks are using zbud because it was the default
> > allocator up until recently. The only known disadvantage of zsmalloc
> > is the dependency on MMU.
> >
> > Based on that, I think it doesn't make sense to keep all 3 allocators
> > going forward. I believe we should start with removing either zbud or
> > z3fold, leaving only one allocator supporting !MMU. Once zsmalloc
> > supports !MMU (if possible), we can keep zsmalloc as the only
> > allocator.
> >
> > Thoughts and feedback are highly appreciated. I tried to CC all the
> > interested folks, but others feel free to chime in.
>
> I already voiced my opinion on the other thread, but to reiterate, my
> vote is towards deprecating/removing z3fold :)
> That is, unless someone can present a convincing argument/use
> case/workload where z3fold outshines both zbud and zsmalloc, or is at
> least another point on the Pareto front of (latency x memory saving).
I can re-send the RFC to mark z3fold as deprecated with a reference to
the data here or a quote of some of it. Alternatively, we can remove the
code directly if we believe there are no users.
There were some conflicting opinions last time, and I was hoping we
could settle them.
I am also low-key hoping Andrew would chime in at some point with what
he prefers (deprecate, remove, or leave as-is).