From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Nhat Pham <nphamcs@gmail.com>, Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Brian Geffon <bgeffon@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
Date: Fri, 2 Jan 2026 18:29:56 +0000
Message-ID: <ddfs43qldaws5urlnpah3ibp5xeu7st37p5hgdfajvdtwor4sd@fkcm3brinygo>
In-Reply-To: <20260101013814.2312147-3-senozhatsky@chromium.org>
On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> This is the first step towards re-thinking optimization strategy
> during chain-size (the number of 0-order physical pages a zspage
> chains for most optimal performance) configuration. Currently,
> we only consider one metric - "wasted" memory - and try various
> chain length configurations in order to find the minimal wasted
> space configuration. However, this strategy doesn't consider
> the fact that our optimization space is not single-dimensional.
> When we increase zspage chain length we at the same time increase
> the number of spanning objects (objects that span two physical pages).
> Such objects slow down read() operations because zsmalloc needs to
> kmap both pages and memcpy objects' chunks. This clearly increases
> CPU usage and battery drain.
>
> We, most likely, need to consider numerous metrics and optimize
> in a multi-dimensional space. These can be wired in later on, for
> now we just add a heuristic that increases zspage chain length only
> if there are substantial memory savings. We can tune these
> threshold values (there is a simple user-space tool [2] to
> experiment with those knobs), but what we currently have is already
> interesting enough. Where does this bring us, using a synthetic
> test [1], which produces byte-to-byte comparable workloads, on a
> 4K PAGE_SIZE, chain size 10 system:
>
> BASE
> ====
> zsmalloc_test: num write objects: 339598
> zsmalloc_test: pool pages used 175111, total allocated size 698213488
> zsmalloc_test: pool memory utilization: 97.3
> zsmalloc_test: num read objects: 339598
> zsmalloc_test: spanning objects: 110377, total memcpy size: 278318624
>
> PATCHED
> =======
> zsmalloc_test: num write objects: 339598
> zsmalloc_test: pool pages used 175920, total allocated size 698213488
> zsmalloc_test: pool memory utilization: 96.8
> zsmalloc_test: num read objects: 339598
> zsmalloc_test: spanning objects: 103256, total memcpy size: 265378608
>
> At the price of 0.5% increased pool memory usage there was a 6.5%
> reduction in the number of spanning objects (4.6% fewer copied bytes).
>
> Note, the results are specific to this particular test case. The
> savings are not uniformly distributed: according to [2], for some
> size classes the number of spanning objects per zspage goes down
> from 7 to 0 (e.g. size class 368), for others from 4 to 2 (e.g.
> size class 640). So the actual memcpy savings
> are data-pattern dependent, as always.
I worry that the heuristics are too hand-wavy, and I wonder if the
memcpy savings actually show up as perf improvements in any real-life
workload. Do we have data about this?
I also vaguely recall discussions about other ways to avoid the memcpy
using scatterlists, so I am wondering if this is the right metric to
optimize.
What are the main pain points for PAGE_SIZE > 4K configs? Is it the
compression/decompression time? In my experience that is usually not
the bottleneck; I would imagine the real problem is internal
fragmentation.
>
> [1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/0001-zsmalloc-add-zsmalloc_test-module.patch
> [2] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> mm/zsmalloc.c | 39 +++++++++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 5e7501d36161..929db7cf6c19 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -2000,22 +2000,45 @@ static int zs_register_shrinker(struct zs_pool *pool)
> static int calculate_zspage_chain_size(int class_size)
> {
> int i, min_waste = INT_MAX;
> - int chain_size = 1;
> + int best_chain_size = 1;
>
> if (is_power_of_2(class_size))
> - return chain_size;
> + return best_chain_size;
>
> for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
> - int waste;
> + int curr_waste = (i * PAGE_SIZE) % class_size;
>
> - waste = (i * PAGE_SIZE) % class_size;
> - if (waste < min_waste) {
> - min_waste = waste;
> - chain_size = i;
> + if (curr_waste == 0)
> + return i;
> +
> + /*
> + * Accept the new chain size if:
> + * 1. The current best is wasteful (> 10% of zspage size),
> + * accept anything that is better.
> + * 2. The current best is efficient, accept only significant
> + * (25%) improvement.
> + */
> + if (min_waste * 10 > best_chain_size * PAGE_SIZE) {
> + if (curr_waste < min_waste) {
> + min_waste = curr_waste;
> + best_chain_size = i;
> + }
> + } else {
> + if (curr_waste * 4 < min_waste * 3) {
> + min_waste = curr_waste;
> + best_chain_size = i;
> + }
> }
> +
> + /*
> + * If the current best chain has low waste (approx < 1.5%
> + * relative to zspage size) then accept it right away.
> + */
> + if (min_waste * 64 <= best_chain_size * PAGE_SIZE)
> + break;
> }
>
> - return chain_size;
> + return best_chain_size;
> }
>
> /**
> --
> 2.52.0.351.gbe84eed79e-goog
>
Thread overview:
2026-01-01 1:38 [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
2026-01-02 18:29 ` Yosry Ahmed [this message]