linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Nhat Pham <nphamcs@gmail.com>, Minchan Kim <minchan@kernel.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Brian Geffon <bgeffon@google.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
Date: Fri, 2 Jan 2026 18:29:56 +0000	[thread overview]
Message-ID: <ddfs43qldaws5urlnpah3ibp5xeu7st37p5hgdfajvdtwor4sd@fkcm3brinygo> (raw)
In-Reply-To: <20260101013814.2312147-3-senozhatsky@chromium.org>

On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> This is the first step towards re-thinking optimization strategy
> during chain-size (the number of 0-order physical pages a zspage
> chains together for best performance) configuration.  Currently,
> we only consider one metric - "wasted" memory - and try various
> chain length configurations in order to find the minimal wasted
> space configuration.  However, this strategy doesn't consider
> the fact that our optimization space is not single-dimensional.
> When we increase zspage chain length we at the same time increase the
> number of spanning objects (objects that span two physical pages).
> Such objects slow down read() operations because zsmalloc needs to
> kmap both pages and memcpy objects' chunks.  This clearly increases
> CPU usage and battery drain.
> 
> We, most likely, need to consider numerous metrics and optimize
> in a multi-dimensional space.  These can be wired in later on; for
> now we just add a heuristic to increase zspage chain length only
> if there are substantial savings memory usage wise.  We can tune
> these threshold values (there is a simple user-space tool [2] to
> experiment with those knobs), but what we currently have is already
> interesting enough.  Where does this bring us, using a synthetic
> test [1], which produces byte-to-byte comparable workloads, on a
> 4K PAGE_SIZE, chain size 10 system:
> 
> BASE
> ====
>  zsmalloc_test: num write objects: 339598
>  zsmalloc_test: pool pages used 175111, total allocated size 698213488
>  zsmalloc_test: pool memory utilization: 97.3
>  zsmalloc_test: num read objects: 339598
>  zsmalloc_test: spanning objects: 110377, total memcpy size: 278318624
> 
> PATCHED
> =======
>  zsmalloc_test: num write objects: 339598
>  zsmalloc_test: pool pages used 175920, total allocated size 698213488
>  zsmalloc_test: pool memory utilization: 96.8
>  zsmalloc_test: num read objects: 339598
>  zsmalloc_test: spanning objects: 103256, total memcpy size: 265378608
> 
> At a price of 0.5% increased pool memory usage there was a 6.5%
> reduction in the number of spanning objects (4.6% fewer bytes copied).
> 
> Note, the results are specific to this particular test case.  The
> savings are not uniformly distributed: according to [2] for some
> size classes the reduction in the number of spanning objects
> per-zspage goes down from 7 to 0 (e.g. size class 368), for other
> from 4 to 2 (e.g. size class 640).  So the actual memcpy savings
> are data-pattern dependent, as always.

I worry that the heuristics are too hand-wavy, and I wonder if the
memcpy savings actually show up as perf improvements in any real life
workload. Do we have data about this?

I also vaguely recall discussions about other ways to avoid the memcpy
using scatterlists, so I am wondering if this is the right metric to
optimize.

What are the main pain points for PAGE_SIZE > 4K configs? Is it the
compression/decompression time? In my experience this is usually not the
bottleneck; I would imagine the real problem is the internal
fragmentation.

> 
> [1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/0001-zsmalloc-add-zsmalloc_test-module.patch
> [2] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  mm/zsmalloc.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 5e7501d36161..929db7cf6c19 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -2000,22 +2000,45 @@ static int zs_register_shrinker(struct zs_pool *pool)
>  static int calculate_zspage_chain_size(int class_size)
>  {
>  	int i, min_waste = INT_MAX;
> -	int chain_size = 1;
> +	int best_chain_size = 1;
>  
>  	if (is_power_of_2(class_size))
> -		return chain_size;
> +		return best_chain_size;
>  
>  	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
> -		int waste;
> +		int curr_waste = (i * PAGE_SIZE) % class_size;
>  
> -		waste = (i * PAGE_SIZE) % class_size;
> -		if (waste < min_waste) {
> -			min_waste = waste;
> -			chain_size = i;
> +		if (curr_waste == 0)
> +			return i;
> +
> +		/*
> +		 * Accept the new chain size if:
> +		 * 1. The current best is wasteful (> 10% of zspage size),
> +		 *    accept anything that is better.
> +		 * 2. The current best is efficient, accept only significant
> +		 *    (25%) improvement.
> +		 */
> +		if (min_waste * 10 > best_chain_size * PAGE_SIZE) {
> +			if (curr_waste < min_waste) {
> +				min_waste = curr_waste;
> +				best_chain_size = i;
> +			}
> +		} else {
> +			if (curr_waste * 4 < min_waste * 3) {
> +				min_waste = curr_waste;
> +				best_chain_size = i;
> +			}
>  		}
> +
> +		/*
> +		 * If the current best chain has low waste (approx < 1.5%
> +		 * relative to zspage size) then accept it right away.
> +		 */
> +		if (min_waste * 64 <= best_chain_size * PAGE_SIZE)
> +			break;
>  	}
>  
> -	return chain_size;
> +	return best_chain_size;
>  }
>  
>  /**
> -- 
> 2.52.0.351.gbe84eed79e-goog
> 



Thread overview: 4+ messages
2026-01-01  1:38 [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings Sergey Senozhatsky
2026-01-01  1:38 ` [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes Sergey Senozhatsky
2026-01-01  1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
2026-01-02 18:29   ` Yosry Ahmed [this message]
