From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Nhat Pham <nphamcs@gmail.com>, Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Brian Geffon <bgeffon@google.com>,
linux-kernel@vger.kernel.org,
Herbert Xu <herbert@gondor.apana.org.au>,
linux-mm@kvack.org
Subject: Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
Date: Mon, 5 Jan 2026 15:58:42 +0000 [thread overview]
Message-ID: <v5veb673xrz6z3tevfdymhuik7nltojrvcqyjih4ds7co4p4hr@5e7ngdkrxo32> (raw)
In-Reply-To: <gg5zdpzrk47tljbnaudcy2gnsodyhmmar23qb57b67bhx6ntje@eq2fcrl2dk4z>
On Mon, Jan 05, 2026 at 10:42:51AM +0900, Sergey Senozhatsky wrote:
> On (26/01/02 18:29), Yosry Ahmed wrote:
> > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> [..]
> >
> > I worry that the heuristics are too hand-wavy
>
> I don't disagree. Am not super excited about the heuristics either.
>
> > and I wonder if the memcpy savings actually show up as perf improvements
> > in any real life workload. Do we have data about this?
>
> I don't have real-life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> systems we have "normal" size-classes up to a very large size; a normal
> class means chaining 0-order physical pages, and chaining means objects
> spanning page boundaries. So on 16K the memcpy overhead is expected to
> be somewhat noticeable.
I don't disagree that it could be a problem; I am just against
optimizations without data. Data-free heuristics are hard to modify or
remove later, since we don't really know what effect they had in the
first place.
We also don't know if the 0.5% increase in memory usage is actually
offset by CPU gains.
>
> > I also vaguely recall discussions about other ways to avoid the memcpy
> > using scatterlists, so I am wondering if this is the right metric to
> > optimize.
>
> As far as I understand, the SG-list based approach will require
> implementing split-data handling on the compression algorithm side,
> which is not trivial (especially if the only reason to do that is
> zsmalloc).
I am not sure tbh, adding Herbert here. I remember looking at the code
in scomp_acomp_comp_decomp() at some point, and I think it will take
care of non-contiguous SG-lists. Not sure if that's the correct place to
look, though.
>
> Alternatively, we maybe can try to vmap spanning objects:
Using vmap makes sense in theory, but in practice (at least for zswap)
it doesn't help because SG lists do not support vmap addresses. Zswap
will actually treat them the same as highmem and copy them to a buffer
before putting them in an SG list, so we effectively just do the
memcpy() in zswap instead of zsmalloc.
>
> ---
> mm/zsmalloc.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 6fc216ab8190..4a68c27cb5d4 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -38,6 +38,7 @@
> #include <linux/zsmalloc.h>
> #include <linux/fs.h>
> #include <linux/workqueue.h>
> +#include <linux/vmalloc.h>
> #include "zpdesc.h"
>
> #define ZSPAGE_MAGIC 0x58
> @@ -1097,19 +1098,15 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> addr = kmap_local_zpdesc(zpdesc);
> addr += off;
> } else {
> - size_t sizes[2];
> + struct page *pages[2];
>
> /* this object spans two pages */
> - sizes[0] = PAGE_SIZE - off;
> - sizes[1] = class->size - sizes[0];
> - addr = local_copy;
> -
> - memcpy_from_page(addr, zpdesc_page(zpdesc),
> - off, sizes[0]);
> - zpdesc = get_next_zpdesc(zpdesc);
> - memcpy_from_page(addr + sizes[0],
> - zpdesc_page(zpdesc),
> - 0, sizes[1]);
> + pages[0] = zpdesc_page(zpdesc);
> + pages[1] = zpdesc_page(get_next_zpdesc(zpdesc));
> + addr = vm_map_ram(pages, 2, NUMA_NO_NODE);
> + if (!addr)
> + return NULL;
> + addr += off;
> }
>
> if (!ZsHugePage(zspage))
> @@ -1139,6 +1136,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> off += ZS_HANDLE_SIZE;
> handle_mem -= off;
> kunmap_local(handle_mem);
> + } else {
> + if (!ZsHugePage(zspage))
> + off += ZS_HANDLE_SIZE;
> + handle_mem -= off;
> + vm_unmap_ram(handle_mem, 2);
> }
>
> zspage_read_unlock(zspage);
> --
> 2.52.0.351.gbe84eed79e-goog
>
>
> > What are the main pain points for PAGE_SIZE > 4K configs? Is it the
> > compression/decompression time? In my experience that is usually not
> > the bottleneck; I would imagine the real problem is the internal
> > fragmentation.
>
> Right, internal fragmentation can be the main problem.