linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Nhat Pham <nphamcs@gmail.com>, Minchan Kim <minchan@kernel.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Brian Geffon <bgeffon@google.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH] zsmalloc: use actual object size to detect spans
Date: Wed, 7 Jan 2026 00:23:25 +0000
Message-ID: <qh2p3k5c2ulb5g2zmckhrm4o2eoruatb5lnumkrwrvmsxwcn3s@nh5vx4kjn5in>
In-Reply-To: <20260106042507.2579150-1-senozhatsky@chromium.org>

On Tue, Jan 06, 2026 at 01:25:07PM +0900, Sergey Senozhatsky wrote:
> Using class->size to detect spanning objects is not entirely correct,
> because some size classes can hold a range of object sizes of up to
> class->size bytes in length, due to size-classes merge.  Such classes
> use padding for cases when actually written objects are smaller than
> class->size.  zs_obj_read_begin() can incorrectly hit the slow path
> and perform memcpy of such objects, basically copying padding bytes.
> Instead of class->size zs_obj_read_begin() should use the actual
> compressed object length (both zram and zswap know it) so that it can
> correctly handle situations when a written object is small enough to
> fit into the first physical page.
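
To make the false slow path concrete, here is a minimal user-space sketch in C of the two span checks. It assumes 4 KiB pages and an inlined handle of sizeof(unsigned long) bytes. The sketch_* helpers, and the 2080-byte class holding a 2000-byte object, are hypothetical and only illustrate a merged class whose slots are larger than the data actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumptions for this sketch: 4 KiB pages and an inlined handle of
 * sizeof(unsigned long) bytes in front of each object in a normal class.
 */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_HANDLE_SIZE	sizeof(unsigned long)

/* Page offset of object obj_idx, like offset_in_page(class->size * obj_idx). */
static size_t sketch_obj_offset(size_t class_size, unsigned int obj_idx)
{
	return (class_size * obj_idx) % SKETCH_PAGE_SIZE;
}

/* True when len bytes starting at page offset off stay within one page. */
static bool sketch_fits_in_page(size_t off, size_t len)
{
	return off + len <= SKETCH_PAGE_SIZE;
}

/* Old check: treats every object as if it filled its whole class->size slot. */
static bool sketch_fast_path_by_class(size_t class_size, unsigned int obj_idx)
{
	return sketch_fits_in_page(sketch_obj_offset(class_size, obj_idx),
				   class_size);
}

/* New check: the real compressed length plus the inlined handle. */
static bool sketch_fast_path_by_len(size_t class_size, unsigned int obj_idx,
				    size_t obj_len)
{
	size_t mem_len = obj_len + SKETCH_HANDLE_SIZE;

	assert(mem_len <= class_size);	/* the object must fit in its slot */
	return sketch_fits_in_page(sketch_obj_offset(class_size, obj_idx),
				   mem_len);
}
```

With these made-up numbers, object 1 of the 2080-byte class starts at page offset 2080, so the class->size check (2080 + 2080 > 4096) sends it to the memcpy() slow path, even though 2000 bytes of payload plus the handle end at 4088 on a 64-bit build and fit in the first page.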
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  drivers/block/zram/zram_drv.c | 14 ++++++++------
>  include/linux/zsmalloc.h      |  4 ++--
>  mm/zsmalloc.c                 | 16 ++++++++++++----
>  mm/zswap.c                    |  5 +++--
>  4 files changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a6587bed6a03..76a54eabe889 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -2065,11 +2065,11 @@ static int read_incompressible_page(struct zram *zram, struct page *page,
>  	void *src, *dst;
>  
>  	handle = get_slot_handle(zram, index);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, NULL);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL);
>  	dst = kmap_local_page(page);
>  	copy_page(dst, src);
>  	kunmap_local(dst);
> -	zs_obj_read_end(zram->mem_pool, handle, src);
> +	zs_obj_read_end(zram->mem_pool, handle, PAGE_SIZE, src);
>  
>  	return 0;
>  }
> @@ -2087,11 +2087,12 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
>  	prio = get_slot_comp_priority(zram, index);
>  
>  	zstrm = zcomp_stream_get(zram->comps[prio]);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, size,
> +				zstrm->local_copy);
>  	dst = kmap_local_page(page);
>  	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
>  	kunmap_local(dst);
> -	zs_obj_read_end(zram->mem_pool, handle, src);
> +	zs_obj_read_end(zram->mem_pool, handle, size, src);
>  	zcomp_stream_put(zstrm);
>  
>  	return ret;
> @@ -2114,9 +2115,10 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
>  	 * takes place here, as we read raw compressed data.
>  	 */
>  	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, size,
> +				zstrm->local_copy);
>  	memcpy_to_page(page, 0, src, size);
> -	zs_obj_read_end(zram->mem_pool, handle, src);
> +	zs_obj_read_end(zram->mem_pool, handle, size, src);
>  	zcomp_stream_put(zstrm);
>  
>  	return 0;
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index f3ccff2d966c..5565c3171007 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -40,9 +40,9 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
>  void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
>  
>  void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> -			void *local_copy);
> +			size_t mem_len, void *local_copy);
>  void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> -		     void *handle_mem);
> +		     size_t mem_len, void *handle_mem);
>  void zs_obj_write(struct zs_pool *pool, unsigned long handle,
>  		  void *handle_mem, size_t mem_len);
>  
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 84da164dcbc5..ab7767d9d87d 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1065,7 +1065,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
>  EXPORT_SYMBOL_GPL(zs_get_total_pages);
>  
>  void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> -			void *local_copy)
> +			size_t mem_len, void *local_copy)
>  {
>  	struct zspage *zspage;
>  	struct zpdesc *zpdesc;
> @@ -1087,7 +1087,11 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
>  	class = zspage_class(pool, zspage);
>  	off = offset_in_page(class->size * obj_idx);
>  
> -	if (off + class->size <= PAGE_SIZE) {
> +	/* Normal classes have inlined handle */
> +	if (!ZsHugePage(zspage))
> +		mem_len += ZS_HANDLE_SIZE;

Instead of modifying mem_len, can we modify 'off', like zs_obj_write()
and zs_obj_read_end() do? I think this can actually be done as a prequel
to this patch. Arguably, it makes more sense, as we avoid unnecessarily
copying the handle (completely untested):

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..48c288da43b8 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1087,6 +1087,9 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

+       if (!ZsHugePage(zspage))
+               off += ZS_HANDLE_SIZE;
+
        if (off + class->size <= PAGE_SIZE) {
                /* this object is contained entirely within a page */
                addr = kmap_local_zpdesc(zpdesc);
@@ -1107,9 +1110,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
                                 0, sizes[1]);
        }

-       if (!ZsHugePage(zspage))
-               addr += ZS_HANDLE_SIZE;
-
        return addr;
 }
 EXPORT_SYMBOL_GPL(zs_obj_read_begin);
@@ -1129,9 +1129,10 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

+       if (!ZsHugePage(zspage))
+               off += ZS_HANDLE_SIZE;
+
        if (off + class->size <= PAGE_SIZE) {
-               if (!ZsHugePage(zspage))
-                       off += ZS_HANDLE_SIZE;
                handle_mem -= off;
                kunmap_local(handle_mem);
        }

---
Does this work?
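
A minimal user-space model of the suggestion, assuming it is combined with this patch's mem_len: the handle is skipped by bumping the offset before the span check, so the slow path copies only payload bytes into local_copy. The two-element pages array stands in for two physical pages; sketch_read_begin() and its parameters are hypothetical and not the zsmalloc API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_HANDLE_SIZE	sizeof(unsigned long)

/*
 * Hypothetical model of zs_obj_read_begin() with the handle folded into
 * the offset. pages[0] and pages[1] stand in for the two physical pages
 * an object may straddle; obj_off is where the object (including its
 * handle, for normal classes) starts in pages[0].
 */
static const unsigned char *
sketch_read_begin(unsigned char pages[2][SKETCH_PAGE_SIZE], size_t obj_off,
		  bool huge, size_t mem_len, unsigned char *local_copy)
{
	size_t off = obj_off;
	size_t first;

	/* Skip the inlined handle before the span check, not after it. */
	if (!huge)
		off += SKETCH_HANDLE_SIZE;
	assert(off < SKETCH_PAGE_SIZE);

	/* Fast path: the payload lies entirely within the first page. */
	if (off + mem_len <= SKETCH_PAGE_SIZE)
		return &pages[0][off];

	/* Slow path: copy only payload bytes; the handle is never copied. */
	first = SKETCH_PAGE_SIZE - off;
	memcpy(local_copy, &pages[0][off], first);
	memcpy(local_copy + first, &pages[1][0], mem_len - first);
	return local_copy;
}
```

In the spanning case the payload lands at the start of local_copy with no handle bytes in front of it, so the caller needs no further adjustment.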

> +
> +	if (off + mem_len <= PAGE_SIZE) {
>  		/* this object is contained entirely within a page */
>  		addr = kmap_local_zpdesc(zpdesc);
>  		addr += off;

In the else case below (spanning object), should we also use mem_len
instead of class->size to determine the copy size?
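
A sketch of what the else branch would compute if it split mem_len rather than class->size across the two pages. The helper is hypothetical (its output mirrors the sizes[] seen in the quoted context) and assumes 4 KiB pages; len is expected to already include the inlined handle, as in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL

/*
 * Hypothetical helper: split a read of len bytes that starts at page
 * offset off and crosses into the next page into the two per-page
 * copy sizes.
 */
static void sketch_split(size_t off, size_t len, size_t sizes[2])
{
	assert(off < SKETCH_PAGE_SIZE);
	assert(off + len > SKETCH_PAGE_SIZE);	/* only for spanning reads */

	sizes[0] = SKETCH_PAGE_SIZE - off;	/* tail of the first page */
	sizes[1] = len - sizes[0];		/* head of the second page */
}
```

For a made-up 2080-byte class with an object at offset 2080, sizing by class->size copies 64 bytes from the second page, while an object whose length plus handle is 2048 bytes needs only 32; the other 32 bytes are padding.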

> @@ -1115,7 +1119,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
>  EXPORT_SYMBOL_GPL(zs_obj_read_begin);
>  
>  void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> -		     void *handle_mem)
> +		     size_t mem_len, void *handle_mem)
>  {
>  	struct zspage *zspage;
>  	struct zpdesc *zpdesc;
> @@ -1129,7 +1133,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
>  	class = zspage_class(pool, zspage);
>  	off = offset_in_page(class->size * obj_idx);
>  
> -	if (off + class->size <= PAGE_SIZE) {
> +	/* Normal classes have inlined handle */
> +	if (!ZsHugePage(zspage))
> +		mem_len += ZS_HANDLE_SIZE;
> +
> +	if (off + mem_len <= PAGE_SIZE) {
>  		if (!ZsHugePage(zspage))
>  			off += ZS_HANDLE_SIZE;

With the proposed prequel patch, I think we won't need to handle
ZS_HANDLE_SIZE twice here. WDYT? Did I miss something?
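
A hypothetical sketch of the zs_obj_read_end() side once the handle is folded into the offset up front: a single offset serves both the span check and the pointer arithmetic that recovers the mapped address. It assumes 4 KiB pages, a sizeof(unsigned long) handle, and that read_begin returned page_base + off on the fast path; the sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_HANDLE_SIZE	sizeof(unsigned long)

/* Hypothetical: the payload offset, computed once with the handle folded in. */
static size_t sketch_data_offset(size_t obj_off, bool huge)
{
	return huge ? obj_off : obj_off + SKETCH_HANDLE_SIZE;
}

/*
 * Hypothetical mirror of the read_end fast path: if the payload fit in
 * the first page, read_begin returned page_base + off, so subtracting
 * the same off recovers the address to unmap. Returns NULL for the
 * copied (slow) path, where there is nothing to unmap.
 */
static const unsigned char *sketch_unmap_addr(const unsigned char *handle_mem,
					      size_t obj_off, bool huge,
					      size_t mem_len)
{
	size_t off = sketch_data_offset(obj_off, huge);

	if (off + mem_len <= SKETCH_PAGE_SIZE)
		return handle_mem - off;
	return NULL;
}
```

Because the handle is accounted for exactly once, in sketch_data_offset(), neither the check nor the unmap arithmetic has to add ZS_HANDLE_SIZE again.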

>  		handle_mem -= off;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index de8858ff1521..a3811b05ab57 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	u8 *src, *obj;
>  
>  	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
> -	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
> +	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length,
> +				acomp_ctx->buffer);
>  
>  	/* zswap entries of length PAGE_SIZE are not compressed. */
>  	if (entry->length == PAGE_SIZE) {
> @@ -966,7 +967,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	dlen = acomp_ctx->req->dlen;
>  
>  read_done:
> -	zs_obj_read_end(pool->zs_pool, entry->handle, obj);
> +	zs_obj_read_end(pool->zs_pool, entry->handle, entry->length, obj);
>  	acomp_ctx_put_unlock(acomp_ctx);
>  
>  	if (!decomp_ret && dlen == PAGE_SIZE)
> -- 
> 2.52.0.351.gbe84eed79e-goog
> 



Thread overview: 15+ messages
2026-01-06  4:25 Sergey Senozhatsky
2026-01-07  0:23 ` Yosry Ahmed [this message]
2026-01-07  0:59   ` Sergey Senozhatsky
2026-01-07  1:37     ` Sergey Senozhatsky
2026-01-07  1:56       ` Yosry Ahmed
2026-01-07  2:06         ` Sergey Senozhatsky
2026-01-07  2:10           ` Yosry Ahmed
2026-01-07  2:20             ` Sergey Senozhatsky
2026-01-07  2:22               ` Sergey Senozhatsky
2026-01-07  5:19               ` Yosry Ahmed
2026-01-07  5:30                 ` Sergey Senozhatsky
2026-01-07  7:12                   ` Sergey Senozhatsky
2026-01-07  3:03             ` Sergey Senozhatsky
2026-01-07  5:22               ` Yosry Ahmed
2026-01-07  5:38                 ` Sergey Senozhatsky
