* [PATCH] zsmalloc: use actual object size to detect spans
@ 2026-01-06  4:25 Sergey Senozhatsky
  2026-01-07  0:23 ` Yosry Ahmed
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2026-01-06  4:25 UTC (permalink / raw)
  To: Andrew Morton, Yosry Ahmed, Nhat Pham
  Cc: Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm,
	Sergey Senozhatsky

Using class->size to detect spanning objects is not entirely correct,
because some size classes can hold a range of object sizes, up to
class->size bytes in length, due to size-class merging.  Such classes
use padding when the objects actually written are smaller than
class->size.  zs_obj_read_begin() can therefore incorrectly hit the
slow path and memcpy such objects, copying padding bytes along with
the data.  Instead of class->size, zs_obj_read_begin() should use the
actual compressed object length (both zram and zswap know it), so that
it correctly handles the case when a written object is small enough
to fit into the first physical page.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 14 ++++++++------
 include/linux/zsmalloc.h      |  4 ++--
 mm/zsmalloc.c                 | 16 ++++++++++++----
 mm/zswap.c                    |  5 +++--
 4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a6587bed6a03..76a54eabe889 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2065,11 +2065,11 @@ static int read_incompressible_page(struct zram *zram, struct page *page,
 	void *src, *dst;
 
 	handle = get_slot_handle(zram, index);
-	src = zs_obj_read_begin(zram->mem_pool, handle, NULL);
+	src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL);
 	dst = kmap_local_page(page);
 	copy_page(dst, src);
 	kunmap_local(dst);
-	zs_obj_read_end(zram->mem_pool, handle, src);
+	zs_obj_read_end(zram->mem_pool, handle, PAGE_SIZE, src);
 
 	return 0;
 }
@@ -2087,11 +2087,12 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
 	prio = get_slot_comp_priority(zram, index);
 
 	zstrm = zcomp_stream_get(zram->comps[prio]);
-	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
+	src = zs_obj_read_begin(zram->mem_pool, handle, size,
+				zstrm->local_copy);
 	dst = kmap_local_page(page);
 	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
 	kunmap_local(dst);
-	zs_obj_read_end(zram->mem_pool, handle, src);
+	zs_obj_read_end(zram->mem_pool, handle, size, src);
 	zcomp_stream_put(zstrm);
 
 	return ret;
@@ -2114,9 +2115,10 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
 	 * takes place here, as we read raw compressed data.
*/ zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]); - src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy); + src = zs_obj_read_begin(zram->mem_pool, handle, size, + zstrm->local_copy); memcpy_to_page(page, 0, src, size); - zs_obj_read_end(zram->mem_pool, handle, src); + zs_obj_read_end(zram->mem_pool, handle, size, src); zcomp_stream_put(zstrm); return 0; diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index f3ccff2d966c..5565c3171007 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -40,9 +40,9 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size); void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, - void *local_copy); + size_t mem_len, void *local_copy); void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, - void *handle_mem); + size_t mem_len, void *handle_mem); void zs_obj_write(struct zs_pool *pool, unsigned long handle, void *handle_mem, size_t mem_len); diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 84da164dcbc5..ab7767d9d87d 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1065,7 +1065,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool) EXPORT_SYMBOL_GPL(zs_get_total_pages); void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, - void *local_copy) + size_t mem_len, void *local_copy) { struct zspage *zspage; struct zpdesc *zpdesc; @@ -1087,7 +1087,11 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - if (off + class->size <= PAGE_SIZE) { + /* Normal classes have inlined handle */ + if (!ZsHugePage(zspage)) + mem_len += ZS_HANDLE_SIZE; + + if (off + mem_len <= PAGE_SIZE) { /* this object is contained entirely within a page */ addr = kmap_local_zpdesc(zpdesc); addr += off; @@ -1115,7 +1119,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, EXPORT_SYMBOL_GPL(zs_obj_read_begin); void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, - void *handle_mem) + size_t mem_len, void *handle_mem) { struct zspage *zspage; struct zpdesc *zpdesc; @@ -1129,7 +1133,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - if (off + class->size <= PAGE_SIZE) { + /* Normal classes have inlined handle */ + if (!ZsHugePage(zspage)) + mem_len += ZS_HANDLE_SIZE; + + if (off + mem_len <= PAGE_SIZE) { if (!ZsHugePage(zspage)) off += ZS_HANDLE_SIZE; handle_mem -= off; diff --git a/mm/zswap.c b/mm/zswap.c index de8858ff1521..a3811b05ab57 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio) u8 *src, *obj; acomp_ctx = acomp_ctx_get_cpu_lock(pool); - obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer); + obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length, + acomp_ctx->buffer); /* zswap entries of length PAGE_SIZE are not compressed. 
*/ if (entry->length == PAGE_SIZE) { @@ -966,7 +967,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio) dlen = acomp_ctx->req->dlen; read_done: - zs_obj_read_end(pool->zs_pool, entry->handle, obj); + zs_obj_read_end(pool->zs_pool, entry->handle, entry->length, obj); acomp_ctx_put_unlock(acomp_ctx); if (!decomp_ret && dlen == PAGE_SIZE) -- 2.52.0.351.gbe84eed79e-goog ^ permalink raw reply [flat|nested] 15+ messages in thread
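To make the failure mode described in the changelog concrete: with merged
classes the slot can be up to class->size bytes while the stored object is
shorter, so only the slot's padding crosses the page boundary.  Below is a
small userspace model of the old and new spanning checks; all values (class
size, object length, object index, the 4K page) are hypothetical, chosen
purely to land a padded slot across a page boundary, and ZS_HANDLE_SIZE here
stands in for the kernel's inlined-handle size:

	#include <stdio.h>

	#define PAGE_SIZE	4096UL
	#define ZS_HANDLE_SIZE	sizeof(unsigned long)

	int main(void)
	{
		/*
		 * Hypothetical merged size class: the slot is 720 bytes,
		 * but the compressed object actually stored is 640 bytes.
		 */
		unsigned long class_size = 720, obj_len = 640, obj_idx = 73;
		unsigned long off = (class_size * obj_idx) % PAGE_SIZE;	/* 3408 */

		/* old check: the padded slot appears to cross the page boundary */
		int old_spans = off + class_size > PAGE_SIZE;		/* 4128 > 4096 */
		/* new check: the object itself (plus inlined handle) fits */
		int new_spans = off + obj_len + ZS_HANDLE_SIZE > PAGE_SIZE;	/* 4056 <= 4096 */

		printf("off=%lu old_spans=%d new_spans=%d\n",
		       off, old_spans, new_spans);
		return 0;
	}

This prints "off=3408 old_spans=1 new_spans=0": the old check takes the
bounce-buffer slow path even though the object is entirely within one page.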
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-06 4:25 [PATCH] zsmalloc: use actual object size to detect spans Sergey Senozhatsky @ 2026-01-07 0:23 ` Yosry Ahmed 2026-01-07 0:59 ` Sergey Senozhatsky 0 siblings, 1 reply; 15+ messages in thread From: Yosry Ahmed @ 2026-01-07 0:23 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On Tue, Jan 06, 2026 at 01:25:07PM +0900, Sergey Senozhatsky wrote: > Using class->size to detect spanning objects is not entirely correct, > because some size classes can hold a range of object sizes of up to > class->size bytes in length, due to size-classes merge. Such classes > use padding for cases when actually written objects are smaller than > class->size. zs_obj_read_begin() can incorrectly hit the slow path > and perform memcpy of such objects, basically copying padding bytes. > Instead of class->size zs_obj_read_begin() should use the actual > compressed object length (both zram and zswap know it) so that it can > correctly handle situations when a written object is small enough to > fit into the first physical page. > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > --- > drivers/block/zram/zram_drv.c | 14 ++++++++------ > include/linux/zsmalloc.h | 4 ++-- > mm/zsmalloc.c | 16 ++++++++++++---- > mm/zswap.c | 5 +++-- > 4 files changed, 25 insertions(+), 14 deletions(-) > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > index a6587bed6a03..76a54eabe889 100644 > --- a/drivers/block/zram/zram_drv.c > +++ b/drivers/block/zram/zram_drv.c > @@ -2065,11 +2065,11 @@ static int read_incompressible_page(struct zram *zram, struct page *page, > void *src, *dst; > > handle = get_slot_handle(zram, index); > - src = zs_obj_read_begin(zram->mem_pool, handle, NULL); > + src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL); > dst = kmap_local_page(page); > copy_page(dst, src); > kunmap_local(dst); > - zs_obj_read_end(zram->mem_pool, handle, src); > + zs_obj_read_end(zram->mem_pool, handle, PAGE_SIZE, src); > > return 0; > } > @@ -2087,11 +2087,12 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index) > prio = get_slot_comp_priority(zram, index); > > zstrm = zcomp_stream_get(zram->comps[prio]); > - src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy); > + src = zs_obj_read_begin(zram->mem_pool, handle, size, > + zstrm->local_copy); > dst = kmap_local_page(page); > ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst); > kunmap_local(dst); > - zs_obj_read_end(zram->mem_pool, handle, src); > + zs_obj_read_end(zram->mem_pool, handle, size, src); > zcomp_stream_put(zstrm); > > return ret; > @@ -2114,9 +2115,10 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index) > * takes place here, as we read raw compressed data. 
> */ > zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]); > - src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy); > + src = zs_obj_read_begin(zram->mem_pool, handle, size, > + zstrm->local_copy); > memcpy_to_page(page, 0, src, size); > - zs_obj_read_end(zram->mem_pool, handle, src); > + zs_obj_read_end(zram->mem_pool, handle, size, src); > zcomp_stream_put(zstrm); > > return 0; > diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h > index f3ccff2d966c..5565c3171007 100644 > --- a/include/linux/zsmalloc.h > +++ b/include/linux/zsmalloc.h > @@ -40,9 +40,9 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size); > void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats); > > void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > - void *local_copy); > + size_t mem_len, void *local_copy); > void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > - void *handle_mem); > + size_t mem_len, void *handle_mem); > void zs_obj_write(struct zs_pool *pool, unsigned long handle, > void *handle_mem, size_t mem_len); > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 84da164dcbc5..ab7767d9d87d 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -1065,7 +1065,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool) > EXPORT_SYMBOL_GPL(zs_get_total_pages); > > void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > - void *local_copy) > + size_t mem_len, void *local_copy) > { > struct zspage *zspage; > struct zpdesc *zpdesc; > @@ -1087,7 +1087,11 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > - if (off + class->size <= PAGE_SIZE) { > + /* Normal classes have inlined handle */ > + if (!ZsHugePage(zspage)) > + mem_len += ZS_HANDLE_SIZE; Instead of modifying mem_len, can we modify 'off' like zs_obj_write() and zs_obj_read_end()? I think this can actually be done as a prequel to this patch. Arguably, it makes more sense as we avoid unnecessarily copying the handle (completely untested): diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 5bf832f9c05c..48c288da43b8 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1087,6 +1087,9 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); + if (!ZsHugePage(zspage)) + off += ZS_HANDLE_SIZE; + if (off + class->size <= PAGE_SIZE) { /* this object is contained entirely within a page */ addr = kmap_local_zpdesc(zpdesc); @@ -1107,9 +1110,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, 0, sizes[1]); } - if (!ZsHugePage(zspage)) - addr += ZS_HANDLE_SIZE; - return addr; } EXPORT_SYMBOL_GPL(zs_obj_read_begin); @@ -1129,9 +1129,10 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); + if (!ZsHugePage(zspage)) + off += ZS_HANDLE_SIZE; + if (off + class->size <= PAGE_SIZE) { - if (!ZsHugePage(zspage)) - off += ZS_HANDLE_SIZE; handle_mem -= off; kunmap_local(handle_mem); } --- Does this work? > + > + if (off + mem_len <= PAGE_SIZE) { > /* this object is contained entirely within a page */ > addr = kmap_local_zpdesc(zpdesc); > addr += off; In the else case below (spanning object), should we also use mem_len instead of class->size to determine the copy size? 
> @@ -1115,7 +1119,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > EXPORT_SYMBOL_GPL(zs_obj_read_begin); > > void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > - void *handle_mem) > + size_t mem_len, void *handle_mem) > { > struct zspage *zspage; > struct zpdesc *zpdesc; > @@ -1129,7 +1133,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > - if (off + class->size <= PAGE_SIZE) { > + /* Normal classes have inlined handle */ > + if (!ZsHugePage(zspage)) > + mem_len += ZS_HANDLE_SIZE; > + > + if (off + mem_len <= PAGE_SIZE) { > if (!ZsHugePage(zspage)) > off += ZS_HANDLE_SIZE; With the proposed prequel patch, I think we won't need to handle ZS_HANDLE_SIZE twice here. WDYT? Did I miss sth? > handle_mem -= off; > diff --git a/mm/zswap.c b/mm/zswap.c > index de8858ff1521..a3811b05ab57 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio) > u8 *src, *obj; > > acomp_ctx = acomp_ctx_get_cpu_lock(pool); > - obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer); > + obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length, > + acomp_ctx->buffer); > > /* zswap entries of length PAGE_SIZE are not compressed. */ > if (entry->length == PAGE_SIZE) { > @@ -966,7 +967,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio) > dlen = acomp_ctx->req->dlen; > > read_done: > - zs_obj_read_end(pool->zs_pool, entry->handle, obj); > + zs_obj_read_end(pool->zs_pool, entry->handle, entry->length, obj); > acomp_ctx_put_unlock(acomp_ctx); > > if (!decomp_ret && dlen == PAGE_SIZE) > -- > 2.52.0.351.gbe84eed79e-goog > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 0:23 ` Yosry Ahmed @ 2026-01-07 0:59 ` Sergey Senozhatsky 2026-01-07 1:37 ` Sergey Senozhatsky 0 siblings, 1 reply; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 0:59 UTC (permalink / raw) To: Yosry Ahmed Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 00:23), Yosry Ahmed wrote: > Instead of modifying mem_len, can we modify 'off' like zs_obj_write() > and zs_obj_read_end()? I think this can actually be done as a prequel to > this patch. Arguably, it makes more sense as we avoid unnecessarily > copying the handle (completely untested): > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 5bf832f9c05c..48c288da43b8 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -1087,6 +1087,9 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > + if (!ZsHugePage(zspage)) > + off += ZS_HANDLE_SIZE; > + > if (off + class->size <= PAGE_SIZE) { > /* this object is contained entirely within a page */ > addr = kmap_local_zpdesc(zpdesc); > @@ -1107,9 +1110,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > 0, sizes[1]); > } > > - if (!ZsHugePage(zspage)) > - addr += ZS_HANDLE_SIZE; > - > return addr; > } > EXPORT_SYMBOL_GPL(zs_obj_read_begin); > @@ -1129,9 +1129,10 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > + if (!ZsHugePage(zspage)) > + off += ZS_HANDLE_SIZE; > + > if (off + class->size <= PAGE_SIZE) { > - if (!ZsHugePage(zspage)) > - off += ZS_HANDLE_SIZE; > handle_mem -= off; > kunmap_local(handle_mem); > } > > --- > Does this work? Sounds interesting. Let me try it out. > > + > > + if (off + mem_len <= PAGE_SIZE) { > > /* this object is contained entirely within a page */ > > addr = kmap_local_zpdesc(zpdesc); > > addr += off; > > In the else case below (spanning object), should we also use mem_len > instead of class->size to determine the copy size? Good catch. > > @@ -1115,7 +1119,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > EXPORT_SYMBOL_GPL(zs_obj_read_begin); > > > > void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > > - void *handle_mem) > > + size_t mem_len, void *handle_mem) > > { > > struct zspage *zspage; > > struct zpdesc *zpdesc; > > @@ -1129,7 +1133,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > > class = zspage_class(pool, zspage); > > off = offset_in_page(class->size * obj_idx); > > > > - if (off + class->size <= PAGE_SIZE) { > > + /* Normal classes have inlined handle */ > > + if (!ZsHugePage(zspage)) > > + mem_len += ZS_HANDLE_SIZE; > > + > > + if (off + mem_len <= PAGE_SIZE) { > > if (!ZsHugePage(zspage)) > > off += ZS_HANDLE_SIZE; > > With the proposed prequel patch, I think we won't need to handle > ZS_HANDLE_SIZE twice here. WDYT? Did I miss sth? Let me look more closely. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 0:59 ` Sergey Senozhatsky @ 2026-01-07 1:37 ` Sergey Senozhatsky 2026-01-07 1:56 ` Yosry Ahmed 0 siblings, 1 reply; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 1:37 UTC (permalink / raw) To: Yosry Ahmed Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm, Sergey Senozhatsky On (26/01/07 09:59), Sergey Senozhatsky wrote: > On (26/01/07 00:23), Yosry Ahmed wrote: > > Instead of modifying mem_len, can we modify 'off' like zs_obj_write() > > and zs_obj_read_end()? I think this can actually be done as a prequel to > > this patch. Arguably, it makes more sense as we avoid unnecessarily > > copying the handle (completely untested): > > > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > > index 5bf832f9c05c..48c288da43b8 100644 > > --- a/mm/zsmalloc.c > > +++ b/mm/zsmalloc.c > > @@ -1087,6 +1087,9 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > class = zspage_class(pool, zspage); > > off = offset_in_page(class->size * obj_idx); > > > > + if (!ZsHugePage(zspage)) > > + off += ZS_HANDLE_SIZE; > > + > > if (off + class->size <= PAGE_SIZE) { > > /* this object is contained entirely within a page */ > > addr = kmap_local_zpdesc(zpdesc); > > @@ -1107,9 +1110,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > 0, sizes[1]); > > } > > > > - if (!ZsHugePage(zspage)) > > - addr += ZS_HANDLE_SIZE; > > - > > return addr; > > } > > EXPORT_SYMBOL_GPL(zs_obj_read_begin); > > @@ -1129,9 +1129,10 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > > class = zspage_class(pool, zspage); > > off = offset_in_page(class->size * obj_idx); > > > > + if (!ZsHugePage(zspage)) > > + off += ZS_HANDLE_SIZE; > > + > > if (off + class->size <= PAGE_SIZE) { > > - if (!ZsHugePage(zspage)) > > - off += ZS_HANDLE_SIZE; > > handle_mem -= off; > > kunmap_local(handle_mem); > > } > > > > --- > > Does this work? > > Sounds interesting. Let me try it out. I recall us having exactly this idea when we first introduced zs_obj_{read,write}_end() functions, and I do recall that it did not work. Somehow this panics in __memcpy+0xc/0x44. Let me dig into it again. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 1:37 ` Sergey Senozhatsky @ 2026-01-07 1:56 ` Yosry Ahmed 2026-01-07 2:06 ` Sergey Senozhatsky 0 siblings, 1 reply; 15+ messages in thread From: Yosry Ahmed @ 2026-01-07 1:56 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On Wed, Jan 07, 2026 at 10:37:24AM +0900, Sergey Senozhatsky wrote: > On (26/01/07 09:59), Sergey Senozhatsky wrote: > > On (26/01/07 00:23), Yosry Ahmed wrote: > > > Instead of modifying mem_len, can we modify 'off' like zs_obj_write() > > > and zs_obj_read_end()? I think this can actually be done as a prequel to > > > this patch. Arguably, it makes more sense as we avoid unnecessarily > > > copying the handle (completely untested): > > > > > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > > > index 5bf832f9c05c..48c288da43b8 100644 > > > --- a/mm/zsmalloc.c > > > +++ b/mm/zsmalloc.c > > > @@ -1087,6 +1087,9 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > > class = zspage_class(pool, zspage); > > > off = offset_in_page(class->size * obj_idx); > > > > > > + if (!ZsHugePage(zspage)) > > > + off += ZS_HANDLE_SIZE; > > > + > > > if (off + class->size <= PAGE_SIZE) { > > > /* this object is contained entirely within a page */ > > > addr = kmap_local_zpdesc(zpdesc); > > > @@ -1107,9 +1110,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > > 0, sizes[1]); > > > } > > > > > > - if (!ZsHugePage(zspage)) > > > - addr += ZS_HANDLE_SIZE; > > > - > > > return addr; > > > } > > > EXPORT_SYMBOL_GPL(zs_obj_read_begin); > > > @@ -1129,9 +1129,10 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > > > class = zspage_class(pool, zspage); > > > off = offset_in_page(class->size * obj_idx); > > > > > > + if (!ZsHugePage(zspage)) > > > + off += ZS_HANDLE_SIZE; > > > + > > > if (off + class->size <= PAGE_SIZE) { > > > - if (!ZsHugePage(zspage)) > > > - off += ZS_HANDLE_SIZE; > > > handle_mem -= off; > > > kunmap_local(handle_mem); > > > } > > > > > > --- > > > Does this work? > > > > Sounds interesting. Let me try it out. > > I recall us having exactly this idea when we first introduced > zs_obj_{read,write}_end() functions, and I do recall that it > did not work. Somehow this panics in __memcpy+0xc/0x44. Let > me dig into it again. Maybe because at this point we are trying to memcpy() class->size, which already includes ZS_HANDLE_SIZE. So reading after increasing the offset reads ZS_HANDLE_SIZE after class->size. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 1:56 ` Yosry Ahmed @ 2026-01-07 2:06 ` Sergey Senozhatsky 2026-01-07 2:10 ` Yosry Ahmed 0 siblings, 1 reply; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 2:06 UTC (permalink / raw) To: Yosry Ahmed Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 01:56), Yosry Ahmed wrote: > > I recall us having exactly this idea when we first introduced > > zs_obj_{read,write}_end() functions, and I do recall that it > > did not work. Somehow this panics in __memcpy+0xc/0x44. Let > > me dig into it again. > > Maybe because at this point we are trying to memcpy() class->size, which > already includes ZS_HANDLE_SIZE. So reading after increasing the offset > reads ZS_HANDLE_SIZE after class->size. Yeah, I guess that falsely hits the spanning path because of extra sizeof(unsigned long). ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 2:06 ` Sergey Senozhatsky @ 2026-01-07 2:10 ` Yosry Ahmed 2026-01-07 2:20 ` Sergey Senozhatsky 2026-01-07 3:03 ` Sergey Senozhatsky 0 siblings, 2 replies; 15+ messages in thread From: Yosry Ahmed @ 2026-01-07 2:10 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote: > On (26/01/07 01:56), Yosry Ahmed wrote: > > > I recall us having exactly this idea when we first introduced > > > zs_obj_{read,write}_end() functions, and I do recall that it > > > did not work. Somehow this panics in __memcpy+0xc/0x44. Let > > > me dig into it again. > > > > Maybe because at this point we are trying to memcpy() class->size, which > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset > > reads ZS_HANDLE_SIZE after class->size. > > Yeah, I guess that falsely hits the spanning path because of extra > sizeof(unsigned long). Or the object could be spanning two pages indeed, but we're copying extra sizeof(unsigned long), that shouldn't crash tho. I think the changes need to be shuffled around to avoid this, or just have a combined patch, which would be less pretty. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 2:10 ` Yosry Ahmed @ 2026-01-07 2:20 ` Sergey Senozhatsky 2026-01-07 2:22 ` Sergey Senozhatsky 2026-01-07 5:19 ` Yosry Ahmed 2026-01-07 3:03 ` Sergey Senozhatsky 1 sibling, 2 replies; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 2:20 UTC (permalink / raw) To: Yosry Ahmed Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 02:10), Yosry Ahmed wrote: > On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote: > > On (26/01/07 01:56), Yosry Ahmed wrote: > > > > I recall us having exactly this idea when we first introduced > > > > zs_obj_{read,write}_end() functions, and I do recall that it > > > > did not work. Somehow this panics in __memcpy+0xc/0x44. Let > > > > me dig into it again. > > > > > > Maybe because at this point we are trying to memcpy() class->size, which > > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset > > > reads ZS_HANDLE_SIZE after class->size. > > > > Yeah, I guess that falsely hits the spanning path because of extra > > sizeof(unsigned long). > > Or the object could be spanning two pages indeed, but we're copying > extra sizeof(unsigned long), that shouldn't crash tho. It seems there is no second page, it's a pow-of-two size class. So we mis-detect spanning. [ 51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8 [ 51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000 [ 51.420816] pc : __memcpy+0xc/0x44 Second memcpy() of sizeof(unsigned long) traps. > I think the changes need to be shuffled around to avoid this, or just > have a combined patch, which would be less pretty. I think I prefer a shuffle. There is another possible improvement point (UNTESTED): if the first page holds only ZS_HANDLE bytes, then we can avoid memcpy() path and instead just kmap the second page + offset. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 2:20 ` Sergey Senozhatsky @ 2026-01-07 2:22 ` Sergey Senozhatsky 2026-01-07 5:19 ` Yosry Ahmed 1 sibling, 0 replies; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 2:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Yosry Ahmed, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 11:20), Sergey Senozhatsky wrote: > It seems there is no second page, it's a pow-of-two size class Oh, class 48. That's certainly not quite pow of 2... ^ permalink raw reply [flat|nested] 15+ messages in thread
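For illustration, the UNTESTED improvement point mentioned above (mapping the
second page directly when the first page holds only the inlined handle)
could, assuming the proposed prequel where 'off' already covers
ZS_HANDLE_SIZE, look roughly like this fragment inside zs_obj_read_begin();
this is a sketch, not a tested change:

	/*
	 * Untested sketch: if the first page held nothing but the inlined
	 * handle, the payload starts at offset 0 of the next page and can
	 * be mapped directly, with no bounce copy.
	 */
	if (off >= PAGE_SIZE) {
		zpdesc = get_next_zpdesc(zpdesc);
		return kmap_local_zpdesc(zpdesc) + (off - PAGE_SIZE);
	}

Since class sizes are multiples of ZS_ALIGN, 'off' should reach at most
PAGE_SIZE here, so the adjusted offset into the next page would always be 0
in practice.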
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 2:20 ` Sergey Senozhatsky 2026-01-07 2:22 ` Sergey Senozhatsky @ 2026-01-07 5:19 ` Yosry Ahmed 2026-01-07 5:30 ` Sergey Senozhatsky 1 sibling, 1 reply; 15+ messages in thread From: Yosry Ahmed @ 2026-01-07 5:19 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On Wed, Jan 07, 2026 at 11:20:20AM +0900, Sergey Senozhatsky wrote: > On (26/01/07 02:10), Yosry Ahmed wrote: > > On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote: > > > On (26/01/07 01:56), Yosry Ahmed wrote: > > > > > I recall us having exactly this idea when we first introduced > > > > > zs_obj_{read,write}_end() functions, and I do recall that it > > > > > did not work. Somehow this panics in __memcpy+0xc/0x44. Let > > > > > me dig into it again. > > > > > > > > Maybe because at this point we are trying to memcpy() class->size, which > > > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset > > > > reads ZS_HANDLE_SIZE after class->size. > > > > > > Yeah, I guess that falsely hits the spanning path because of extra > > > sizeof(unsigned long). > > > > Or the object could be spanning two pages indeed, but we're copying > > extra sizeof(unsigned long), that shouldn't crash tho. > > It seems there is no second page, it's a pow-of-two size class. So > we mis-detect spanning. > > [ 51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8 > [ 51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000 > [ 51.420816] pc : __memcpy+0xc/0x44 > > Second memcpy() of sizeof(unsigned long) traps. I think this case is exactly what you expected earlier (not sure what you mean by the pow of 2 reply). We increase the offset by 8 bytes (ZS_HANDLE_SIZE), but we still copy 48 bytes, even though 48 bytes includes both the object and ZS_HANDLE_SIZE. So we end up copying 8 bytes beyond the end of the object, which puts us in the next page which we should not be copying. I think to fix the bug at this point we need to subtract ZS_HANDLE_SIZE from class->size before we use it for copying or spanning detection. 
Something like (untested): diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 5bf832f9c05c..894783d2526c 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1072,6 +1072,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, unsigned long obj, off; unsigned int obj_idx; struct size_class *class; + unsigned long size; void *addr; /* Guarantee we can get zspage from handle safely */ @@ -1087,7 +1088,13 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - if (off + class->size <= PAGE_SIZE) { + size = class->size; + if (!ZsHugePage(zspage)) { + off += ZS_HANDLE_SIZE; + size -= ZS_HANDLE_SIZE; + } + + if (off + size <= PAGE_SIZE) { /* this object is contained entirely within a page */ addr = kmap_local_zpdesc(zpdesc); addr += off; @@ -1096,7 +1103,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, /* this object spans two pages */ sizes[0] = PAGE_SIZE - off; - sizes[1] = class->size - sizes[0]; + sizes[1] = size - sizes[0]; addr = local_copy; memcpy_from_page(addr, zpdesc_page(zpdesc), @@ -1107,9 +1114,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, 0, sizes[1]); } - if (!ZsHugePage(zspage)) - addr += ZS_HANDLE_SIZE; - return addr; } EXPORT_SYMBOL_GPL(zs_obj_read_begin); @@ -1121,6 +1125,7 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, struct zpdesc *zpdesc; unsigned long obj, off; unsigned int obj_idx; + unsigned long size; struct size_class *class; obj = handle_to_obj(handle); @@ -1129,9 +1134,13 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - if (off + class->size <= PAGE_SIZE) { - if (!ZsHugePage(zspage)) - off += ZS_HANDLE_SIZE; + size = class->size; + if (!ZsHugePage(zspage)) { + off += ZS_HANDLE_SIZE; + size -= ZS_HANDLE_SIZE; + } + + if (off + size <= PAGE_SIZE) { handle_mem -= off; kunmap_local(handle_mem); } > > > I think the changes need to be shuffled around to avoid this, or just > > have a combined patch, which would be less pretty. > > I think I prefer a shuffle. > > There is another possible improvement point (UNTESTED): if the first > page holds only ZS_HANDLE bytes, then we can avoid memcpy() path and > instead just kmap the second page + offset. Yeah good point. ^ permalink raw reply [flat|nested] 15+ messages in thread
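The arithmetic in the crash log a few messages up matches this explanation.
A tiny userspace model, with PAGE_SIZE and the class size taken from the log
and ZS_HANDLE_SIZE assumed to be sizeof(unsigned long):

	#include <stdio.h>

	#define PAGE_SIZE	16384UL	/* page size from the crash log */
	#define ZS_HANDLE_SIZE	sizeof(unsigned long)

	int main(void)
	{
		/* from the log: size class 48, original offset 16336 */
		unsigned long class_size = 48, off = 16336;

		/*
		 * Note: the slot actually ends exactly at the page boundary
		 * (16336 + 48 == 16384), so there is no second page to read.
		 */
		off += ZS_HANDLE_SIZE;			/* off = 16344 */
		if (off + class_size > PAGE_SIZE) {	/* 16392 > 16384 */
			unsigned long sizes[2];

			sizes[0] = PAGE_SIZE - off;		/* 40 */
			sizes[1] = class_size - sizes[0];	/* 8: the copy that traps */
			printf("memcpy sizes %lu, %lu\n", sizes[0], sizes[1]);
		}
		return 0;
	}

This reproduces the "memcpy sizes 40, 8" from the log: the slot is
mis-detected as spanning, and the second 8-byte copy reads from a next page
that does not exist.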
* Re: [PATCH] zsmalloc: use actual object size to detect spans
  2026-01-07  5:19 ` Yosry Ahmed
@ 2026-01-07  5:30 ` Sergey Senozhatsky
  2026-01-07  7:12 ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2026-01-07  5:30 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim,
	Johannes Weiner, Brian Geffon, linux-kernel, linux-mm

On (26/01/07 05:19), Yosry Ahmed wrote:
> > It seems there is no second page, it's a pow-of-two size class. So
> > we mis-detect spanning.
> >
> > [   51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8
> > [   51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000
> > [   51.420816] pc : __memcpy+0xc/0x44
> >
> > Second memcpy() of sizeof(unsigned long) traps.
>
> I think this case is exactly what you expected earlier (not sure what
> you mean by the pow of 2 reply)

Re "pow-of-two": I was simply misled by the second memcpy() size being 8,
which made it look like the whole object fit exactly into its physical
page, something I thought happened only with pow-of-two sizes.

> We increase the offset by 8 bytes (ZS_HANDLE_SIZE), but we still copy 48
> bytes, even though 48 bytes includes both the object and ZS_HANDLE_SIZE.
> So we end up copying 8 bytes beyond the end of the object, which puts us
> in the next page which we should not be copying.

Correct.  We increased it twice: off +8 and mem_len +8.

> I think to fix the bug at this point we need to subtract ZS_HANDLE_SIZE
> from class->size before we use it for copying or spanning detection.

I just re-shuffled the patches and it seems to be working; it passes
all my tests.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 5:30 ` Sergey Senozhatsky @ 2026-01-07 7:12 ` Sergey Senozhatsky 0 siblings, 0 replies; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 7:12 UTC (permalink / raw) To: Yosry Ahmed Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm, Sergey Senozhatsky A correction: On (26/01/07 14:30), Sergey Senozhatsky wrote: > > We increase the offset by 8 bytes (ZS_HANDLE_SIZE), but we still copy 48 > > bytes, even though 48 bytes includes both the object and ZS_HANDLE_SIZE. > > So we end up copying 8 bytes beyond the end of the object, which puts us > > in the next page which we should not be copying. > > Correct. We increased it twice: off +8 and mem_len +8. ^^ class->size (implicit +8) I tested your patch as is, w/o applying mine. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 2:10 ` Yosry Ahmed 2026-01-07 2:20 ` Sergey Senozhatsky @ 2026-01-07 3:03 ` Sergey Senozhatsky 2026-01-07 5:22 ` Yosry Ahmed 1 sibling, 1 reply; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 3:03 UTC (permalink / raw) To: Yosry Ahmed Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 02:10), Yosry Ahmed wrote: > I think the changes need to be shuffled around to avoid this, or just > have a combined patch, which would be less pretty. Dunno. Do we want to completely separate HugePage handling and make it a fast path? Then it seems things begin to work. --- diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index cb449acc8809..9b067853b6c2 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1077,6 +1077,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, unsigned long obj, off; unsigned int obj_idx; struct size_class *class; + size_t sizes[2]; void *addr; /* Guarantee we can get zspage from handle safely */ @@ -1089,35 +1090,27 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, zspage_read_lock(zspage); read_unlock(&pool->lock); + /* Fast path for huge size class */ + if (ZsHugePage(zspage)) + return kmap_local_zpdesc(zpdesc); + class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - /* Normal classes have inlined handle */ - if (!ZsHugePage(zspage)) - mem_len += ZS_HANDLE_SIZE; - + off += ZS_HANDLE_SIZE; if (off + mem_len <= PAGE_SIZE) { /* this object is contained entirely within a page */ - addr = kmap_local_zpdesc(zpdesc); - addr += off; - } else { - size_t sizes[2]; - - /* this object spans two pages */ - sizes[0] = PAGE_SIZE - off; - sizes[1] = mem_len - sizes[0]; - addr = local_copy; - - memcpy_from_page(addr, zpdesc_page(zpdesc), - off, sizes[0]); - zpdesc = get_next_zpdesc(zpdesc); - memcpy_from_page(addr + sizes[0], - zpdesc_page(zpdesc), - 0, sizes[1]); + return kmap_local_zpdesc(zpdesc) + off; } - if (!ZsHugePage(zspage)) - addr += ZS_HANDLE_SIZE; + /* this object spans two pages */ + sizes[0] = PAGE_SIZE - off; + sizes[1] = mem_len - sizes[0]; + addr = local_copy; + + memcpy_from_page(addr, zpdesc_page(zpdesc), off, sizes[0]); + zpdesc = get_next_zpdesc(zpdesc); + memcpy_from_page(addr + sizes[0], zpdesc_page(zpdesc), 0, sizes[1]); return addr; } @@ -1135,20 +1128,21 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, obj = handle_to_obj(handle); obj_to_location(obj, &zpdesc, &obj_idx); zspage = get_zspage(zpdesc); + + /* Fast path for huge size class */ + if (ZsHugePage(zspage)) { + kunmap_local(handle_mem); + goto unlock; + } + class = zspage_class(pool, zspage); off = offset_in_page(class->size * obj_idx); - /* Normal classes have inlined handle */ - if (!ZsHugePage(zspage)) - mem_len += ZS_HANDLE_SIZE; - - if (off + mem_len <= PAGE_SIZE) { - if (!ZsHugePage(zspage)) - off += ZS_HANDLE_SIZE; - handle_mem -= off; + off += ZS_HANDLE_SIZE; + if (off + mem_len <= PAGE_SIZE) kunmap_local(handle_mem); - } +unlock: zspage_read_unlock(zspage); } EXPORT_SYMBOL_GPL(zs_obj_read_end); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 3:03 ` Sergey Senozhatsky @ 2026-01-07 5:22 ` Yosry Ahmed 2026-01-07 5:38 ` Sergey Senozhatsky 0 siblings, 1 reply; 15+ messages in thread From: Yosry Ahmed @ 2026-01-07 5:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On Wed, Jan 07, 2026 at 12:03:37PM +0900, Sergey Senozhatsky wrote: > On (26/01/07 02:10), Yosry Ahmed wrote: > > I think the changes need to be shuffled around to avoid this, or just > > have a combined patch, which would be less pretty. > > Dunno. Do we want to completely separate HugePage handling > and make it a fast path? Then it seems things begin to work. HugePage should always be PAGE_SIZE, so never spans two pages, right? I like separating the logic because it's cleaner, but I want us to understand the problem first (see my other reply) instead of just papering over it. > > --- > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index cb449acc8809..9b067853b6c2 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -1077,6 +1077,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > unsigned long obj, off; > unsigned int obj_idx; > struct size_class *class; > + size_t sizes[2]; > void *addr; > > /* Guarantee we can get zspage from handle safely */ > @@ -1089,35 +1090,27 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > zspage_read_lock(zspage); > read_unlock(&pool->lock); > > + /* Fast path for huge size class */ > + if (ZsHugePage(zspage)) > + return kmap_local_zpdesc(zpdesc); Can we WARN here if somehow the HugePage is spanning two pages? > + > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > - /* Normal classes have inlined handle */ > - if (!ZsHugePage(zspage)) > - mem_len += ZS_HANDLE_SIZE; > - > + off += ZS_HANDLE_SIZE; > if (off + mem_len <= PAGE_SIZE) { > /* this object is contained entirely within a page */ > - addr = kmap_local_zpdesc(zpdesc); > - addr += off; > - } else { > - size_t sizes[2]; > - > - /* this object spans two pages */ > - sizes[0] = PAGE_SIZE - off; > - sizes[1] = mem_len - sizes[0]; > - addr = local_copy; > - > - memcpy_from_page(addr, zpdesc_page(zpdesc), > - off, sizes[0]); > - zpdesc = get_next_zpdesc(zpdesc); > - memcpy_from_page(addr + sizes[0], > - zpdesc_page(zpdesc), > - 0, sizes[1]); > + return kmap_local_zpdesc(zpdesc) + off; > } > > - if (!ZsHugePage(zspage)) > - addr += ZS_HANDLE_SIZE; > + /* this object spans two pages */ > + sizes[0] = PAGE_SIZE - off; > + sizes[1] = mem_len - sizes[0]; > + addr = local_copy; > + > + memcpy_from_page(addr, zpdesc_page(zpdesc), off, sizes[0]); > + zpdesc = get_next_zpdesc(zpdesc); > + memcpy_from_page(addr + sizes[0], zpdesc_page(zpdesc), 0, sizes[1]); > > return addr; > } > @@ -1135,20 +1128,21 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle, > obj = handle_to_obj(handle); > obj_to_location(obj, &zpdesc, &obj_idx); > zspage = get_zspage(zpdesc); > + > + /* Fast path for huge size class */ > + if (ZsHugePage(zspage)) { > + kunmap_local(handle_mem); > + goto unlock; > + } > + > class = zspage_class(pool, zspage); > off = offset_in_page(class->size * obj_idx); > > - /* Normal classes have inlined handle */ > - if (!ZsHugePage(zspage)) > - mem_len += ZS_HANDLE_SIZE; > - > - if (off + mem_len <= PAGE_SIZE) { > - if (!ZsHugePage(zspage)) > - off += ZS_HANDLE_SIZE; > - handle_mem -= off; > + off += ZS_HANDLE_SIZE; > + if (off + 
mem_len <= PAGE_SIZE) > kunmap_local(handle_mem); > - } > > +unlock: > zspage_read_unlock(zspage); > } > EXPORT_SYMBOL_GPL(zs_obj_read_end); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] zsmalloc: use actual object size to detect spans 2026-01-07 5:22 ` Yosry Ahmed @ 2026-01-07 5:38 ` Sergey Senozhatsky 0 siblings, 0 replies; 15+ messages in thread From: Sergey Senozhatsky @ 2026-01-07 5:38 UTC (permalink / raw) To: Yosry Ahmed Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel, linux-mm On (26/01/07 05:22), Yosry Ahmed wrote: > On Wed, Jan 07, 2026 at 12:03:37PM +0900, Sergey Senozhatsky wrote: > > On (26/01/07 02:10), Yosry Ahmed wrote: > > > I think the changes need to be shuffled around to avoid this, or just > > > have a combined patch, which would be less pretty. > > > > Dunno. Do we want to completely separate HugePage handling > > and make it a fast path? Then it seems things begin to work. > > HugePage should always be PAGE_SIZE, so never spans two pages, right? Right if (unlikely(class->objs_per_zspage == 1 && class->pages_per_zspage == 1)) SetZsHugePage(zspage); > I like separating the logic because it's cleaner, but I want us to > understand the problem first (see my other reply) instead of just > papering over it. Sure. > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > > index cb449acc8809..9b067853b6c2 100644 > > --- a/mm/zsmalloc.c > > +++ b/mm/zsmalloc.c > > @@ -1077,6 +1077,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > unsigned long obj, off; > > unsigned int obj_idx; > > struct size_class *class; > > + size_t sizes[2]; > > void *addr; > > > > /* Guarantee we can get zspage from handle safely */ > > @@ -1089,35 +1090,27 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle, > > zspage_read_lock(zspage); > > read_unlock(&pool->lock); > > > > + /* Fast path for huge size class */ > > + if (ZsHugePage(zspage)) > > + return kmap_local_zpdesc(zpdesc); > > Can we WARN here if somehow the HugePage is spanning two pages? I can add a WARN, but that really cannot happen. We always allocate just one physical page per zspage for such size classes. ^ permalink raw reply [flat|nested] 15+ messages in thread
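A minimal form of the suggested WARN, if it were added despite the
allocation-side invariant quoted above, might look like the following
(illustrative sketch only; VM_WARN_ON_ONCE compiles away without
CONFIG_DEBUG_VM):

	/* Fast path for huge size class */
	if (ZsHugePage(zspage)) {
		/* huge zspages are one object in one page; they never span */
		VM_WARN_ON_ONCE(obj_idx != 0);
		return kmap_local_zpdesc(zpdesc);
	}

With objs_per_zspage == 1, obj_idx should always be 0 for such zspages, so
the check documents the invariant more than it guards against it.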