* [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings
@ 2026-01-01 1:38 Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
0 siblings, 2 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-01 1:38 UTC (permalink / raw)
To: Andrew Morton, Yosry Ahmed, Nhat Pham
Cc: Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel,
linux-mm, Sergey Senozhatsky
This is an RFC series that follows up on the 16K PAGE_SIZE handling
discussion [1].
[1] https://lore.kernel.org/linux-mm/fui4gqm6pealaxooz3xv3dnnqxscefyvhw5bhntedwh4tgjvdq@ootmbuoc3dpa
Sergey Senozhatsky (2):
zsmalloc: drop hard limit on the number of size classes
zsmalloc: chain-length configuration should consider other metrics
mm/zsmalloc.c | 48 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 10 deletions(-)
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes
2026-01-01 1:38 [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings Sergey Senozhatsky
@ 2026-01-01 1:38 ` Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
1 sibling, 0 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-01 1:38 UTC (permalink / raw)
To: Andrew Morton, Yosry Ahmed, Nhat Pham
Cc: Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel,
linux-mm, Sergey Senozhatsky
For reasons unknown, zsmalloc limits the number of size-classes
to 256. On PAGE_SIZE 4K systems this works well, as those
256 classes are 4096/256 = 16 bytes (the size-class delta) apart.
However, as PAGE_SIZE grows, e.g. to 16K, the hard limit forces a
much larger size-class delta (e.g. 16384/256 = 64 bytes), leading
to increased internal fragmentation. For example, on a 16K page system,
an object of size 65 bytes is rounded up to the next 64-byte boundary
(128 bytes), wasting nearly 50% of the allocated space.
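The same round-up arithmetic, sketched as a small user-space program
(illustrative only: the 65-byte object and rounding to a plain multiple
of the delta are simplifications, the real class layout also depends on
ZS_MIN_ALLOC_SIZE):
	#include <stdio.h>
	int main(void)
	{
		const unsigned long page_sizes[] = { 4096, 16384, 65536 };
		const unsigned long obj_size = 65;	/* hypothetical object size */
		for (unsigned int i = 0; i < sizeof(page_sizes) / sizeof(page_sizes[0]); i++) {
			unsigned long delta = page_sizes[i] / 256;	/* 256-class hard limit */
			/* round the object size up to the next multiple of delta */
			unsigned long rounded = (obj_size + delta - 1) / delta * delta;
			printf("PAGE_SIZE %6lu: delta %3lu, %lu-byte object stored as %lu bytes (%.0f%% waste)\n",
			       page_sizes[i], delta, obj_size, rounded,
			       100.0 * (rounded - obj_size) / rounded);
		}
		return 0;
	}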
Instead of deriving the size-class delta from PAGE_SIZE and the
hard limit of 256, set ZS_SIZE_CLASS_DELTA to a constant value of
16 bytes. This results in far more than 256 size classes on systems
with PAGE_SIZE larger than 4K. These extra size classes split existing
clusters into smaller ones. For example, using the tool [1] on a
16K PAGE_SIZE system with chain size 8:
BASE (delta 64 bytes)
=====================
Log | Phys | Chain | Objs/Page | TailWaste | MergeWaste
[..]
1072 | 1120 | 8 | 117 | 32 | 5616
1088 | 1120 | 8 | 117 | 32 | 3744
1104 | 1120 | 8 | 117 | 32 | 1872
1120 | 1120 | 8 | 117 | 32 | 0
[..]
PATCHED (delta 16 bytes)
========================
[..]
1072 | 1072 | 4 | 61 | 144 | 0
1088 | 1088 | 1 | 15 | 64 | 0
1104 | 1104 | 6 | 89 | 48 | 0
1120 | 1120 | 8 | 117 | 32 | 0
[..]
In the default configuration (delta 64), size classes 1072 to 1104
are merged into 1120. Size class 1120 holds 117 objects
per zspage, so in the worst case every zspage can lose 5616 bytes
((1120 - 1072) * 117). With delta 16 this cluster doesn't
exist, reducing memory waste.
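The MergeWaste column above can be reproduced with a trivial user-space
sketch (it mirrors the simulator's [1] arithmetic rather than zsmalloc
internals):
	#include <stdio.h>
	int main(void)
	{
		const int phys_class = 1120;		/* the class the others merge into */
		const int objs_per_zspage = 117;	/* from the table above */
		for (int logical = 1072; logical <= 1120; logical += 16)
			printf("logical %4d -> merge waste per zspage: %4d bytes\n",
			       logical, (phys_class - logical) * objs_per_zspage);
		return 0;
	}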
[1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
mm/zsmalloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..5e7501d36161 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -92,7 +92,7 @@
#define HUGE_BITS 1
#define FULLNESS_BITS 4
-#define CLASS_BITS 8
+#define CLASS_BITS 12
#define MAGIC_VAL_BITS 8
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL))
@@ -115,8 +115,13 @@
*
* ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
* (reason above)
+ *
+ * We set ZS_SIZE_CLASS_DELTA to 16 bytes to maintain high granularity
+ * even on systems with large PAGE_SIZE (e.g. 16K, 64K). This limits
+ * internal fragmentation. CLASS_BITS is increased to 12 to accommodate the
+ * larger number of size classes on such systems (up to 4096 classes on 64K).
*/
-#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS)
+#define ZS_SIZE_CLASS_DELTA 16
#define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
ZS_SIZE_CLASS_DELTA) + 1)
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-01 1:38 [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes Sergey Senozhatsky
@ 2026-01-01 1:38 ` Sergey Senozhatsky
2026-01-02 18:29 ` Yosry Ahmed
1 sibling, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-01 1:38 UTC (permalink / raw)
To: Andrew Morton, Yosry Ahmed, Nhat Pham
Cc: Minchan Kim, Johannes Weiner, Brian Geffon, linux-kernel,
linux-mm, Sergey Senozhatsky
This is the first step towards re-thinking the optimization strategy
for chain-size (the number of 0-order physical pages a zspage
chains together for optimal performance) configuration. Currently,
we only consider one metric - "wasted" memory - and try various
chain length configurations in order to find the one with the
least wasted space. However, this strategy doesn't consider
the fact that our optimization space is not single-dimensional.
When we increase the zspage chain length we at the same time increase
the number of spanning objects (objects that span two physical pages).
Such objects slow down read() operations because zsmalloc needs to
kmap both pages and memcpy the object's chunks, which increases
CPU usage and battery drain.
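For illustration, a simplified user-space sketch of the spanning-objects
metric (it ignores ZS_HANDLE_SIZE and other layout details, so the
absolute numbers will differ from what [2] reports):
	#include <stdio.h>
	#define PAGE_SIZE 4096UL
	static unsigned long spanning_objects(unsigned long class_size,
					      unsigned long chain_size)
	{
		unsigned long nr_objs = chain_size * PAGE_SIZE / class_size;
		unsigned long spans = 0;
		for (unsigned long i = 0; i < nr_objs; i++) {
			unsigned long start = i * class_size;
			unsigned long end = start + class_size - 1;
			/* the object crosses a page boundary if start and end fall on different pages */
			if (start / PAGE_SIZE != end / PAGE_SIZE)
				spans++;
		}
		return spans;
	}
	int main(void)
	{
		const unsigned long class_size = 368;	/* hypothetical size class */
		for (unsigned long chain = 1; chain <= 10; chain++)
			printf("class %lu, chain %2lu: %lu objects, %lu spanning\n",
			       class_size, chain, chain * PAGE_SIZE / class_size,
			       spanning_objects(class_size, chain));
		return 0;
	}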
We will most likely need to consider numerous metrics and optimize
in a multi-dimensional space. These can be wired in later; for now
we add a heuristic that increases the zspage chain length only
if it brings substantial memory savings. The threshold values can
be tuned (there is a simple user-space tool [2] to experiment with
those knobs), but what we currently have is already interesting
enough. Where does this bring us? Using a synthetic test [1],
which produces byte-to-byte comparable workloads, on a
4K PAGE_SIZE, chain size 10 system:
BASE
====
zsmalloc_test: num write objects: 339598
zsmalloc_test: pool pages used 175111, total allocated size 698213488
zsmalloc_test: pool memory utilization: 97.3
zsmalloc_test: num read objects: 339598
zsmalloc_test: spanning objects: 110377, total memcpy size: 278318624
PATCHED
=======
zsmalloc_test: num write objects: 339598
zsmalloc_test: pool pages used 175920, total allocated size 698213488
zsmalloc_test: pool memory utilization: 96.8
zsmalloc_test: num read objects: 339598
zsmalloc_test: spanning objects: 103256, total memcpy size: 265378608
At the price of a 0.5% increase in pool memory usage there was a 6.5%
reduction in the number of spanning objects (4.6% fewer copied bytes).
Note, the results are specific to this particular test case. The
savings are not uniformly distributed: according to [2], for some
size classes the number of spanning objects per zspage goes
down from 7 to 0 (e.g. size class 368), for others
from 4 to 2 (e.g. size class 640). So the actual memcpy savings
are data-pattern dependent, as always.
[1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/0001-zsmalloc-add-zsmalloc_test-module.patch
[2] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
mm/zsmalloc.c | 39 +++++++++++++++++++++++++++++++--------
1 file changed, 31 insertions(+), 8 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5e7501d36161..929db7cf6c19 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2000,22 +2000,45 @@ static int zs_register_shrinker(struct zs_pool *pool)
static int calculate_zspage_chain_size(int class_size)
{
int i, min_waste = INT_MAX;
- int chain_size = 1;
+ int best_chain_size = 1;
if (is_power_of_2(class_size))
- return chain_size;
+ return best_chain_size;
for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
- int waste;
+ int curr_waste = (i * PAGE_SIZE) % class_size;
- waste = (i * PAGE_SIZE) % class_size;
- if (waste < min_waste) {
- min_waste = waste;
- chain_size = i;
+ if (curr_waste == 0)
+ return i;
+
+ /*
+ * Accept the new chain size if:
+ * 1. The current best is wasteful (> 10% of zspage size),
+ * accept anything that is better.
+ * 2. The current best is efficient, accept only significant
+ * (25%) improvement.
+ */
+ if (min_waste * 10 > best_chain_size * PAGE_SIZE) {
+ if (curr_waste < min_waste) {
+ min_waste = curr_waste;
+ best_chain_size = i;
+ }
+ } else {
+ if (curr_waste * 4 < min_waste * 3) {
+ min_waste = curr_waste;
+ best_chain_size = i;
+ }
}
+
+ /*
+ * If the current best chain has low waste (approx < 1.5%
+ * relative to zspage size) then accept it right away.
+ */
+ if (min_waste * 64 <= best_chain_size * PAGE_SIZE)
+ break;
}
- return chain_size;
+ return best_chain_size;
}
/**
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-01 1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
@ 2026-01-02 18:29 ` Yosry Ahmed
2026-01-05 1:42 ` Sergey Senozhatsky
0 siblings, 1 reply; 14+ messages in thread
From: Yosry Ahmed @ 2026-01-02 18:29 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner,
Brian Geffon, linux-kernel, linux-mm
On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> This is the first step towards re-thinking optimization strategy
> during chain-size (the number of 0-order physical pages a zspage
> chains for most optimal performance) configuration. Currently,
> we only consider one metric - "wasted" memory - and try various
> chain length configurations in order to find the minimal wasted
> space configuration. However, this strategy doesn't consider
> the fact that our optimization space is not single-dimensional.
> When we increase zspage chain length we at the same increase the
> number of spanning objects (objects that span two physical pages).
> Such objects slow down read() operations because zsmalloc needs to
> kmap both pages and memcpy objects' chunks. This clearly increases
> CPU usage and battery drain.
>
> We, most likely, need to consider numerous metrics and optimize
> in a multi-dimensional space. These can be wired in later on, for
> now we just add some heuristic to increase zspage chain length only
> if there are substantial savings memory usage wise. We can tune
> these threshold values (there is a simple user-space tool [2] to
> experiment with those knobs), but what we currently is already
> interesting enough. Where does this bring us, using a synthetic
> test [1], which produces byte-to-byte comparable workloads, on a
> 4K PAGE_SIZE, chain size 10 system:
>
> BASE
> ====
> zsmalloc_test: num write objects: 339598
> zsmalloc_test: pool pages used 175111, total allocated size 698213488
> zsmalloc_test: pool memory utilization: 97.3
> zsmalloc_test: num read objects: 339598
> zsmalloc_test: spanning objects: 110377, total memcpy size: 278318624
>
> PATCHED
> =======
> zsmalloc_test: num write objects: 339598
> zsmalloc_test: pool pages used 175920, total allocated size 698213488
> zsmalloc_test: pool memory utilization: 96.8
> zsmalloc_test: num read objects: 339598
> zsmalloc_test: spanning objects: 103256, total memcpy size: 265378608
>
> At a price of 0.5% increased pool memory usage there was a 6.5%
> reduction in a number of spanning objects (4.6% less copied bytes).
>
> Note, the results are specific to this particular test case. The
> savings are not uniformly distributed: according to [2] for some
> size classes the reduction in the number of spanning objects
> per-zspage goes down from 7 to 0 (e.g. size class 368), for other
> from 4 to 2 (e.g. size class 640). So the actual memcpy savings
> are data-pattern dependent, as always.
I worry that the heuristics are too hand-wavy, and I wonder if the
memcpy savings actually show up as perf improvements in any real life
workload. Do we have data about this?
I also vaguely recall discussions about other ways to avoid the memcpy
using scatterlists, so I am wondering if this is the right metric to
optimize.
What are the main pain points for PAGE_SIZE > 4K configs? Is it the
compression/decompression time? In my experience this is usually not the
bottleneck; I would imagine the real problem is the internal
fragmentation.
>
> [1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/0001-zsmalloc-add-zsmalloc_test-module.patch
> [2] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> mm/zsmalloc.c | 39 +++++++++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 5e7501d36161..929db7cf6c19 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -2000,22 +2000,45 @@ static int zs_register_shrinker(struct zs_pool *pool)
> static int calculate_zspage_chain_size(int class_size)
> {
> int i, min_waste = INT_MAX;
> - int chain_size = 1;
> + int best_chain_size = 1;
>
> if (is_power_of_2(class_size))
> - return chain_size;
> + return best_chain_size;
>
> for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
> - int waste;
> + int curr_waste = (i * PAGE_SIZE) % class_size;
>
> - waste = (i * PAGE_SIZE) % class_size;
> - if (waste < min_waste) {
> - min_waste = waste;
> - chain_size = i;
> + if (curr_waste == 0)
> + return i;
> +
> + /*
> + * Accept the new chain size if:
> + * 1. The current best is wasteful (> 10% of zspage size),
> + * accept anything that is better.
> + * 2. The current best is efficient, accept only significant
> + * (25%) improvement.
> + */
> + if (min_waste * 10 > best_chain_size * PAGE_SIZE) {
> + if (curr_waste < min_waste) {
> + min_waste = curr_waste;
> + best_chain_size = i;
> + }
> + } else {
> + if (curr_waste * 4 < min_waste * 3) {
> + min_waste = curr_waste;
> + best_chain_size = i;
> + }
> }
> +
> + /*
> + * If the current best chain has low waste (approx < 1.5%
> + * relative to zspage size) then accept it right away.
> + */
> + if (min_waste * 64 <= best_chain_size * PAGE_SIZE)
> + break;
> }
>
> - return chain_size;
> + return best_chain_size;
> }
>
> /**
> --
> 2.52.0.351.gbe84eed79e-goog
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-02 18:29 ` Yosry Ahmed
@ 2026-01-05 1:42 ` Sergey Senozhatsky
2026-01-05 7:23 ` Sergey Senozhatsky
2026-01-05 15:58 ` Yosry Ahmed
0 siblings, 2 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-05 1:42 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, linux-mm
On (26/01/02 18:29), Yosry Ahmed wrote:
> On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
[..]
>
> I worry that the heuristics are too hand-wavy
I don't disagree. Am not super excited about the heuristics either.
> and I wonder if the memcpy savings actually show up as perf improvements
> in any real life workload. Do we have data about this?
I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
systems we have "normal" size-classes up to a very large size, and a normal
class means chaining of 0-order physical pages, and chaining means spanning.
So on 16K the memcpy overhead is expected to be somewhat noticeable.
> I also vaguely recall discussions about other ways to avoid the memcpy
> using scatterlists, so I am wondering if this is the right metric to
> optimize.
As far as I understand, the SG-list based approach will require
implementing split-data handling on the compression algorithms' side,
which is not trivial (especially if the only reason to do that is
zsmalloc).
Alternatively, maybe we can try to vmap spanning objects:
---
mm/zsmalloc.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6fc216ab8190..4a68c27cb5d4 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -38,6 +38,7 @@
#include <linux/zsmalloc.h>
#include <linux/fs.h>
#include <linux/workqueue.h>
+#include <linux/vmalloc.h>
#include "zpdesc.h"
#define ZSPAGE_MAGIC 0x58
@@ -1097,19 +1098,15 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
addr = kmap_local_zpdesc(zpdesc);
addr += off;
} else {
- size_t sizes[2];
+ struct page *pages[2];
/* this object spans two pages */
- sizes[0] = PAGE_SIZE - off;
- sizes[1] = class->size - sizes[0];
- addr = local_copy;
-
- memcpy_from_page(addr, zpdesc_page(zpdesc),
- off, sizes[0]);
- zpdesc = get_next_zpdesc(zpdesc);
- memcpy_from_page(addr + sizes[0],
- zpdesc_page(zpdesc),
- 0, sizes[1]);
+ pages[0] = zpdesc_page(zpdesc);
+ pages[1] = zpdesc_page(get_next_zpdesc(zpdesc));
+ addr = vm_map_ram(pages, 2, NUMA_NO_NODE);
+ if (!addr)
+ return NULL;
+ addr += off;
}
if (!ZsHugePage(zspage))
@@ -1139,6 +1136,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
off += ZS_HANDLE_SIZE;
handle_mem -= off;
kunmap_local(handle_mem);
+ } else {
+ if (!ZsHugePage(zspage))
+ off += ZS_HANDLE_SIZE;
+ handle_mem -= off;
+ vm_unmap_ram(handle_mem, 2);
}
zspage_read_unlock(zspage);
--
2.52.0.351.gbe84eed79e-goog
> What are the main pain points for PAGE_SIZE > 4K configs? Is it the
> compression/decompression time? In my experience this is usually not the
> bottleneck, I would imagine the real problem would be the internal
> fragmentation.
Right, internal fragmentation can be the main problem.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-05 1:42 ` Sergey Senozhatsky
@ 2026-01-05 7:23 ` Sergey Senozhatsky
2026-01-05 16:01 ` Yosry Ahmed
2026-01-05 15:58 ` Yosry Ahmed
1 sibling, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-05 7:23 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner,
Brian Geffon, linux-kernel, linux-mm, Sergey Senozhatsky
On (26/01/05 10:42), Sergey Senozhatsky wrote:
> On (26/01/02 18:29), Yosry Ahmed wrote:
> > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> [..]
> >
> > I worry that the heuristics are too hand-wavy
>
> I don't disagree. Am not super excited about the heuristics either.
>
> > and I wonder if the memcpy savings actually show up as perf improvements
> > in any real life workload. Do we have data about this?
>
> I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> systems we have "normal" size-classes up to a very large size, and normal
> class means chaining of 0-order physical pages, and chaining means spanning.
> So on 16K memcpy overhead is expected to be somewhat noticeable.
By the way, while looking at it, I think we need to "fix" obj_read_begin().
Currently, it uses "off + class->size" to detect spanning objects, which is
incorrect: size classes get merged, so a typical size class can hold a range
of sizes, using padding for smaller objects. So instead of class->size we
need to use the actual compressed object's size, in case the actual written
size was small enough to fit entirely within the first physical page (we
already do that in obj_write()). I'll cook a patch.
Something like this:
---
drivers/block/zram/zram_drv.c | 8 +++++---
include/linux/zsmalloc.h | 2 +-
mm/zsmalloc.c | 4 ++--
mm/zswap.c | 3 ++-
4 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a6587bed6a03..b371ba6bfec2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2065,7 +2065,7 @@ static int read_incompressible_page(struct zram *zram, struct page *page,
void *src, *dst;
handle = get_slot_handle(zram, index);
- src = zs_obj_read_begin(zram->mem_pool, handle, NULL);
+ src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL);
dst = kmap_local_page(page);
copy_page(dst, src);
kunmap_local(dst);
@@ -2087,7 +2087,8 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
prio = get_slot_comp_priority(zram, index);
zstrm = zcomp_stream_get(zram->comps[prio]);
- src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
+ src = zs_obj_read_begin(zram->mem_pool, handle, size,
+ zstrm->local_copy);
dst = kmap_local_page(page);
ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
kunmap_local(dst);
@@ -2114,7 +2115,8 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
* takes place here, as we read raw compressed data.
*/
zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
- src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
+ src = zs_obj_read_begin(zram->mem_pool, handle, size,
+ zstrm->local_copy);
memcpy_to_page(page, 0, src, size);
zs_obj_read_end(zram->mem_pool, handle, src);
zcomp_stream_put(zstrm);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index f3ccff2d966c..64f65c1f14d6 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -40,7 +40,7 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
- void *local_copy);
+ size_t mem_len, void *local_copy);
void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
void *handle_mem);
void zs_obj_write(struct zs_pool *pool, unsigned long handle,
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index be385609ef8a..2da60c23cd18 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1070,7 +1070,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
EXPORT_SYMBOL_GPL(zs_get_total_pages);
void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
- void *local_copy)
+ size_t mem_len, void *local_copy)
{
struct zspage *zspage;
struct zpdesc *zpdesc;
@@ -1092,7 +1092,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
- if (off + class->size <= PAGE_SIZE) {
+ if (off + mem_len <= PAGE_SIZE) {
/* this object is contained entirely within a page */
addr = kmap_local_zpdesc(zpdesc);
addr += off;
diff --git a/mm/zswap.c b/mm/zswap.c
index de8858ff1521..291352629616 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
u8 *src, *obj;
acomp_ctx = acomp_ctx_get_cpu_lock(pool);
- obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
+ obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length,
+ acomp_ctx->buffer);
/* zswap entries of length PAGE_SIZE are not compressed. */
if (entry->length == PAGE_SIZE) {
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-05 1:42 ` Sergey Senozhatsky
2026-01-05 7:23 ` Sergey Senozhatsky
@ 2026-01-05 15:58 ` Yosry Ahmed
2026-01-06 4:20 ` Sergey Senozhatsky
1 sibling, 1 reply; 14+ messages in thread
From: Yosry Ahmed @ 2026-01-05 15:58 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner,
Brian Geffon, linux-kernel, Herbert Xu, linux-mm
On Mon, Jan 05, 2026 at 10:42:51AM +0900, Sergey Senozhatsky wrote:
> On (26/01/02 18:29), Yosry Ahmed wrote:
> > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> [..]
> >
> > I worry that the heuristics are too hand-wavy
>
> I don't disagree. Am not super excited about the heuristics either.
>
> > and I wonder if the memcpy savings actually show up as perf improvements
> > in any real life workload. Do we have data about this?
>
> I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> systems we have "normal" size-classes up to a very large size, and normal
> class means chaining of 0-order physical pages, and chaining means spanning.
> So on 16K memcpy overhead is expected to be somewhat noticeable.
I don't disagree that it could be a problem; I am just against
optimizations without data. It makes it hard to modify these heuristics
later or remove them, since we don't really know what effect they had in
the first place.
We also don't know if the 0.5% increase in memory usage is actually
offset by CPU gains.
>
> > I also vaguely recall discussions about other ways to avoid the memcpy
> > using scatterlists, so I am wondering if this is the right metric to
> > optimize.
>
> As far as I understand SG-list based approach is that it will require
> implementing split-data handling on the compression algorithms side,
> which is not trivial (especially if the only reason to do that is
> zsmalloc).
I am not sure tbh, adding Herbert here. I remember looking at the code
in scomp_acomp_comp_decomp() at some point, and I think it will take
care of non-contiguous SG-lists. Not sure if that's the correct place to
look tho.
>
> Alternatively, we maybe can try to vmap spanning objects:
Using vmap makes sense in theory, but in practice (at least for zswap)
it doesn't help because SG lists do not support vmap addresses. Zswap
will actually treat them the same as highmem and copy them to a buffer
before putting them in an SG list, so we effectively just do the
memcpy() in zswap instead of zsmalloc.
>
> ---
> mm/zsmalloc.c | 24 +++++++++++++-----------
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 6fc216ab8190..4a68c27cb5d4 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -38,6 +38,7 @@
> #include <linux/zsmalloc.h>
> #include <linux/fs.h>
> #include <linux/workqueue.h>
> +#include <linux/vmalloc.h>
> #include "zpdesc.h"
>
> #define ZSPAGE_MAGIC 0x58
> @@ -1097,19 +1098,15 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> addr = kmap_local_zpdesc(zpdesc);
> addr += off;
> } else {
> - size_t sizes[2];
> + struct page *pages[2];
>
> /* this object spans two pages */
> - sizes[0] = PAGE_SIZE - off;
> - sizes[1] = class->size - sizes[0];
> - addr = local_copy;
> -
> - memcpy_from_page(addr, zpdesc_page(zpdesc),
> - off, sizes[0]);
> - zpdesc = get_next_zpdesc(zpdesc);
> - memcpy_from_page(addr + sizes[0],
> - zpdesc_page(zpdesc),
> - 0, sizes[1]);
> + pages[0] = zpdesc_page(zpdesc);
> + pages[1] = zpdesc_page(get_next_zpdesc(zpdesc));
> + addr = vm_map_ram(pages, 2, NUMA_NO_NODE);
> + if (!addr)
> + return NULL;
> + addr += off;
> }
>
> if (!ZsHugePage(zspage))
> @@ -1139,6 +1136,11 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> off += ZS_HANDLE_SIZE;
> handle_mem -= off;
> kunmap_local(handle_mem);
> + } else {
> + if (!ZsHugePage(zspage))
> + off += ZS_HANDLE_SIZE;
> + handle_mem -= off;
> + vm_unmap_ram(handle_mem, 2);
> }
>
> zspage_read_unlock(zspage);
> --
> 2.52.0.351.gbe84eed79e-goog
>
>
> > What are the main pain points for PAGE_SIZE > 4K configs? Is it the
> > compression/decompression time? In my experience this is usually not the
> > bottleneck, I would imagine the real problem would be the internal
> > fragmentation.
>
> Right, internal fragmentation can be the main problem.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-05 7:23 ` Sergey Senozhatsky
@ 2026-01-05 16:01 ` Yosry Ahmed
2026-01-06 4:10 ` Sergey Senozhatsky
0 siblings, 1 reply; 14+ messages in thread
From: Yosry Ahmed @ 2026-01-05 16:01 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Nhat Pham, Minchan Kim, Johannes Weiner,
Brian Geffon, linux-kernel, linux-mm
On Mon, Jan 05, 2026 at 04:23:39PM +0900, Sergey Senozhatsky wrote:
> On (26/01/05 10:42), Sergey Senozhatsky wrote:
> > On (26/01/02 18:29), Yosry Ahmed wrote:
> > > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> > [..]
> > >
> > > I worry that the heuristics are too hand-wavy
> >
> > I don't disagree. Am not super excited about the heuristics either.
> >
> > > and I wonder if the memcpy savings actually show up as perf improvements
> > > in any real life workload. Do we have data about this?
> >
> > I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> > systems we have "normal" size-classes up to a very large size, and normal
> > class means chaining of 0-order physical pages, and chaining means spanning.
> > So on 16K memcpy overhead is expected to be somewhat noticeable.
>
> By the way, while looking at it, I think we need to "fix" obj_read_begin().
> Currently, it uses "off + class->size" to detect spanning objects, which is
> incorrect: size classes get merged, so a typical size class can hold a range
> of sizes, using padding for smaller objects. So instead of class->size we
> need to use the actual compressed objects size, just in case if actual written
> size was small enough to fit into the first physical page (we do that in
> obj_write()). I'll cook a patch.
We also need to handle zs_obj_read_end() to do the kunmap() call
correctly.
>
> Something like this:
>
> ---
>
> drivers/block/zram/zram_drv.c | 8 +++++---
> include/linux/zsmalloc.h | 2 +-
> mm/zsmalloc.c | 4 ++--
> mm/zswap.c | 3 ++-
> 4 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a6587bed6a03..b371ba6bfec2 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -2065,7 +2065,7 @@ static int read_incompressible_page(struct zram *zram, struct page *page,
> void *src, *dst;
>
> handle = get_slot_handle(zram, index);
> - src = zs_obj_read_begin(zram->mem_pool, handle, NULL);
> + src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL);
> dst = kmap_local_page(page);
> copy_page(dst, src);
> kunmap_local(dst);
> @@ -2087,7 +2087,8 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
> prio = get_slot_comp_priority(zram, index);
>
> zstrm = zcomp_stream_get(zram->comps[prio]);
> - src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> + src = zs_obj_read_begin(zram->mem_pool, handle, size,
> + zstrm->local_copy);
> dst = kmap_local_page(page);
> ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
> kunmap_local(dst);
> @@ -2114,7 +2115,8 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
> * takes place here, as we read raw compressed data.
> */
> zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
> - src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> + src = zs_obj_read_begin(zram->mem_pool, handle, size,
> + zstrm->local_copy);
> memcpy_to_page(page, 0, src, size);
> zs_obj_read_end(zram->mem_pool, handle, src);
> zcomp_stream_put(zstrm);
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index f3ccff2d966c..64f65c1f14d6 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -40,7 +40,7 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
> void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
>
> void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> - void *local_copy);
> + size_t mem_len, void *local_copy);
> void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
> void *handle_mem);
> void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index be385609ef8a..2da60c23cd18 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1070,7 +1070,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
> EXPORT_SYMBOL_GPL(zs_get_total_pages);
>
> void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> - void *local_copy)
> + size_t mem_len, void *local_copy)
> {
> struct zspage *zspage;
> struct zpdesc *zpdesc;
> @@ -1092,7 +1092,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> class = zspage_class(pool, zspage);
> off = offset_in_page(class->size * obj_idx);
>
> - if (off + class->size <= PAGE_SIZE) {
> + if (off + mem_len <= PAGE_SIZE) {
> /* this object is contained entirely within a page */
> addr = kmap_local_zpdesc(zpdesc);
> addr += off;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index de8858ff1521..291352629616 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> u8 *src, *obj;
>
> acomp_ctx = acomp_ctx_get_cpu_lock(pool);
> - obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
> + obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length,
> + acomp_ctx->buffer);
>
> /* zswap entries of length PAGE_SIZE are not compressed. */
> if (entry->length == PAGE_SIZE) {
> --
> 2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-05 16:01 ` Yosry Ahmed
@ 2026-01-06 4:10 ` Sergey Senozhatsky
0 siblings, 0 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-06 4:10 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, linux-mm
On (26/01/05 16:01), Yosry Ahmed wrote:
> On Mon, Jan 05, 2026 at 04:23:39PM +0900, Sergey Senozhatsky wrote:
> > On (26/01/05 10:42), Sergey Senozhatsky wrote:
> > > On (26/01/02 18:29), Yosry Ahmed wrote:
> > > > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> > > [..]
> > > >
> > > > I worry that the heuristics are too hand-wavy
> > >
> > > I don't disagree. Am not super excited about the heuristics either.
> > >
> > > > and I wonder if the memcpy savings actually show up as perf improvements
> > > > in any real life workload. Do we have data about this?
> > >
> > > I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> > > systems we have "normal" size-classes up to a very large size, and normal
> > > class means chaining of 0-order physical pages, and chaining means spanning.
> > > So on 16K memcpy overhead is expected to be somewhat noticeable.
> >
> > By the way, while looking at it, I think we need to "fix" obj_read_begin().
> > Currently, it uses "off + class->size" to detect spanning objects, which is
> > incorrect: size classes get merged, so a typical size class can hold a range
> > of sizes, using padding for smaller objects. So instead of class->size we
> > need to use the actual compressed objects size, just in case if actual written
> > size was small enough to fit into the first physical page (we do that in
> > obj_write()). I'll cook a patch.
>
> We also need to handle zs_obj_read_end() to do the kunmap() call
> correctly.
Good catch, I realized that only after I started working on the patch.
We also need to account for inlined zs_handle.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-05 15:58 ` Yosry Ahmed
@ 2026-01-06 4:20 ` Sergey Senozhatsky
2026-01-06 4:22 ` Sergey Senozhatsky
2026-01-06 9:47 ` Sergey Senozhatsky
0 siblings, 2 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-06 4:20 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, Herbert Xu,
linux-mm
On (26/01/05 15:58), Yosry Ahmed wrote:
> On Mon, Jan 05, 2026 at 10:42:51AM +0900, Sergey Senozhatsky wrote:
> > On (26/01/02 18:29), Yosry Ahmed wrote:
> > > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> > [..]
> > >
> > > I worry that the heuristics are too hand-wavy
> >
> > I don't disagree. Am not super excited about the heuristics either.
> >
> > > and I wonder if the memcpy savings actually show up as perf improvements
> > > in any real life workload. Do we have data about this?
> >
> > I don't have real life 16K PAGE_SIZE devices. However, on 16K PAGE_SIZE
> > systems we have "normal" size-classes up to a very large size, and normal
> > class means chaining of 0-order physical pages, and chaining means spanning.
> > So on 16K memcpy overhead is expected to be somewhat noticeable.
>
> I don't disagree that it could be a problem, I am just against
> optimizations without data. It makes it hard to modify these heuristics
> later or remove them, since we don't really know what effect they had in
> the first place.
>
> We also don't know if the 0.5% increase in memory usage is actually
> offset by CPU gains.
Sure, we are on the same page here.
Another area where we could potentially apply similar heuristics
is the size-classes merge logic: the sheer fact that two size-classes
have a similar number of objects per zspage and pages per zspage does
not necessarily mean that merging them will be beneficial. E.g. the
padding between class->size and the smallest possible object in that
class (when multiplied by the number of objects per zspage) can become
a large enough wasted space.
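Purely as an illustration (all parameters and the threshold below are
made up, this is not an actual proposal for the code):
	#include <stdio.h>
	#include <stdbool.h>
	#define PAGE_SIZE 4096
	/* Hypothetical predicate: reject a merge when per-object padding,
	 * summed over a zspage, exceeds ~3% of the zspage size.
	 */
	static bool merge_is_acceptable(int large_class_size, int small_class_size,
					int objs_per_zspage, int pages_per_zspage)
	{
		int padding_waste = (large_class_size - small_class_size) * objs_per_zspage;
		return padding_waste * 32 <= pages_per_zspage * PAGE_SIZE;
	}
	int main(void)
	{
		/* reuses the 1072 vs 1120 example from patch 1 */
		printf("merge 1072 into 1120: %s\n",
		       merge_is_acceptable(1120, 1072, 117, 8) ? "ok" : "too wasteful");
		return 0;
	}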
But again, heuristics are hard. I'm fine with us dropping that idea
for the time being.
> > > I also vaguely recall discussions about other ways to avoid the memcpy
> > > using scatterlists, so I am wondering if this is the right metric to
> > > optimize.
> >
> > As far as I understand SG-list based approach is that it will require
> > implementing split-data handling on the compression algorithms side,
> > which is not trivial (especially if the only reason to do that is
> > zsmalloc).
>
> I am not sure tbh, adding Herbert here. I remember looking at the code
> in scomp_acomp_comp_decomp() at some point, and I think it will take
> care of non-contiguous SG-lists. Not sure if that's the correct place to
> look tho.
Ah, so it does kmap under the hood. I suppose that can work.
> > Alternatively, we maybe can try to vmap spanning objects:
>
> Using vmap makes sense in theory, but in practice (at least for zswap)
> it doesn't help
OK.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-06 4:20 ` Sergey Senozhatsky
@ 2026-01-06 4:22 ` Sergey Senozhatsky
2026-01-06 5:08 ` Herbert Xu
2026-01-06 9:47 ` Sergey Senozhatsky
1 sibling, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-06 4:22 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Yosry Ahmed, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, Herbert Xu,
linux-mm
On (26/01/06 13:20), Sergey Senozhatsky wrote:
[..]
> > I am not sure tbh, adding Herbert here. I remember looking at the code
> > in scomp_acomp_comp_decomp() at some point, and I think it will take
> > care of non-contiguous SG-lists. Not sure if that's the correct place to
> > look tho.
>
> Ah, so it does kmap under the hood. I suppose that can work.
I'm hallucinating, sorry. Yeah, let's hear from Herbert what the
direction is here.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-06 4:22 ` Sergey Senozhatsky
@ 2026-01-06 5:08 ` Herbert Xu
2026-01-06 16:24 ` Yosry Ahmed
0 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2026-01-06 5:08 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Yosry Ahmed, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, linux-mm
On Tue, Jan 06, 2026 at 01:22:45PM +0900, Sergey Senozhatsky wrote:
> On (26/01/06 13:20), Sergey Senozhatsky wrote:
> [..]
> > > I am not sure tbh, adding Herbert here. I remember looking at the code
> > > in scomp_acomp_comp_decomp() at some point, and I think it will take
> > > care of non-contiguous SG-lists. Not sure if that's the correct place to
> > > look tho.
> >
> > Ah, so it does kmap under the hood. I suppose that can work.
>
> I'm hallucinating, sorry. Yeah, let's hear from Herbert what's
> the direction here.
I have not implemented the underlying SG support yet because
there are no users in the kernel as of now. But if this is
useful for you then we can certainly do this, at least for
LZO which is fairly simple.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-06 4:20 ` Sergey Senozhatsky
2026-01-06 4:22 ` Sergey Senozhatsky
@ 2026-01-06 9:47 ` Sergey Senozhatsky
1 sibling, 0 replies; 14+ messages in thread
From: Sergey Senozhatsky @ 2026-01-06 9:47 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Yosry Ahmed, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, Herbert Xu,
linux-mm
On (26/01/06 13:20), Sergey Senozhatsky wrote:
> Another area where we potentially could apply similar heuristics
> is size-calsses merge logic: sheer fact that two size-classes have
> similar objects per zspage and pages per zspage does not necessarily
> mean that merging them will be beneficial.
That's nonsense.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
2026-01-06 5:08 ` Herbert Xu
@ 2026-01-06 16:24 ` Yosry Ahmed
0 siblings, 0 replies; 14+ messages in thread
From: Yosry Ahmed @ 2026-01-06 16:24 UTC (permalink / raw)
To: Herbert Xu
Cc: Sergey Senozhatsky, Andrew Morton, Nhat Pham, Minchan Kim,
Johannes Weiner, Brian Geffon, linux-kernel, linux-mm
On Tue, Jan 06, 2026 at 01:08:09PM +0800, Herbert Xu wrote:
> On Tue, Jan 06, 2026 at 01:22:45PM +0900, Sergey Senozhatsky wrote:
> > On (26/01/06 13:20), Sergey Senozhatsky wrote:
> > [..]
> > > > I am not sure tbh, adding Herbert here. I remember looking at the code
> > > > in scomp_acomp_comp_decomp() at some point, and I think it will take
> > > > care of non-contiguous SG-lists. Not sure if that's the correct place to
> > > > look tho.
> > >
> > > Ah, so it does kmap under the hood. I suppose that can work.
> >
> > I'm hallucinating, sorry. Yeah, let's hear from Herbert what's
> > the direction here.
>
> I have not implemented the underlying SG support yet because
> there are no users in the kernel as of now. But if this is
> useful for you then we can certainly do this, at least for
> LZO which is fairly simple.
Just to clarify, IIUC the SG support would mean that zram or zswap can
pass a non-contiguous SG-list to the crypto API, regardless of
compressor support. I assume that the crypto layer will either pass the
SG-list as-is to the compressor if it supports it, or copy it into
scratch space to be contiguous if needed.
So zswap, for example, will get an SG list from zsmalloc and pass it
directly to the crypto API for decompression. Then the effort to add
support to compressors can be done separately.
Did I get this right?
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-01-06 16:24 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-01 1:38 [RFC PATCH 0/2] zsmalloc: size-classes chain-length tunings Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 1/2] zsmalloc: drop hard limit on the number of size classes Sergey Senozhatsky
2026-01-01 1:38 ` [RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics Sergey Senozhatsky
2026-01-02 18:29 ` Yosry Ahmed
2026-01-05 1:42 ` Sergey Senozhatsky
2026-01-05 7:23 ` Sergey Senozhatsky
2026-01-05 16:01 ` Yosry Ahmed
2026-01-06 4:10 ` Sergey Senozhatsky
2026-01-05 15:58 ` Yosry Ahmed
2026-01-06 4:20 ` Sergey Senozhatsky
2026-01-06 4:22 ` Sergey Senozhatsky
2026-01-06 5:08 ` Herbert Xu
2026-01-06 16:24 ` Yosry Ahmed
2026-01-06 9:47 ` Sergey Senozhatsky