linux-mm.kvack.org archive mirror
* [PATCHv2 0/7] zram: introduce compressed data writeback
@ 2025-12-01  9:47 Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 1/7] " Sergey Senozhatsky
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

As writeback becomes more common, another shortcoming needs to be
addressed - the lack of compressed data writeback.  Currently zram
writes back uncompressed data, which is not optimal due to potential
CPU and battery wastage.  This series replaces the suboptimal
uncompressed writeback with compressed data writeback.

v1 -> v2:
- made compressed writeback configurable via device attribute
- added missing batch_size documentation
- switched to guard() for init_lock
- more code tweaks and cleanups

Richard Chang (2):
  zram: introduce compressed data writeback
  zram: introduce writeback_compressed device attribute

Sergey Senozhatsky (5):
  zram: document writeback_batch_size
  zram: move bd_stat to writeback section
  zram: rename zram_free_page()
  zram: switch to guard() for init_lock
  zram: consolidate device-attr declarations

 Documentation/ABI/testing/sysfs-block-zram  |  14 +
 Documentation/admin-guide/blockdev/zram.rst |  24 +-
 drivers/block/zram/zram_drv.c               | 589 ++++++++++++--------
 drivers/block/zram/zram_drv.h               |   1 +
 4 files changed, 406 insertions(+), 222 deletions(-)

--
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 1/7] zram: introduce compressed data writeback
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2026-01-07  3:50   ` zhangdongdong
  2025-12-01  9:47 ` [PATCHv2 2/7] zram: introduce writeback_compressed device attribute Sergey Senozhatsky
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky, Minchan Kim

From: Richard Chang <richardycc@google.com>

zram stores all written back slots raw, which implies that
during writeback zram first has to decompress the slots (except
for ZRAM_HUGE slots, which are raw already).  The problem with
this approach is that not every written back page gets read
back (either via read() or via a page fault), which means that
zram basically wastes CPU cycles and battery decompressing such
slots.  This changes with the introduction of decompression on
demand, in other words decompression on read()/page-fault.

One caveat of decompression on demand is that the async read
completes in IRQ context, while zram decompression is sleepable.
To work around this, read-back decompression is offloaded to a
preemptible context - the system high-prio workqueue.
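
The deferral looks roughly like the sketch below.  This is a minimal
illustration only, with hypothetical names (rb_req, rb_endio(),
decompress_page()), not the actual zram code from the diff: the
backing-device read completes in the bio endio handler, which merely
queues a work item; the sleepable decompression then runs from the
high-prio workqueue.

#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct rb_req {
	struct work_struct work;
	struct bio *bio;	/* read bio submitted to the backing device */
	struct bio *parent;	/* original bio to complete */
};

/* Sleepable, hypothetical stand-in for the slot-locked decompression */
static int decompress_page(struct rb_req *req);

static void rb_req_work(struct work_struct *w)
{
	struct rb_req *req = container_of(w, struct rb_req, work);

	/* Preemptible context: safe to sleep here */
	if (decompress_page(req))
		req->parent->bi_status = BLK_STS_IOERR;

	bio_endio(req->parent);
	bio_put(req->bio);
	kfree(req);
}

static void rb_endio(struct bio *bio)
{
	struct rb_req *req = bio->bi_private;

	/* IRQ context: only queue the work, do not sleep */
	INIT_WORK(&req->work, rb_req_work);
	queue_work(system_highpri_wq, &req->work);
}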

At this point compressed writeback is still disabled; a follow-up
patch will introduce a new device attribute which will make it
possible to toggle compressed writeback per-device.

[senozhatsky: rewrote original implementation]
Signed-off-by: Richard Chang <richardycc@google.com>
Co-developed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Suggested-by: Minchan Kim <minchan@google.com>
Suggested-by: Brian Geffon <bgeffon@google.com>
---
 drivers/block/zram/zram_drv.c | 279 +++++++++++++++++++++++++++-------
 drivers/block/zram/zram_drv.h |   1 +
 2 files changed, 227 insertions(+), 53 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5759823d6314..6263d300312e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -57,9 +57,6 @@ static size_t huge_class_size;
 static const struct block_device_operations zram_devops;
 
 static void zram_free_page(struct zram *zram, size_t index);
-static int zram_read_from_zspool(struct zram *zram, struct page *page,
-				 u32 index);
-
 #define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
 
 static void zram_slot_lock_init(struct zram *zram, u32 index)
@@ -502,6 +499,10 @@ static ssize_t idle_store(struct device *dev,
 #ifdef CONFIG_ZRAM_WRITEBACK
 #define INVALID_BDEV_BLOCK		(~0UL)
 
+static int read_from_zspool_raw(struct zram *zram, struct page *page,
+				u32 index);
+static int read_from_zspool(struct zram *zram, struct page *page, u32 index);
+
 struct zram_wb_ctl {
 	/* idle list is accessed only by the writeback task, no concurency */
 	struct list_head idle_reqs;
@@ -522,6 +523,22 @@ struct zram_wb_req {
 	struct list_head entry;
 };
 
+struct zram_rb_req {
+	struct work_struct work;
+	struct zram *zram;
+	struct page *page;
+	/* The read bio for backing device */
+	struct bio *bio;
+	unsigned long blk_idx;
+	union {
+		/* The original bio to complete (async read) */
+		struct bio *parent;
+		/* error status (sync read) */
+		int error;
+	};
+	u32 index;
+};
+
 static ssize_t writeback_limit_enable_store(struct device *dev,
 					    struct device_attribute *attr,
 					    const char *buf, size_t len)
@@ -780,18 +797,6 @@ static void zram_release_bdev_block(struct zram *zram, unsigned long blk_idx)
 	atomic64_dec(&zram->stats.bd_count);
 }
 
-static void read_from_bdev_async(struct zram *zram, struct page *page,
-			unsigned long entry, struct bio *parent)
-{
-	struct bio *bio;
-
-	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
-	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
-	__bio_add_page(bio, page, PAGE_SIZE, 0);
-	bio_chain(bio, parent);
-	submit_bio(bio);
-}
-
 static void release_wb_req(struct zram_wb_req *req)
 {
 	__free_page(req->page);
@@ -886,8 +891,9 @@ static void zram_account_writeback_submit(struct zram *zram)
 
 static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
 {
-	u32 index = req->pps->index;
-	int err;
+	u32 size, index = req->pps->index;
+	int err, prio;
+	bool huge;
 
 	err = blk_status_to_errno(req->bio.bi_status);
 	if (err) {
@@ -914,9 +920,27 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
 		goto out;
 	}
 
+	if (zram->wb_compressed) {
+		/*
+		 * ZRAM_WB slots get freed, we need to preserve data required
+		 * for read decompression.
+		 */
+		size = zram_get_obj_size(zram, index);
+		prio = zram_get_priority(zram, index);
+		huge = zram_test_flag(zram, index, ZRAM_HUGE);
+	}
+
 	zram_free_page(zram, index);
 	zram_set_flag(zram, index, ZRAM_WB);
 	zram_set_handle(zram, index, req->blk_idx);
+
+	if (zram->wb_compressed) {
+		if (huge)
+			zram_set_flag(zram, index, ZRAM_HUGE);
+		zram_set_obj_size(zram, index, size);
+		zram_set_priority(zram, index, prio);
+	}
+
 	atomic64_inc(&zram->stats.pages_stored);
 
 out:
@@ -1050,7 +1074,11 @@ static int zram_writeback_slots(struct zram *zram,
 		 */
 		if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
 			goto next;
-		if (zram_read_from_zspool(zram, req->page, index))
+		if (zram->wb_compressed)
+			err = read_from_zspool_raw(zram, req->page, index);
+		else
+			err = read_from_zspool(zram, req->page, index);
+		if (err)
 			goto next;
 		zram_slot_unlock(zram, index);
 
@@ -1313,24 +1341,140 @@ static ssize_t writeback_store(struct device *dev,
 	return ret;
 }
 
-struct zram_work {
-	struct work_struct work;
-	struct zram *zram;
-	unsigned long entry;
-	struct page *page;
-	int error;
-};
+static int decompress_bdev_page(struct zram *zram, struct page *page, u32 index)
+{
+	struct zcomp_strm *zstrm;
+	unsigned int size;
+	int ret, prio;
+	void *src;
+
+	zram_slot_lock(zram, index);
+	/* Since slot was unlocked we need to make sure it's still ZRAM_WB */
+	if (!zram_test_flag(zram, index, ZRAM_WB)) {
+		zram_slot_unlock(zram, index);
+		/* We read some stale data, zero it out */
+		memset_page(page, 0, 0, PAGE_SIZE);
+		return -EIO;
+	}
+
+	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
+		zram_slot_unlock(zram, index);
+		return 0;
+	}
+
+	size = zram_get_obj_size(zram, index);
+	prio = zram_get_priority(zram, index);
 
-static void zram_sync_read(struct work_struct *work)
+	zstrm = zcomp_stream_get(zram->comps[prio]);
+	src = kmap_local_page(page);
+	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size,
+			       zstrm->local_copy);
+	if (!ret)
+		copy_page(src, zstrm->local_copy);
+	kunmap_local(src);
+	zcomp_stream_put(zstrm);
+	zram_slot_unlock(zram, index);
+
+	return ret;
+}
+
+static void zram_deferred_decompress(struct work_struct *w)
 {
-	struct zram_work *zw = container_of(work, struct zram_work, work);
+	struct zram_rb_req *req = container_of(w, struct zram_rb_req, work);
+	struct page *page = bio_first_page_all(req->bio);
+	struct zram *zram = req->zram;
+	u32 index = req->index;
+	int ret;
+
+	ret = decompress_bdev_page(zram, page, index);
+	if (ret)
+		req->parent->bi_status = BLK_STS_IOERR;
+
+	/* Decrement parent's ->remaining */
+	bio_endio(req->parent);
+	bio_put(req->bio);
+	kfree(req);
+}
+
+static void zram_async_read_endio(struct bio *bio)
+{
+	struct zram_rb_req *req = bio->bi_private;
+	struct zram *zram = req->zram;
+
+	if (bio->bi_status) {
+		req->parent->bi_status = bio->bi_status;
+		bio_endio(req->parent);
+		bio_put(bio);
+		kfree(req);
+		return;
+	}
+
+	/*
+	 * NOTE: zram_async_read_endio() is not exactly the right place for this.
+	 * Ideally, we need to do it after ZRAM_WB check, but this requires
+	 * us to use wq path even on systems that don't enable compressed
+	 * writeback, because we cannot take slot-lock in the current context.
+	 *
+	 * Keep the existing behavior for now.
+	 */
+	if (zram->wb_compressed == false) {
+		/* No decompression needed, complete the parent IO */
+		bio_endio(req->parent);
+		bio_put(bio);
+		kfree(req);
+		return;
+	}
+
+	/*
+	 * zram decompression is sleepable, so we need to defer it to
+	 * a preemptible context.
+	 */
+	INIT_WORK(&req->work, zram_deferred_decompress);
+	queue_work(system_highpri_wq, &req->work);
+}
+
+static void read_from_bdev_async(struct zram *zram, struct page *page,
+				 u32 index, unsigned long blk_idx,
+				 struct bio *parent)
+{
+	struct zram_rb_req *req;
+	struct bio *bio;
+
+	req = kmalloc(sizeof(*req), GFP_NOIO);
+	if (!req)
+		return;
+
+	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
+	if (!bio) {
+		kfree(req);
+		return;
+	}
+
+	req->zram = zram;
+	req->index = index;
+	req->blk_idx = blk_idx;
+	req->bio = bio;
+	req->parent = parent;
+
+	bio->bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
+	bio->bi_private = req;
+	bio->bi_end_io = zram_async_read_endio;
+
+	__bio_add_page(bio, page, PAGE_SIZE, 0);
+	bio_inc_remaining(parent);
+	submit_bio(bio);
+}
+
+static void zram_sync_read(struct work_struct *w)
+{
+	struct zram_rb_req *req = container_of(w, struct zram_rb_req, work);
 	struct bio_vec bv;
 	struct bio bio;
 
-	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
-	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
-	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
-	zw->error = submit_bio_wait(&bio);
+	bio_init(&bio, req->zram->bdev, &bv, 1, REQ_OP_READ);
+	bio.bi_iter.bi_sector = req->blk_idx * (PAGE_SIZE >> 9);
+	__bio_add_page(&bio, req->page, PAGE_SIZE, 0);
+	req->error = submit_bio_wait(&bio);
 }
 
 /*
@@ -1338,39 +1482,42 @@ static void zram_sync_read(struct work_struct *work)
  * chained IO with parent IO in same context, it's a deadlock. To avoid that,
  * use a worker thread context.
  */
-static int read_from_bdev_sync(struct zram *zram, struct page *page,
-				unsigned long entry)
+static int read_from_bdev_sync(struct zram *zram, struct page *page, u32 index,
+			       unsigned long blk_idx)
 {
-	struct zram_work work;
+	struct zram_rb_req req;
 
-	work.page = page;
-	work.zram = zram;
-	work.entry = entry;
+	req.page = page;
+	req.zram = zram;
+	req.blk_idx = blk_idx;
 
-	INIT_WORK_ONSTACK(&work.work, zram_sync_read);
-	queue_work(system_dfl_wq, &work.work);
-	flush_work(&work.work);
-	destroy_work_on_stack(&work.work);
+	INIT_WORK_ONSTACK(&req.work, zram_sync_read);
+	queue_work(system_dfl_wq, &req.work);
+	flush_work(&req.work);
+	destroy_work_on_stack(&req.work);
 
-	return work.error;
+	if (req.error || zram->wb_compressed == false)
+		return req.error;
+
+	return decompress_bdev_page(zram, page, index);
 }
 
-static int read_from_bdev(struct zram *zram, struct page *page,
-			unsigned long entry, struct bio *parent)
+static int read_from_bdev(struct zram *zram, struct page *page, u32 index,
+			  unsigned long blk_idx, struct bio *parent)
 {
 	atomic64_inc(&zram->stats.bd_reads);
 	if (!parent) {
 		if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
 			return -EIO;
-		return read_from_bdev_sync(zram, page, entry);
+		return read_from_bdev_sync(zram, page, index, blk_idx);
 	}
-	read_from_bdev_async(zram, page, entry, parent);
+	read_from_bdev_async(zram, page, index, blk_idx, parent);
 	return 0;
 }
 #else
 static inline void reset_bdev(struct zram *zram) {};
-static int read_from_bdev(struct zram *zram, struct page *page,
-			unsigned long entry, struct bio *parent)
+static int read_from_bdev(struct zram *zram, struct page *page, u32 index,
+			  unsigned long blk_idx, struct bio *parent)
 {
 	return -EIO;
 }
@@ -1977,12 +2124,37 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
 	return ret;
 }
 
+#if defined CONFIG_ZRAM_WRITEBACK
+static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
+{
+	struct zcomp_strm *zstrm;
+	unsigned long handle;
+	unsigned int size;
+	void *src;
+
+	handle = zram_get_handle(zram, index);
+	size = zram_get_obj_size(zram, index);
+
+	/*
+	 * We need to get a stream just for the ->local_copy buffer, in
+	 * case the object spans two physical pages. No decompression
+	 * takes place here, as we read raw compressed data.
+	 */
+	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
+	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
+	memcpy_to_page(page, 0, src, size);
+	zs_obj_read_end(zram->mem_pool, handle, src);
+	zcomp_stream_put(zstrm);
+
+	return 0;
+}
+#endif
+
 /*
  * Reads (decompresses if needed) a page from zspool (zsmalloc).
  * Corresponding ZRAM slot should be locked.
  */
-static int zram_read_from_zspool(struct zram *zram, struct page *page,
-				 u32 index)
+static int read_from_zspool(struct zram *zram, struct page *page, u32 index)
 {
 	if (zram_test_flag(zram, index, ZRAM_SAME) ||
 	    !zram_get_handle(zram, index))
@@ -2002,7 +2174,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 	zram_slot_lock(zram, index);
 	if (!zram_test_flag(zram, index, ZRAM_WB)) {
 		/* Slot should be locked through out the function call */
-		ret = zram_read_from_zspool(zram, page, index);
+		ret = read_from_zspool(zram, page, index);
 		zram_slot_unlock(zram, index);
 	} else {
 		unsigned long blk_idx = zram_get_handle(zram, index);
@@ -2012,7 +2184,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 		 * device.
 		 */
 		zram_slot_unlock(zram, index);
-		ret = read_from_bdev(zram, page, blk_idx, parent);
+		ret = read_from_bdev(zram, page, index, blk_idx, parent);
 	}
 
 	/* Should NEVER happen. Return bio error if it does. */
@@ -2273,7 +2445,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 	if (comp_len_old < threshold)
 		return 0;
 
-	ret = zram_read_from_zspool(zram, page, index);
+	ret = read_from_zspool(zram, page, index);
 	if (ret)
 		return ret;
 
@@ -2960,6 +3132,7 @@ static int zram_add(void)
 	init_rwsem(&zram->init_lock);
 #ifdef CONFIG_ZRAM_WRITEBACK
 	zram->wb_batch_size = 32;
+	zram->wb_compressed = false;
 #endif
 
 	/* gendisk structure */
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index c6d94501376c..72fdf66c78ab 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -128,6 +128,7 @@ struct zram {
 #ifdef CONFIG_ZRAM_WRITEBACK
 	struct file *backing_dev;
 	bool wb_limit_enable;
+	bool wb_compressed;
 	u32 wb_batch_size;
 	u64 bd_wb_limit;
 	struct block_device *bdev;
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 2/7] zram: introduce writeback_compressed device attribute
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 1/7] " Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 3/7] zram: document writeback_batch_size Sergey Senozhatsky
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

From: Richard Chang <richardycc@google.com>

Introduce the writeback_compressed device attribute to toggle
the compressed writeback (decompression on demand) feature.

[senozhatsky: rewrote original patch, added documentation]
Signed-off-by: Richard Chang <richardycc@google.com>
Co-developed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/ABI/testing/sysfs-block-zram  |  7 ++++
 Documentation/admin-guide/blockdev/zram.rst | 13 +++++++
 drivers/block/zram/zram_drv.c               | 38 +++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 36c57de0a10a..ed10c2e4b5c2 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -150,3 +150,10 @@ Contact:	Sergey Senozhatsky <senozhatsky@chromium.org>
 Description:
 		The algorithm_params file is write-only and is used to setup
 		compression algorithm parameters.
+
+What:		/sys/block/zram<id>/writeback_compressed
+Date:		December 2025
+Contact:	Richard Chang <richardycc@google.com>
+Description:
+		The writeback_compressed device attribute toggles compressed
+		writeback feature.
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 3e273c1bb749..9547e4e95979 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -214,6 +214,7 @@ mem_limit         	WO	specifies the maximum amount of memory ZRAM can
 writeback_limit   	WO	specifies the maximum amount of write IO zram
 				can write out to backing device as 4KB unit
 writeback_limit_enable  RW	show and set writeback_limit feature
+writeback_compressed	RW	show and set compressed writeback feature
 comp_algorithm    	RW	show and change the compression algorithm
 algorithm_params	WO	setup compression algorithm parameters
 compact           	WO	trigger memory compaction
@@ -434,6 +435,18 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
 writeback happened until you reset the zram to allocate extra writeback
 budget in next setting is user's job.
 
+By default zram stores written back pages in decompressed (raw) form, which
+means that the writeback operation involves decompression of the page before
+writing it to the backing device.  This behavior can be changed by enabling
+the `writeback_compressed` feature, which causes zram to write compressed pages
+to the backing device, thus avoiding decompression overhead.  To enable
+this feature, execute::
+
+	$ echo yes > /sys/block/zramX/writeback_compressed
+
+Note that this feature should be configured before the `zramX` device is
+initialized.
+
 If admin wants to measure writeback count in a certain period, they could
 know it via /sys/block/zram0/bd_stat's 3rd column.
 
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 6263d300312e..3cc03c3f7389 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -539,6 +539,42 @@ struct zram_rb_req {
 	u32 index;
 };
 
+static ssize_t writeback_compressed_store(struct device *dev,
+					  struct device_attribute *attr,
+					  const char *buf, size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+	bool val;
+
+	if (kstrtobool(buf, &val))
+		return -EINVAL;
+
+	down_write(&zram->init_lock);
+	if (init_done(zram)) {
+		up_write(&zram->init_lock);
+		return -EBUSY;
+	}
+
+	zram->wb_compressed = val;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
+static ssize_t writeback_compressed_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	bool val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->wb_compressed;
+	up_read(&zram->init_lock);
+
+	return sysfs_emit(buf, "%d\n", val);
+}
+
 static ssize_t writeback_limit_enable_store(struct device *dev,
 					    struct device_attribute *attr,
 					    const char *buf, size_t len)
@@ -3048,6 +3084,7 @@ static DEVICE_ATTR_WO(writeback);
 static DEVICE_ATTR_RW(writeback_limit);
 static DEVICE_ATTR_RW(writeback_limit_enable);
 static DEVICE_ATTR_RW(writeback_batch_size);
+static DEVICE_ATTR_RW(writeback_compressed);
 #endif
 #ifdef CONFIG_ZRAM_MULTI_COMP
 static DEVICE_ATTR_RW(recomp_algorithm);
@@ -3070,6 +3107,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_writeback_limit.attr,
 	&dev_attr_writeback_limit_enable.attr,
 	&dev_attr_writeback_batch_size.attr,
+	&dev_attr_writeback_compressed.attr,
 #endif
 	&dev_attr_io_stat.attr,
 	&dev_attr_mm_stat.attr,
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 3/7] zram: document writeback_batch_size
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 1/7] " Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 2/7] zram: introduce writeback_compressed device attribute Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 4/7] zram: move bd_stat to writeback section Sergey Senozhatsky
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

Add missing writeback_batch_size documentation.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/ABI/testing/sysfs-block-zram  |  7 +++++++
 Documentation/admin-guide/blockdev/zram.rst | 11 ++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index ed10c2e4b5c2..e538d4850d61 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -157,3 +157,10 @@ Contact:	Richard Chang <richardycc@google.com>
 Description:
 		The writeback_compressed device attribute toggles compressed
 		writeback feature.
+
+What:		/sys/block/zram<id>/writeback_batch_size
+Date:		November 2025
+Contact:	Sergey Senozhatsky <senozhatsky@chromium.org>
+Description:
+		The writeback_batch_size device attribute sets the maximum
+		number of in-flight writeback operations.
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 9547e4e95979..94bb7f2245ee 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -214,6 +214,8 @@ mem_limit         	WO	specifies the maximum amount of memory ZRAM can
 writeback_limit   	WO	specifies the maximum amount of write IO zram
 				can write out to backing device as 4KB unit
 writeback_limit_enable  RW	show and set writeback_limit feature
+writeback_batch_size	RW	show and set maximum number of in-flight
+				writeback operations
 writeback_compressed	RW	show and set compressed writeback feature
 comp_algorithm    	RW	show and change the compression algorithm
 algorithm_params	WO	setup compression algorithm parameters
@@ -223,7 +225,6 @@ backing_dev	  	RW	set up backend storage for zram to write out
 idle		  	WO	mark allocated slot as idle
 ======================  ======  ===============================================
 
-
 User space is advised to use the following files to read the device statistics.
 
 File /sys/block/zram<id>/stat
@@ -447,6 +448,14 @@ this feature, execute::
 Note that this feature should be configured before the `zramX` device is
 initialized.
 
+Depending on the backing device storage type, writeback may benefit
+from a higher number of in-flight write requests (batched writes).  The
+maximum number of in-flight writeback operations can be configured via the
+`writeback_batch_size` attribute.  To change the default value (which is 32),
+execute::
+
+	$ echo 64 > /sys/block/zramX/writeback_batch_size
+
 If admin wants to measure writeback count in a certain period, they could
 know it via /sys/block/zram0/bd_stat's 3rd column.
 
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 4/7] zram: move bd_stat to writeback section
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2025-12-01  9:47 ` [PATCHv2 3/7] zram: document writeback_batch_size Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 5/7] zram: rename zram_free_page() Sergey Senozhatsky
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

Move the bd_stat function and attribute declaration to the
existing CONFIG_ZRAM_WRITEBACK ifdef sections.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 48 +++++++++++++++--------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3cc03c3f7389..1a0f550219b1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -539,6 +539,24 @@ struct zram_rb_req {
 	u32 index;
 };
 
+#define FOUR_K(x) ((x) * (1 << (PAGE_SHIFT - 12)))
+static ssize_t bd_stat_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+	ssize_t ret;
+
+	down_read(&zram->init_lock);
+	ret = sysfs_emit(buf,
+			 "%8llu %8llu %8llu\n",
+			 FOUR_K((u64)atomic64_read(&zram->stats.bd_count)),
+			 FOUR_K((u64)atomic64_read(&zram->stats.bd_reads)),
+			 FOUR_K((u64)atomic64_read(&zram->stats.bd_writes)));
+	up_read(&zram->init_lock);
+
+	return ret;
+}
+
 static ssize_t writeback_compressed_store(struct device *dev,
 					  struct device_attribute *attr,
 					  const char *buf, size_t len)
@@ -1976,28 +1994,8 @@ static ssize_t mm_stat_show(struct device *dev,
 	return ret;
 }
 
-#ifdef CONFIG_ZRAM_WRITEBACK
-#define FOUR_K(x) ((x) * (1 << (PAGE_SHIFT - 12)))
-static ssize_t bd_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
-{
-	struct zram *zram = dev_to_zram(dev);
-	ssize_t ret;
-
-	down_read(&zram->init_lock);
-	ret = sysfs_emit(buf,
-			"%8llu %8llu %8llu\n",
-			FOUR_K((u64)atomic64_read(&zram->stats.bd_count)),
-			FOUR_K((u64)atomic64_read(&zram->stats.bd_reads)),
-			FOUR_K((u64)atomic64_read(&zram->stats.bd_writes)));
-	up_read(&zram->init_lock);
-
-	return ret;
-}
-#endif
-
 static ssize_t debug_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+			       struct device_attribute *attr, char *buf)
 {
 	int version = 1;
 	struct zram *zram = dev_to_zram(dev);
@@ -2015,9 +2013,6 @@ static ssize_t debug_stat_show(struct device *dev,
 
 static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
-#ifdef CONFIG_ZRAM_WRITEBACK
-static DEVICE_ATTR_RO(bd_stat);
-#endif
 static DEVICE_ATTR_RO(debug_stat);
 
 static void zram_meta_free(struct zram *zram, u64 disksize)
@@ -3079,6 +3074,7 @@ static DEVICE_ATTR_WO(mem_used_max);
 static DEVICE_ATTR_WO(idle);
 static DEVICE_ATTR_RW(comp_algorithm);
 #ifdef CONFIG_ZRAM_WRITEBACK
+static DEVICE_ATTR_RO(bd_stat);
 static DEVICE_ATTR_RW(backing_dev);
 static DEVICE_ATTR_WO(writeback);
 static DEVICE_ATTR_RW(writeback_limit);
@@ -3102,6 +3098,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_idle.attr,
 	&dev_attr_comp_algorithm.attr,
 #ifdef CONFIG_ZRAM_WRITEBACK
+	&dev_attr_bd_stat.attr,
 	&dev_attr_backing_dev.attr,
 	&dev_attr_writeback.attr,
 	&dev_attr_writeback_limit.attr,
@@ -3111,9 +3108,6 @@ static struct attribute *zram_disk_attrs[] = {
 #endif
 	&dev_attr_io_stat.attr,
 	&dev_attr_mm_stat.attr,
-#ifdef CONFIG_ZRAM_WRITEBACK
-	&dev_attr_bd_stat.attr,
-#endif
 	&dev_attr_debug_stat.attr,
 #ifdef CONFIG_ZRAM_MULTI_COMP
 	&dev_attr_recomp_algorithm.attr,
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 5/7] zram: rename zram_free_page()
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2025-12-01  9:47 ` [PATCHv2 4/7] zram: move bd_stat to writeback section Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 6/7] zram: switch to guard() for init_lock Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 7/7] zram: consolidate device-attr declarations Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

We don't free a page in zram_free_page(); not all slots even
have any memory associated with them (e.g. ZRAM_SAME).  We
free (or reset) the slot, so rename the function accordingly.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 1a0f550219b1..615756d5d05d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -56,7 +56,7 @@ static size_t huge_class_size;
 
 static const struct block_device_operations zram_devops;
 
-static void zram_free_page(struct zram *zram, size_t index);
+static void zram_slot_free(struct zram *zram, u32 index);
 #define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
 
 static void zram_slot_lock_init(struct zram *zram, u32 index)
@@ -984,7 +984,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
 		huge = zram_test_flag(zram, index, ZRAM_HUGE);
 	}
 
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_set_flag(zram, index, ZRAM_WB);
 	zram_set_handle(zram, index, req->blk_idx);
 
@@ -2025,7 +2025,7 @@ static void zram_meta_free(struct zram *zram, u64 disksize)
 
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < num_pages; index++)
-		zram_free_page(zram, index);
+		zram_slot_free(zram, index);
 
 	zs_destroy_pool(zram->mem_pool);
 	vfree(zram->table);
@@ -2057,7 +2057,7 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 	return true;
 }
 
-static void zram_free_page(struct zram *zram, size_t index)
+static void zram_slot_free(struct zram *zram, u32 index)
 {
 	unsigned long handle;
 
@@ -2256,7 +2256,7 @@ static int write_same_filled_page(struct zram *zram, unsigned long fill,
 				  u32 index)
 {
 	zram_slot_lock(zram, index);
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_set_flag(zram, index, ZRAM_SAME);
 	zram_set_handle(zram, index, fill);
 	zram_slot_unlock(zram, index);
@@ -2294,7 +2294,7 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
 	kunmap_local(src);
 
 	zram_slot_lock(zram, index);
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_set_flag(zram, index, ZRAM_HUGE);
 	zram_set_handle(zram, index, handle);
 	zram_set_obj_size(zram, index, PAGE_SIZE);
@@ -2359,7 +2359,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	zcomp_stream_put(zstrm);
 
 	zram_slot_lock(zram, index);
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_set_handle(zram, index, handle);
 	zram_set_obj_size(zram, index, comp_len);
 	zram_slot_unlock(zram, index);
@@ -2581,7 +2581,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 	zs_obj_write(zram->mem_pool, handle_new, zstrm->buffer, comp_len_new);
 	zcomp_stream_put(zstrm);
 
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_set_handle(zram, index, handle_new);
 	zram_set_obj_size(zram, index, comp_len_new);
 	zram_set_priority(zram, index, prio);
@@ -2784,7 +2784,7 @@ static void zram_bio_discard(struct zram *zram, struct bio *bio)
 
 	while (n >= PAGE_SIZE) {
 		zram_slot_lock(zram, index);
-		zram_free_page(zram, index);
+		zram_slot_free(zram, index);
 		zram_slot_unlock(zram, index);
 		atomic64_inc(&zram->stats.notify_free);
 		index++;
@@ -2892,7 +2892,7 @@ static void zram_slot_free_notify(struct block_device *bdev,
 		return;
 	}
 
-	zram_free_page(zram, index);
+	zram_slot_free(zram, index);
 	zram_slot_unlock(zram, index);
 }
 
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 6/7] zram: switch to guard() for init_lock
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2025-12-01  9:47 ` [PATCHv2 5/7] zram: rename zram_free_page() Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  2025-12-01  9:47 ` [PATCHv2 7/7] zram: consolidate device-attr declarations Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

Switch to guard() for init_lock in sysfs store/show handlers, in
order to simplify and, more importantly, to modernize the code.

While at it, fix up more coding style issues.
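
For reference, every conversion in this patch follows the same shape;
a minimal sketch with a hypothetical foo attribute (guard() comes from
linux/cleanup.h, the rwsem_read/rwsem_write guards from linux/rwsem.h):

/* before */
static ssize_t foo_show(struct device *dev, struct device_attribute *attr,
			char *buf)
{
	struct zram *zram = dev_to_zram(dev);
	ssize_t ret;

	down_read(&zram->init_lock);
	ret = sysfs_emit(buf, "%d\n", zram->foo);
	up_read(&zram->init_lock);

	return ret;
}

/* after: the guard releases init_lock on every return path */
static ssize_t foo_show(struct device *dev, struct device_attribute *attr,
			char *buf)
{
	struct zram *zram = dev_to_zram(dev);

	guard(rwsem_read)(&zram->init_lock);
	return sysfs_emit(buf, "%d\n", zram->foo);
}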

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 211 +++++++++++++---------------------
 1 file changed, 77 insertions(+), 134 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 615756d5d05d..4b8a26c60539 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -360,15 +360,14 @@ static bool page_same_filled(void *ptr, unsigned long *element)
 	return true;
 }
 
-static ssize_t initstate_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t initstate_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
 {
 	u32 val;
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	val = init_done(zram);
-	up_read(&zram->init_lock);
 
 	return sysfs_emit(buf, "%u\n", val);
 }
@@ -382,7 +381,8 @@ static ssize_t disksize_show(struct device *dev,
 }
 
 static ssize_t mem_limit_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+			       struct device_attribute *attr, const char *buf,
+			       size_t len)
 {
 	u64 limit;
 	char *tmp;
@@ -392,15 +392,15 @@ static ssize_t mem_limit_store(struct device *dev,
 	if (buf == tmp) /* no chars parsed, invalid input */
 		return -EINVAL;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
-	up_write(&zram->init_lock);
 
 	return len;
 }
 
 static ssize_t mem_used_max_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
 {
 	int err;
 	unsigned long val;
@@ -410,12 +410,11 @@ static ssize_t mem_used_max_store(struct device *dev,
 	if (err || val != 0)
 		return -EINVAL;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	if (init_done(zram)) {
 		atomic_long_set(&zram->stats.max_used_pages,
 				zs_get_total_pages(zram->mem_pool));
 	}
-	up_read(&zram->init_lock);
 
 	return len;
 }
@@ -458,12 +457,11 @@ static void mark_idle(struct zram *zram, ktime_t cutoff)
 	}
 }
 
-static ssize_t idle_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static ssize_t idle_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
 {
 	struct zram *zram = dev_to_zram(dev);
 	ktime_t cutoff_time = 0;
-	ssize_t rv = -EINVAL;
 
 	if (!sysfs_streq(buf, "all")) {
 		/*
@@ -476,24 +474,19 @@ static ssize_t idle_store(struct device *dev,
 			cutoff_time = ktime_sub(ktime_get_boottime(),
 					ns_to_ktime(age_sec * NSEC_PER_SEC));
 		else
-			goto out;
+			return -EINVAL;
 	}
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	if (!init_done(zram))
-		goto out_unlock;
+		return -EINVAL;
 
 	/*
 	 * A cutoff_time of 0 marks everything as idle, this is the
 	 * "all" behavior.
 	 */
 	mark_idle(zram, cutoff_time);
-	rv = len;
-
-out_unlock:
-	up_read(&zram->init_lock);
-out:
-	return rv;
+	return len;
 }
 
 #ifdef CONFIG_ZRAM_WRITEBACK
@@ -546,13 +539,12 @@ static ssize_t bd_stat_show(struct device *dev, struct device_attribute *attr,
 	struct zram *zram = dev_to_zram(dev);
 	ssize_t ret;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	ret = sysfs_emit(buf,
 			 "%8llu %8llu %8llu\n",
 			 FOUR_K((u64)atomic64_read(&zram->stats.bd_count)),
 			 FOUR_K((u64)atomic64_read(&zram->stats.bd_reads)),
 			 FOUR_K((u64)atomic64_read(&zram->stats.bd_writes)));
-	up_read(&zram->init_lock);
 
 	return ret;
 }
@@ -567,14 +559,12 @@ static ssize_t writeback_compressed_store(struct device *dev,
 	if (kstrtobool(buf, &val))
 		return -EINVAL;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	if (init_done(zram)) {
-		up_write(&zram->init_lock);
 		return -EBUSY;
 	}
 
 	zram->wb_compressed = val;
-	up_write(&zram->init_lock);
 
 	return len;
 }
@@ -586,9 +576,8 @@ static ssize_t writeback_compressed_show(struct device *dev,
 	bool val;
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	val = zram->wb_compressed;
-	up_read(&zram->init_lock);
 
 	return sysfs_emit(buf, "%d\n", val);
 }
@@ -599,17 +588,14 @@ static ssize_t writeback_limit_enable_store(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 	u64 val;
-	ssize_t ret = -EINVAL;
 
 	if (kstrtoull(buf, 10, &val))
-		return ret;
+		return -EINVAL;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	zram->wb_limit_enable = val;
-	up_write(&zram->init_lock);
-	ret = len;
 
-	return ret;
+	return len;
 }
 
 static ssize_t writeback_limit_enable_show(struct device *dev,
@@ -619,9 +605,8 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
 	bool val;
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	val = zram->wb_limit_enable;
-	up_read(&zram->init_lock);
 
 	return sysfs_emit(buf, "%d\n", val);
 }
@@ -632,10 +617,9 @@ static ssize_t writeback_limit_store(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 	u64 val;
-	ssize_t ret = -EINVAL;
 
 	if (kstrtoull(buf, 10, &val))
-		return ret;
+		return -EINVAL;
 
 	/*
 	 * When the page size is greater than 4KB, if bd_wb_limit is set to
@@ -647,12 +631,10 @@ static ssize_t writeback_limit_store(struct device *dev,
 	 */
 	val = rounddown(val, PAGE_SIZE / 4096);
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	zram->bd_wb_limit = val;
-	up_write(&zram->init_lock);
-	ret = len;
 
-	return ret;
+	return len;
 }
 
 static ssize_t writeback_limit_show(struct device *dev,
@@ -661,9 +643,8 @@ static ssize_t writeback_limit_show(struct device *dev,
 	u64 val;
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	val = zram->bd_wb_limit;
-	up_read(&zram->init_lock);
 
 	return sysfs_emit(buf, "%llu\n", val);
 }
@@ -681,9 +662,8 @@ static ssize_t writeback_batch_size_store(struct device *dev,
 	if (!val)
 		return -EINVAL;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	zram->wb_batch_size = val;
-	up_write(&zram->init_lock);
 
 	return len;
 }
@@ -695,9 +675,8 @@ static ssize_t writeback_batch_size_show(struct device *dev,
 	u32 val;
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	val = zram->wb_batch_size;
-	up_read(&zram->init_lock);
 
 	return sysfs_emit(buf, "%u\n", val);
 }
@@ -717,37 +696,33 @@ static void reset_bdev(struct zram *zram)
 }
 
 static ssize_t backing_dev_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+				struct device_attribute *attr, char *buf)
 {
 	struct file *file;
 	struct zram *zram = dev_to_zram(dev);
 	char *p;
 	ssize_t ret;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	file = zram->backing_dev;
 	if (!file) {
 		memcpy(buf, "none\n", 5);
-		up_read(&zram->init_lock);
 		return 5;
 	}
 
 	p = file_path(file, buf, PAGE_SIZE - 1);
-	if (IS_ERR(p)) {
-		ret = PTR_ERR(p);
-		goto out;
-	}
+	if (IS_ERR(p))
+		return PTR_ERR(p);
 
 	ret = strlen(p);
 	memmove(buf, p, ret);
 	buf[ret++] = '\n';
-out:
-	up_read(&zram->init_lock);
 	return ret;
 }
 
 static ssize_t backing_dev_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+				 struct device_attribute *attr, const char *buf,
+				 size_t len)
 {
 	char *file_name;
 	size_t sz;
@@ -762,7 +737,7 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (!file_name)
 		return -ENOMEM;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	if (init_done(zram)) {
 		pr_info("Can't setup backing device for initialized device\n");
 		err = -EBUSY;
@@ -810,7 +785,6 @@ static ssize_t backing_dev_store(struct device *dev,
 	zram->backing_dev = backing_dev;
 	zram->bitmap = bitmap;
 	zram->nr_pages = nr_pages;
-	up_write(&zram->init_lock);
 
 	pr_info("setup backing device %s\n", file_name);
 	kfree(file_name);
@@ -822,8 +796,6 @@ static ssize_t backing_dev_store(struct device *dev,
 	if (backing_dev)
 		filp_close(backing_dev, NULL);
 
-	up_write(&zram->init_lock);
-
 	kfree(file_name);
 
 	return err;
@@ -1291,33 +1263,29 @@ static ssize_t writeback_store(struct device *dev,
 	ssize_t ret = len;
 	int err, mode = 0;
 
-	down_read(&zram->init_lock);
-	if (!init_done(zram)) {
-		up_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
+	if (!init_done(zram))
 		return -EINVAL;
-	}
 
 	/* Do not permit concurrent post-processing actions. */
-	if (atomic_xchg(&zram->pp_in_progress, 1)) {
-		up_read(&zram->init_lock);
+	if (atomic_xchg(&zram->pp_in_progress, 1))
 		return -EAGAIN;
-	}
 
 	if (!zram->backing_dev) {
 		ret = -ENODEV;
-		goto release_init_lock;
+		goto out;
 	}
 
 	pp_ctl = init_pp_ctl();
 	if (!pp_ctl) {
 		ret = -ENOMEM;
-		goto release_init_lock;
+		goto out;
 	}
 
 	wb_ctl = init_wb_ctl(zram);
 	if (!wb_ctl) {
 		ret = -ENOMEM;
-		goto release_init_lock;
+		goto out;
 	}
 
 	args = skip_spaces(buf);
@@ -1341,7 +1309,7 @@ static ssize_t writeback_store(struct device *dev,
 			err = parse_mode(param, &mode);
 			if (err) {
 				ret = err;
-				goto release_init_lock;
+				goto out;
 			}
 
 			scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
@@ -1352,7 +1320,7 @@ static ssize_t writeback_store(struct device *dev,
 			err = parse_mode(val, &mode);
 			if (err) {
 				ret = err;
-				goto release_init_lock;
+				goto out;
 			}
 
 			scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
@@ -1363,7 +1331,7 @@ static ssize_t writeback_store(struct device *dev,
 			err = parse_page_index(val, nr_pages, &lo, &hi);
 			if (err) {
 				ret = err;
-				goto release_init_lock;
+				goto out;
 			}
 
 			scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
@@ -1374,7 +1342,7 @@ static ssize_t writeback_store(struct device *dev,
 			err = parse_page_indexes(val, nr_pages, &lo, &hi);
 			if (err) {
 				ret = err;
-				goto release_init_lock;
+				goto out;
 			}
 
 			scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
@@ -1386,11 +1354,10 @@ static ssize_t writeback_store(struct device *dev,
 	if (err)
 		ret = err;
 
-release_init_lock:
+out:
 	release_pp_ctl(zram, pp_ctl);
 	release_wb_ctl(wb_ctl);
 	atomic_set(&zram->pp_in_progress, 0);
-	up_read(&zram->init_lock);
 
 	return ret;
 }
@@ -1608,9 +1575,8 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
 	if (!kbuf)
 		return -ENOMEM;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	if (!init_done(zram)) {
-		up_read(&zram->init_lock);
 		kvfree(kbuf);
 		return -EINVAL;
 	}
@@ -1646,7 +1612,6 @@ static ssize_t read_block_state(struct file *file, char __user *buf,
 		*ppos += 1;
 	}
 
-	up_read(&zram->init_lock);
 	if (copy_to_user(buf, kbuf, written))
 		written = -EFAULT;
 	kvfree(kbuf);
@@ -1713,16 +1678,14 @@ static int __comp_algorithm_store(struct zram *zram, u32 prio, const char *buf)
 		return -EINVAL;
 	}
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	if (init_done(zram)) {
-		up_write(&zram->init_lock);
 		kfree(compressor);
 		pr_info("Can't change algorithm for initialized device\n");
 		return -EBUSY;
 	}
 
 	comp_algorithm_set(zram, prio, compressor);
-	up_write(&zram->init_lock);
 	return 0;
 }
 
@@ -1843,9 +1806,8 @@ static ssize_t comp_algorithm_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 	ssize_t sz;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	sz = zcomp_available_show(zram->comp_algs[ZRAM_PRIMARY_COMP], buf, 0);
-	up_read(&zram->init_lock);
 	return sz;
 }
 
@@ -1870,7 +1832,7 @@ static ssize_t recomp_algorithm_show(struct device *dev,
 	ssize_t sz = 0;
 	u32 prio;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
 		if (!zram->comp_algs[prio])
 			continue;
@@ -1878,7 +1840,6 @@ static ssize_t recomp_algorithm_show(struct device *dev,
 		sz += sysfs_emit_at(buf, sz, "#%d: ", prio);
 		sz += zcomp_available_show(zram->comp_algs[prio], buf, sz);
 	}
-	up_read(&zram->init_lock);
 	return sz;
 }
 
@@ -1924,42 +1885,38 @@ static ssize_t recomp_algorithm_store(struct device *dev,
 }
 #endif
 
-static ssize_t compact_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static ssize_t compact_store(struct device *dev, struct device_attribute *attr,
+			     const char *buf, size_t len)
 {
 	struct zram *zram = dev_to_zram(dev);
 
-	down_read(&zram->init_lock);
-	if (!init_done(zram)) {
-		up_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
+	if (!init_done(zram))
 		return -EINVAL;
-	}
 
 	zs_compact(zram->mem_pool);
-	up_read(&zram->init_lock);
 
 	return len;
 }
 
-static ssize_t io_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t io_stat_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
 {
 	struct zram *zram = dev_to_zram(dev);
 	ssize_t ret;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	ret = sysfs_emit(buf,
 			"%8llu %8llu 0 %8llu\n",
 			(u64)atomic64_read(&zram->stats.failed_reads),
 			(u64)atomic64_read(&zram->stats.failed_writes),
 			(u64)atomic64_read(&zram->stats.notify_free));
-	up_read(&zram->init_lock);
 
 	return ret;
 }
 
-static ssize_t mm_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t mm_stat_show(struct device *dev, struct device_attribute *attr,
+			    char *buf)
 {
 	struct zram *zram = dev_to_zram(dev);
 	struct zs_pool_stats pool_stats;
@@ -1969,7 +1926,7 @@ static ssize_t mm_stat_show(struct device *dev,
 
 	memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats));
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	if (init_done(zram)) {
 		mem_used = zs_get_total_pages(zram->mem_pool);
 		zs_pool_stats(zram->mem_pool, &pool_stats);
@@ -1989,7 +1946,6 @@ static ssize_t mm_stat_show(struct device *dev,
 			atomic_long_read(&pool_stats.pages_compacted),
 			(u64)atomic64_read(&zram->stats.huge_pages),
 			(u64)atomic64_read(&zram->stats.huge_pages_since));
-	up_read(&zram->init_lock);
 
 	return ret;
 }
@@ -2001,12 +1957,11 @@ static ssize_t debug_stat_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 	ssize_t ret;
 
-	down_read(&zram->init_lock);
+	guard(rwsem_read)(&zram->init_lock);
 	ret = sysfs_emit(buf,
 			"version: %d\n0 %8llu\n",
 			version,
 			(u64)atomic64_read(&zram->stats.miss_free));
-	up_read(&zram->init_lock);
 
 	return ret;
 }
@@ -2669,17 +2624,13 @@ static ssize_t recompress_store(struct device *dev,
 	if (threshold >= huge_class_size)
 		return -EINVAL;
 
-	down_read(&zram->init_lock);
-	if (!init_done(zram)) {
-		ret = -EINVAL;
-		goto release_init_lock;
-	}
+	guard(rwsem_read)(&zram->init_lock);
+	if (!init_done(zram))
+		return -EINVAL;
 
 	/* Do not permit concurrent post-processing actions. */
-	if (atomic_xchg(&zram->pp_in_progress, 1)) {
-		up_read(&zram->init_lock);
+	if (atomic_xchg(&zram->pp_in_progress, 1))
 		return -EAGAIN;
-	}
 
 	if (algo) {
 		bool found = false;
@@ -2697,26 +2648,26 @@ static ssize_t recompress_store(struct device *dev,
 
 		if (!found) {
 			ret = -EINVAL;
-			goto release_init_lock;
+			goto out;
 		}
 	}
 
 	prio_max = min(prio_max, (u32)zram->num_active_comps);
 	if (prio >= prio_max) {
 		ret = -EINVAL;
-		goto release_init_lock;
+		goto out;
 	}
 
 	page = alloc_page(GFP_KERNEL);
 	if (!page) {
 		ret = -ENOMEM;
-		goto release_init_lock;
+		goto out;
 	}
 
 	ctl = init_pp_ctl();
 	if (!ctl) {
 		ret = -ENOMEM;
-		goto release_init_lock;
+		goto out;
 	}
 
 	scan_slots_for_recompress(zram, mode, prio_max, ctl);
@@ -2747,12 +2698,11 @@ static ssize_t recompress_store(struct device *dev,
 		cond_resched();
 	}
 
-release_init_lock:
+out:
 	if (page)
 		__free_page(page);
 	release_pp_ctl(zram, ctl);
 	atomic_set(&zram->pp_in_progress, 0);
-	up_read(&zram->init_lock);
 	return ret;
 }
 #endif
@@ -2931,7 +2881,7 @@ static void zram_destroy_comps(struct zram *zram)
 
 static void zram_reset_device(struct zram *zram)
 {
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 
 	zram->limit_pages = 0;
 
@@ -2947,11 +2897,10 @@ static void zram_reset_device(struct zram *zram)
 	reset_bdev(zram);
 
 	comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor);
-	up_write(&zram->init_lock);
 }
 
-static ssize_t disksize_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static ssize_t disksize_store(struct device *dev, struct device_attribute *attr,
+			      const char *buf, size_t len)
 {
 	u64 disksize;
 	struct zcomp *comp;
@@ -2963,18 +2912,15 @@ static ssize_t disksize_store(struct device *dev,
 	if (!disksize)
 		return -EINVAL;
 
-	down_write(&zram->init_lock);
+	guard(rwsem_write)(&zram->init_lock);
 	if (init_done(zram)) {
 		pr_info("Cannot change disksize for initialized device\n");
-		err = -EBUSY;
-		goto out_unlock;
+		return -EBUSY;
 	}
 
 	disksize = PAGE_ALIGN(disksize);
-	if (!zram_meta_alloc(zram, disksize)) {
-		err = -ENOMEM;
-		goto out_unlock;
-	}
+	if (!zram_meta_alloc(zram, disksize))
+		return -ENOMEM;
 
 	for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
 		if (!zram->comp_algs[prio])
@@ -2994,15 +2940,12 @@ static ssize_t disksize_store(struct device *dev,
 	}
 	zram->disksize = disksize;
 	set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
-	up_write(&zram->init_lock);
 
 	return len;
 
 out_free_comps:
 	zram_destroy_comps(zram);
 	zram_meta_free(zram, disksize);
-out_unlock:
-	up_write(&zram->init_lock);
 	return err;
 }
 
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 7/7] zram: consolidate device-attr declarations
  2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2025-12-01  9:47 ` [PATCHv2 6/7] zram: switch to guard() for init_lock Sergey Senozhatsky
@ 2025-12-01  9:47 ` Sergey Senozhatsky
  6 siblings, 0 replies; 15+ messages in thread
From: Sergey Senozhatsky @ 2025-12-01  9:47 UTC (permalink / raw)
  To: Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Sergey Senozhatsky

Do not spread device attribute declarations across
the file; move io_stat, mm_stat, and debug_stat to a common
device-attr section.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 4b8a26c60539..67a9e7c005c3 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1966,10 +1966,6 @@ static ssize_t debug_stat_show(struct device *dev,
 	return ret;
 }
 
-static DEVICE_ATTR_RO(io_stat);
-static DEVICE_ATTR_RO(mm_stat);
-static DEVICE_ATTR_RO(debug_stat);
-
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
 	size_t num_pages = disksize >> PAGE_SHIFT;
@@ -3008,6 +3004,9 @@ static const struct block_device_operations zram_devops = {
 	.owner = THIS_MODULE
 };
 
+static DEVICE_ATTR_RO(io_stat);
+static DEVICE_ATTR_RO(mm_stat);
+static DEVICE_ATTR_RO(debug_stat);
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
-- 
2.52.0.487.g5c8c507ade-goog



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2025-12-01  9:47 ` [PATCHv2 1/7] " Sergey Senozhatsky
@ 2026-01-07  3:50   ` zhangdongdong
  2026-01-07  4:28     ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: zhangdongdong @ 2026-01-07  3:50 UTC (permalink / raw)
  To: Sergey Senozhatsky, Andrew Morton, Richard Chang, Minchan Kim
  Cc: Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Minchan Kim

On 12/1/25 17:47, Sergey Senozhatsky wrote:
> From: Richard Chang <richardycc@google.com>
> 
> zram stores all written back slots raw, which implies that
> during writeback zram first has to decompress the slots (except
> for ZRAM_HUGE slots, which are raw already).  The problem with
> this approach is that not every written back page gets read
> back (either via read() or via a page fault), which means that
> zram basically wastes CPU cycles and battery decompressing such
> slots.  This changes with the introduction of decompression on
> demand, in other words decompression on read()/page-fault.
> 
> One caveat of decompression on demand is that the async read
> completes in IRQ context, while zram decompression is sleepable.
> To work around this, read-back decompression is offloaded to a
> preemptible context - the system high-prio workqueue.
> 
> At this point compressed writeback is still disabled; a follow-up
> patch will introduce a new device attribute which will make it
> possible to toggle compressed writeback per-device.
> 
> [senozhatsky: rewrote original implementation]
> Signed-off-by: Richard Chang <richardycc@google.com>
> Co-developed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Suggested-by: Minchan Kim <minchan@google.com>
> Suggested-by: Brian Geffon <bgeffon@google.com>
> ---
>   drivers/block/zram/zram_drv.c | 279 +++++++++++++++++++++++++++-------
>   drivers/block/zram/zram_drv.h |   1 +
>   2 files changed, 227 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 5759823d6314..6263d300312e 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -57,9 +57,6 @@ static size_t huge_class_size;
>   static const struct block_device_operations zram_devops;
>   
>   static void zram_free_page(struct zram *zram, size_t index);
> -static int zram_read_from_zspool(struct zram *zram, struct page *page,
> -				 u32 index);
> -
>   #define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
>   
>   static void zram_slot_lock_init(struct zram *zram, u32 index)
> @@ -502,6 +499,10 @@ static ssize_t idle_store(struct device *dev,
>   #ifdef CONFIG_ZRAM_WRITEBACK
>   #define INVALID_BDEV_BLOCK		(~0UL)
>   
> +static int read_from_zspool_raw(struct zram *zram, struct page *page,
> +				u32 index);
> +static int read_from_zspool(struct zram *zram, struct page *page, u32 index);
> +
>   struct zram_wb_ctl {
>   	/* idle list is accessed only by the writeback task, no concurency */
>   	struct list_head idle_reqs;
> @@ -522,6 +523,22 @@ struct zram_wb_req {
>   	struct list_head entry;
>   };
>   
> +struct zram_rb_req {
> +	struct work_struct work;
> +	struct zram *zram;
> +	struct page *page;
> +	/* The read bio for backing device */
> +	struct bio *bio;
> +	unsigned long blk_idx;
> +	union {
> +		/* The original bio to complete (async read) */
> +		struct bio *parent;
> +		/* error status (sync read) */
> +		int error;
> +	};
> +	u32 index;
> +};
> +
>   static ssize_t writeback_limit_enable_store(struct device *dev,
>   					    struct device_attribute *attr,
>   					    const char *buf, size_t len)
> @@ -780,18 +797,6 @@ static void zram_release_bdev_block(struct zram *zram, unsigned long blk_idx)
>   	atomic64_dec(&zram->stats.bd_count);
>   }
>   
> -static void read_from_bdev_async(struct zram *zram, struct page *page,
> -			unsigned long entry, struct bio *parent)
> -{
> -	struct bio *bio;
> -
> -	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
> -	bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
> -	__bio_add_page(bio, page, PAGE_SIZE, 0);
> -	bio_chain(bio, parent);
> -	submit_bio(bio);
> -}
> -
>   static void release_wb_req(struct zram_wb_req *req)
>   {
>   	__free_page(req->page);
> @@ -886,8 +891,9 @@ static void zram_account_writeback_submit(struct zram *zram)
>   
>   static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
>   {
> -	u32 index = req->pps->index;
> -	int err;
> +	u32 size, index = req->pps->index;
> +	int err, prio;
> +	bool huge;
>   
>   	err = blk_status_to_errno(req->bio.bi_status);
>   	if (err) {
> @@ -914,9 +920,27 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
>   		goto out;
>   	}
>   
> +	if (zram->wb_compressed) {
> +		/*
> +		 * ZRAM_WB slots get freed, we need to preserve data required
> +		 * for read decompression.
> +		 */
> +		size = zram_get_obj_size(zram, index);
> +		prio = zram_get_priority(zram, index);
> +		huge = zram_test_flag(zram, index, ZRAM_HUGE);
> +	}
> +
>   	zram_free_page(zram, index);
>   	zram_set_flag(zram, index, ZRAM_WB);
>   	zram_set_handle(zram, index, req->blk_idx);
> +
> +	if (zram->wb_compressed) {
> +		if (huge)
> +			zram_set_flag(zram, index, ZRAM_HUGE);
> +		zram_set_obj_size(zram, index, size);
> +		zram_set_priority(zram, index, prio);
> +	}
> +
>   	atomic64_inc(&zram->stats.pages_stored);
>   
>   out:
> @@ -1050,7 +1074,11 @@ static int zram_writeback_slots(struct zram *zram,
>   		 */
>   		if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
>   			goto next;
> -		if (zram_read_from_zspool(zram, req->page, index))
> +		if (zram->wb_compressed)
> +			err = read_from_zspool_raw(zram, req->page, index);
> +		else
> +			err = read_from_zspool(zram, req->page, index);
> +		if (err)
>   			goto next;
>   		zram_slot_unlock(zram, index);
>   
> @@ -1313,24 +1341,140 @@ static ssize_t writeback_store(struct device *dev,
>   	return ret;
>   }
>   
> -struct zram_work {
> -	struct work_struct work;
> -	struct zram *zram;
> -	unsigned long entry;
> -	struct page *page;
> -	int error;
> -};
> +static int decompress_bdev_page(struct zram *zram, struct page *page, u32 index)
> +{
> +	struct zcomp_strm *zstrm;
> +	unsigned int size;
> +	int ret, prio;
> +	void *src;
> +
> +	zram_slot_lock(zram, index);
> +	/* Since slot was unlocked we need to make sure it's still ZRAM_WB */
> +	if (!zram_test_flag(zram, index, ZRAM_WB)) {
> +		zram_slot_unlock(zram, index);
> +		/* We read some stale data, zero it out */
> +		memset_page(page, 0, 0, PAGE_SIZE);
> +		return -EIO;
> +	}
> +
> +	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
> +		zram_slot_unlock(zram, index);
> +		return 0;
> +	}
> +
> +	size = zram_get_obj_size(zram, index);
> +	prio = zram_get_priority(zram, index);
>   
> -static void zram_sync_read(struct work_struct *work)
> +	zstrm = zcomp_stream_get(zram->comps[prio]);
> +	src = kmap_local_page(page);
> +	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size,
> +			       zstrm->local_copy);
> +	if (!ret)
> +		copy_page(src, zstrm->local_copy);
> +	kunmap_local(src);
> +	zcomp_stream_put(zstrm);
> +	zram_slot_unlock(zram, index);
> +
> +	return ret;
> +}
> +
> +static void zram_deferred_decompress(struct work_struct *w)
>   {
> -	struct zram_work *zw = container_of(work, struct zram_work, work);
> +	struct zram_rb_req *req = container_of(w, struct zram_rb_req, work);
> +	struct page *page = bio_first_page_all(req->bio);
> +	struct zram *zram = req->zram;
> +	u32 index = req->index;
> +	int ret;
> +
> +	ret = decompress_bdev_page(zram, page, index);
> +	if (ret)
> +		req->parent->bi_status = BLK_STS_IOERR;
> +
> +	/* Decrement parent's ->remaining */
> +	bio_endio(req->parent);
> +	bio_put(req->bio);
> +	kfree(req);
> +}
> +
> +static void zram_async_read_endio(struct bio *bio)
> +{
> +	struct zram_rb_req *req = bio->bi_private;
> +	struct zram *zram = req->zram;
> +
> +	if (bio->bi_status) {
> +		req->parent->bi_status = bio->bi_status;
> +		bio_endio(req->parent);
> +		bio_put(bio);
> +		kfree(req);
> +		return;
> +	}
> +
> +	/*
> +	 * NOTE: zram_async_read_endio() is not exactly the right place for this.
> +	 * Ideally, we need to do it after ZRAM_WB check, but this requires
> +	 * us to use wq path even on systems that don't enable compressed
> +	 * writeback, because we cannot take slot-lock in the current context.
> +	 *
> +	 * Keep the existing behavior for now.
> +	 */
> +	if (zram->wb_compressed == false) {
> +		/* No decompression needed, complete the parent IO */
> +		bio_endio(req->parent);
> +		bio_put(bio);
> +		kfree(req);
> +		return;
> +	}
> +
> +	/*
> +	 * zram decompression is sleepable, so we need to defer it to
> +	 * a preemptible context.
> +	 */
> +	INIT_WORK(&req->work, zram_deferred_decompress);
> +	queue_work(system_highpri_wq, &req->work);
> +}
> +
> +static void read_from_bdev_async(struct zram *zram, struct page *page,
> +				 u32 index, unsigned long blk_idx,
> +				 struct bio *parent)
> +{
> +	struct zram_rb_req *req;
> +	struct bio *bio;
> +
> +	req = kmalloc(sizeof(*req), GFP_NOIO);
> +	if (!req)
> +		return;
> +
> +	bio = bio_alloc(zram->bdev, 1, parent->bi_opf, GFP_NOIO);
> +	if (!bio) {
> +		kfree(req);
> +		return;
> +	}
> +
> +	req->zram = zram;
> +	req->index = index;
> +	req->blk_idx = blk_idx;
> +	req->bio = bio;
> +	req->parent = parent;
> +
> +	bio->bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
> +	bio->bi_private = req;
> +	bio->bi_end_io = zram_async_read_endio;
> +
> +	__bio_add_page(bio, page, PAGE_SIZE, 0);
> +	bio_inc_remaining(parent);
> +	submit_bio(bio);
> +}
> +
> +static void zram_sync_read(struct work_struct *w)
> +{
> +	struct zram_rb_req *req = container_of(w, struct zram_rb_req, work);
>   	struct bio_vec bv;
>   	struct bio bio;
>   
> -	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
> -	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
> -	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
> -	zw->error = submit_bio_wait(&bio);
> +	bio_init(&bio, req->zram->bdev, &bv, 1, REQ_OP_READ);
> +	bio.bi_iter.bi_sector = req->blk_idx * (PAGE_SIZE >> 9);
> +	__bio_add_page(&bio, req->page, PAGE_SIZE, 0);
> +	req->error = submit_bio_wait(&bio);
>   }
>   
>   /*
> @@ -1338,39 +1482,42 @@ static void zram_sync_read(struct work_struct *work)
>    * chained IO with parent IO in same context, it's a deadlock. To avoid that,
>    * use a worker thread context.
>    */
> -static int read_from_bdev_sync(struct zram *zram, struct page *page,
> -				unsigned long entry)
> +static int read_from_bdev_sync(struct zram *zram, struct page *page, u32 index,
> +			       unsigned long blk_idx)
>   {
> -	struct zram_work work;
> +	struct zram_rb_req req;
>   
> -	work.page = page;
> -	work.zram = zram;
> -	work.entry = entry;
> +	req.page = page;
> +	req.zram = zram;
> +	req.blk_idx = blk_idx;
>   
> -	INIT_WORK_ONSTACK(&work.work, zram_sync_read);
> -	queue_work(system_dfl_wq, &work.work);
> -	flush_work(&work.work);
> -	destroy_work_on_stack(&work.work);
> +	INIT_WORK_ONSTACK(&req.work, zram_sync_read);
> +	queue_work(system_dfl_wq, &req.work);
> +	flush_work(&req.work);
> +	destroy_work_on_stack(&req.work);

Hi Sergey,

Thanks for the work on decompression-on-demand.

One concern I’d like to raise is the use of a workqueue for readback
decompression. In our measurements, deferring decompression to a worker
introduces non-trivial scheduling overhead, and under memory pressure
the added latency can be noticeable (tens of milliseconds in some cases).

This makes the swap-in read path more sensitive to scheduler behavior.
It may be worth considering whether the decompression can be placed in a
context that avoids this extra scheduling hop, for example by moving the
decompression closer to the swap layer.

Thanks,
dongdong

>   
> -	return work.error;
> +	if (req.error || zram->wb_compressed == false)
> +		return req.error;
> +
> +	return decompress_bdev_page(zram, page, index);
>   }
>   
> -static int read_from_bdev(struct zram *zram, struct page *page,
> -			unsigned long entry, struct bio *parent)
> +static int read_from_bdev(struct zram *zram, struct page *page, u32 index,
> +			  unsigned long blk_idx, struct bio *parent)
>   {
>   	atomic64_inc(&zram->stats.bd_reads);
>   	if (!parent) {
>   		if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
>   			return -EIO;
> -		return read_from_bdev_sync(zram, page, entry);
> +		return read_from_bdev_sync(zram, page, index, blk_idx);
>   	}
> -	read_from_bdev_async(zram, page, entry, parent);
> +	read_from_bdev_async(zram, page, index, blk_idx, parent);
>   	return 0;
>   }
>   #else
>   static inline void reset_bdev(struct zram *zram) {};
> -static int read_from_bdev(struct zram *zram, struct page *page,
> -			unsigned long entry, struct bio *parent)
> +static int read_from_bdev(struct zram *zram, struct page *page, u32 index,
> +			  unsigned long blk_idx, struct bio *parent)
>   {
>   	return -EIO;
>   }
> @@ -1977,12 +2124,37 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
>   	return ret;
>   }
>   
> +#if defined CONFIG_ZRAM_WRITEBACK
> +static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
> +{
> +	struct zcomp_strm *zstrm;
> +	unsigned long handle;
> +	unsigned int size;
> +	void *src;
> +
> +	handle = zram_get_handle(zram, index);
> +	size = zram_get_obj_size(zram, index);
> +
> +	/*
> +	 * We need to get stream just for ->local_copy buffer, in
> +	 * case if object spans two physical pages. No decompression
> +	 * takes place here, as we read raw compressed data.
> +	 */
> +	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> +	memcpy_to_page(page, 0, src, size);
> +	zs_obj_read_end(zram->mem_pool, handle, src);
> +	zcomp_stream_put(zstrm);
> +
> +	return 0;
> +}
> +#endif
> +
>   /*
>    * Reads (decompresses if needed) a page from zspool (zsmalloc).
>    * Corresponding ZRAM slot should be locked.
>    */
> -static int zram_read_from_zspool(struct zram *zram, struct page *page,
> -				 u32 index)
> +static int read_from_zspool(struct zram *zram, struct page *page, u32 index)
>   {
>   	if (zram_test_flag(zram, index, ZRAM_SAME) ||
>   	    !zram_get_handle(zram, index))
> @@ -2002,7 +2174,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
>   	zram_slot_lock(zram, index);
>   	if (!zram_test_flag(zram, index, ZRAM_WB)) {
>   		/* Slot should be locked through out the function call */
> -		ret = zram_read_from_zspool(zram, page, index);
> +		ret = read_from_zspool(zram, page, index);
>   		zram_slot_unlock(zram, index);
>   	} else {
>   		unsigned long blk_idx = zram_get_handle(zram, index);
> @@ -2012,7 +2184,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
>   		 * device.
>   		 */
>   		zram_slot_unlock(zram, index);
> -		ret = read_from_bdev(zram, page, blk_idx, parent);
> +		ret = read_from_bdev(zram, page, index, blk_idx, parent);
>   	}
>   
>   	/* Should NEVER happen. Return bio error if it does. */
> @@ -2273,7 +2445,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
>   	if (comp_len_old < threshold)
>   		return 0;
>   
> -	ret = zram_read_from_zspool(zram, page, index);
> +	ret = read_from_zspool(zram, page, index);
>   	if (ret)
>   		return ret;
>   
> @@ -2960,6 +3132,7 @@ static int zram_add(void)
>   	init_rwsem(&zram->init_lock);
>   #ifdef CONFIG_ZRAM_WRITEBACK
>   	zram->wb_batch_size = 32;
> +	zram->wb_compressed = false;
>   #endif
>   
>   	/* gendisk structure */
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index c6d94501376c..72fdf66c78ab 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -128,6 +128,7 @@ struct zram {
>   #ifdef CONFIG_ZRAM_WRITEBACK
>   	struct file *backing_dev;
>   	bool wb_limit_enable;
> +	bool wb_compressed;
>   	u32 wb_batch_size;
>   	u64 bd_wb_limit;
>   	struct block_device *bdev;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-07  3:50   ` zhangdongdong
@ 2026-01-07  4:28     ` Sergey Senozhatsky
  2026-01-07  7:28       ` zhangdongdong
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2026-01-07  4:28 UTC (permalink / raw)
  To: zhangdongdong
  Cc: Sergey Senozhatsky, Andrew Morton, Richard Chang, Minchan Kim,
	Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Minchan Kim

On (26/01/07 11:50), zhangdongdong wrote:
> Hi Sergey,
> 
> Thanks for the work on decompression-on-demand.
> 
> One concern I’d like to raise is the use of a workqueue for readback
> decompression. In our measurements, deferring decompression to a worker
> introduces non-trivial scheduling overhead, and under memory pressure
> the added latency can be noticeable (tens of milliseconds in some cases).

The problem is those bio completions happen in atomic context, and zram
requires both compression and decompression to be non-atomic.  And we
can't do sync read on the zram side, because those bio-s are chained.
So the current plan is to see how the system hi-prio per-cpu workqueue
will handle this.
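
Condensed, the deferral in this series has roughly the following shape
(simplified from the patch hunks above, allocation and error handling
omitted; all of the names below are from the patch):

static void zram_deferred_decompress(struct work_struct *w)
{
	struct zram_rb_req *req = container_of(w, struct zram_rb_req, work);

	/* process context: slot-lock and decompression may sleep here */
	decompress_bdev_page(req->zram, bio_first_page_all(req->bio),
			     req->index);
	bio_endio(req->parent);		/* drop parent's ->remaining */
	bio_put(req->bio);
	kfree(req);
}

static void zram_async_read_endio(struct bio *bio)
{
	struct zram_rb_req *req = bio->bi_private;

	/* may run in IRQ context, so punt the sleepable part to a wq */
	INIT_WORK(&req->work, zram_deferred_decompress);
	queue_work(system_highpri_wq, &req->work);
}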

Did you try high priority workqueue?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-07  4:28     ` Sergey Senozhatsky
@ 2026-01-07  7:28       ` zhangdongdong
  2026-01-07 10:14         ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: zhangdongdong @ 2026-01-07  7:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Richard Chang, Minchan Kim, Brian Geffon,
	David Stevens, linux-kernel, linux-mm, linux-block, Minchan Kim

On 1/7/26 12:28, Sergey Senozhatsky wrote:
> On (26/01/07 11:50), zhangdongdong wrote:
>> Hi Sergey,
>>
>> Thanks for the work on decompression-on-demand.
>>
>> One concern I’d like to raise is the use of a workqueue for readback
>> decompression. In our measurements, deferring decompression to a worker
>> introduces non-trivial scheduling overhead, and under memory pressure
>> the added latency can be noticeable (tens of milliseconds in some cases).
> 
> The problem is those bio completions happen in atomic context, and zram
> requires both compression and decompression to be non-atomic.  And we
> can't do sync read on the zram side, because those bio-s are chained.
> So the current plan is to see how the system hi-prio per-cpu workqueue
> will handle this.
> 
> Did you try high priority workqueue?
> 
Hi, Sergey

Yes, we have tried high priority workqueues. In fact, our current
implementation already uses a dedicated workqueue created with
WQ_HIGHPRI and marked as UNBOUND, which handles the read/decompression
path for swap-in.

Below is a simplified snippet of the queue we are currently using:

zgroup_read_wq = alloc_workqueue("zgroup_read",
				 WQ_HIGHPRI | WQ_UNBOUND, 0);

static int zgroup_submit_zio_async(struct zgroup_io *zio,
				   struct zram_group *zgroup)
{
	struct zgroup_req req = {
		.zio = zio,
	};

	if (!zgroup_io_step_chg(zio, ZIO_STARTED, ZIO_INFLIGHT)) {
		wait_for_completion(&zio->wait);
		if (zio->status)
			zgroup_put_io(zio);
		return zio->status;
	}

	INIT_WORK_ONSTACK(&req.work, zgroup_submit_zio_work);
	queue_work(zgroup_read_wq, &req.work);
	flush_work(&req.work);
	destroy_work_on_stack(&req.work);

	return req.status ?: zgroup_decrypt_pages(zio);
}

Thanks,
dongdong


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-07  7:28       ` zhangdongdong
@ 2026-01-07 10:14         ` Sergey Senozhatsky
  2026-01-08  2:57           ` zhangdongdong
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2026-01-07 10:14 UTC (permalink / raw)
  To: zhangdongdong
  Cc: Sergey Senozhatsky, Andrew Morton, Richard Chang, Minchan Kim,
	Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Minchan Kim

On (26/01/07 15:28), zhangdongdong wrote:
> Hi, Sergey
> 
> Yes, we have tried high priority workqueues. In fact, our current
> implementation already uses a dedicated workqueue created with
> WQ_HIGHPRI and marked as UNBOUND, which handles the read/decompression
> path for swap-in.
> 
> Below is a simplified snippet of the queue we are currently using:
> 
> zgroup_read_wq = alloc_workqueue("zgroup_read",
> 				 WQ_HIGHPRI | WQ_UNBOUND, 0);
> 
> static int zgroup_submit_zio_async(struct zgroup_io *zio,
> 				   struct zram_group *zgroup)
> {
> 	struct zgroup_req req = {
> 		.zio = zio,
> 	};
> 

zgroup... That certainly looks like a lot of downstream code ;)

Do you use any strategies for writeback?  Compressed writeback
is supposed to be used for apps for which latency is not critical
or sensitive, because of on-demand decompression costs.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-07 10:14         ` Sergey Senozhatsky
@ 2026-01-08  2:57           ` zhangdongdong
  2026-01-08  3:39             ` Sergey Senozhatsky
  0 siblings, 1 reply; 15+ messages in thread
From: zhangdongdong @ 2026-01-08  2:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Richard Chang, Minchan Kim, Brian Geffon,
	David Stevens, linux-kernel, linux-mm, linux-block, Minchan Kim

On 1/7/26 18:14, Sergey Senozhatsky wrote:
> On (26/01/07 15:28), zhangdongdong wrote:
>> Hi, Sergey
>>
>> Yes, we have tried high priority workqueues. In fact, our current
>> implementation already uses a dedicated workqueue created with
>> WQ_HIGHPRI and marked as UNBOUND, which handles the read/decompression
>> path for swap-in.
>>
>> Below is a simplified snippet of the queue we are currently using:
>>
>> zgroup_read_wq = alloc_workqueue("zgroup_read",
>> 				 WQ_HIGHPRI | WQ_UNBOUND, 0);
>>
>> static int zgroup_submit_zio_async(struct zgroup_io *zio,
>> 				   struct zram_group *zgroup)
>> {
>> 	struct zgroup_req req = {
>> 		.zio = zio,
>> 	};
>>
> 
> zgroup... That certainly looks like a lot of downstream code ;)
> 
> Do you use any strategies for writeback?  Compressed writeback
> is supposed to be used for apps for which latency is not critical
> or sensitive, because of on-demand decompression costs.
> 

Hi Sergey,

Sorry for the delayed reply — I had some urgent matters come up and only
got back to this now ;)

Yes, we do use writeback strategies on our side. The current 
implementation focuses on batched writeback of compressed data from
zram, managed on a per-app / per-memcg basis. We track and control how
much data from each app is written back to the backing storage, with the
same assumption you mentioned: compressed writeback is primarily
intended for workloads where latency is not critical.

Accurate prefetching on swap-in is still an open problem for us. As you
pointed out, both the I/O itself and on-demand decompression introduce
additional latency on the readback path, and minimizing their impact
remains challenging.

Regarding the workqueue choice: initially we used system_dfl_wq for the
read/decompression path. Later, based on observed scheduling latency
under memory pressure, we switched to a dedicated workqueue created with
WQ_HIGHPRI | WQ_UNBOUND. This change helped reduce scheduling
interference, but it also reinforced our concern that deferring
decompression to a worker still adds an extra scheduling hop on the
swap-in path.
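
Illustratively, the switch amounts to something like the following
(zram_rb_wq and the function names here are made up for the example;
our actual queue is the zgroup_read one shown earlier):

static struct workqueue_struct *zram_rb_wq;

static int zram_rb_wq_init(void)
{
	/* dedicated queue instead of the shared system_dfl_wq */
	zram_rb_wq = alloc_workqueue("zram_rb", WQ_HIGHPRI | WQ_UNBOUND, 0);
	return zram_rb_wq ? 0 : -ENOMEM;
}

static void zram_rb_queue_decompress(struct work_struct *work)
{
	/* previously this was queue_work(system_dfl_wq, work) */
	queue_work(zram_rb_wq, work);
}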

Best regards,
dongdong



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-08  2:57           ` zhangdongdong
@ 2026-01-08  3:39             ` Sergey Senozhatsky
  2026-01-08 10:36               ` zhangdongdong
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Senozhatsky @ 2026-01-08  3:39 UTC (permalink / raw)
  To: zhangdongdong, Jens Axboe
  Cc: Sergey Senozhatsky, Andrew Morton, Richard Chang, Minchan Kim,
	Brian Geffon, David Stevens, linux-kernel, linux-mm, linux-block,
	Minchan Kim

Hi,

On (26/01/08 10:57), zhangdongdong wrote:
> > Do you use any strategies for writeback?  Compressed writeback
> > is supposed to be used for apps for which latency is not critical
> > or sensitive, because of on-demand decompression costs.
> > 
> 
> Hi Sergey,
> 
> Sorry for the delayed reply — I had some urgent matters come up and only
> got back to this now ;)

No worries, you replied within a perfectly reasonable time frame.

> Yes, we do use writeback strategies on our side. The current implementation
> focuses on batched writeback of compressed data from
> zram, managed on a per-app / per-memcg basis. We track and control how
> much data from each app is written back to the backing storage, with the
> same assumption you mentioned: compressed writeback is primarily
> intended for workloads where latency is not critical.
> 
> Accurate prefetching on swap-in is still an open problem for us. As you
> pointed out, both the I/O itself and on-demand decompression introduce
> additional latency on the readback path, and minimizing their impact
> remains challenging.
> 
> Regarding the workqueue choice: initially we used system_dfl_wq for the
> read/decompression path. Later, based on observed scheduling latency
> under memory pressure, we switched to a dedicated workqueue created with
> WQ_HIGHPRI | WQ_UNBOUND. This change helped reduce scheduling
> interference, but it also reinforced our concern that deferring
> decompression to a worker still adds an extra scheduling hop on the
> swap-in path.

How bad (and often) is your memory pressure situation?  I just wonder
if your case is an outlier, so to speak.


Just thinking aloud:

I really don't see a path back to atomic zram read/write.  Those
were very painful and problematic, and I am not considering the possibility
of re-introducing them, especially if the reason is an optional
feature (which comp-wb is).  If we want to improve latency, we need
to find a way to do it without going back to atomic read/write,
assuming that latency becomes unbearable.  But at the same time under
memory pressure everything becomes janky at some point, so I don't
know if comp-wb latency is the biggest problem in that case.

Dunno, *maybe* we can explore a possibility of grabbing both entry-lock
and per-CPU compression stream before we queue async bio, so that in
the bio completion we already *sort of* have everything we need.
However, that comes with a bunch of issues:

- the number of per-CPU compression streams is limited, naturally,
  to the number of CPUs.  So if we have a bunch of comp-wb reads we
  can block all other activities: normal zram reads/writes, which
  compete for the same per-CPU compressions streams.

- this still puts atomicity requirements on the compressors.  I haven't
  looked into, for instance, zstd *de*-compression code, but I know for
  sure that zstd compression code allocates memory internally when
  configured to use pre-trained CD-dictionaries, effectively making zstd
  use GFP_ATOMIC allocations internally, if called from atomic context.
  Do we have anything like that in decompression - I don't know.  But in
  general we cannot be sure that all compressors work in atomic context
  in the same way as they do in non-atomic context.

I don't know if solving it on zram side alone is possible.  Maybe we
can get some help from the block layer: some sort of two-stage bio
submission.  First stage: submit chained bio-s, second stage: iterate
over all submitted and completed bio-s and decompress the data.  Again,
just thinking out loud.
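
To make that two-stage idea a bit more concrete, a very rough sketch
could look like the following (purely illustrative, not part of this
series; struct rb_batch_read and the function names are made up, only
decompress_bdev_page() exists in the patch):

struct rb_batch_read {
	struct completion done;
	blk_status_t status;
};

static void rb_batch_read_endio(struct bio *bio)
{
	struct rb_batch_read *br = bio->bi_private;

	br->status = bio->bi_status;
	complete(&br->done);
	bio_put(bio);
}

static int zram_readback_two_stage(struct zram *zram, struct page **pages,
				   u32 *indexes, unsigned long *blk_idxs,
				   struct rb_batch_read *br, int nr)
{
	int i, err = 0;

	/* stage one: submit all reads, no decompression yet */
	for (i = 0; i < nr; i++) {
		struct bio *bio;

		bio = bio_alloc(zram->bdev, 1, REQ_OP_READ, GFP_NOIO);
		bio->bi_iter.bi_sector = blk_idxs[i] * (PAGE_SIZE >> 9);
		__bio_add_page(bio, pages[i], PAGE_SIZE, 0);
		init_completion(&br[i].done);
		bio->bi_private = &br[i];
		bio->bi_end_io = rb_batch_read_endio;
		submit_bio(bio);
	}

	/* stage two: wait, then decompress in a sleepable context */
	for (i = 0; i < nr; i++) {
		wait_for_completion(&br[i].done);
		if (br[i].status)
			err = blk_status_to_errno(br[i].status);
		else if (decompress_bdev_page(zram, pages[i], indexes[i]))
			err = -EIO;
	}

	return err;
}

Of course this only works if whoever drives the swap-in can hand zram
the whole batch and sleep on it, which is not how the current chained
bio path is structured, so it would indeed need help outside of zram.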


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 1/7] zram: introduce compressed data writeback
  2026-01-08  3:39             ` Sergey Senozhatsky
@ 2026-01-08 10:36               ` zhangdongdong
  0 siblings, 0 replies; 15+ messages in thread
From: zhangdongdong @ 2026-01-08 10:36 UTC (permalink / raw)
  To: Sergey Senozhatsky, Jens Axboe
  Cc: Andrew Morton, Richard Chang, Minchan Kim, Brian Geffon,
	David Stevens, linux-kernel, linux-mm, linux-block, Minchan Kim,
	xiongping1, huangjianan, wanghui33


On 1/8/26 11:39, Sergey Senozhatsky wrote:
> Hi,
> 
> On (26/01/08 10:57), zhangdongdong wrote:
>>> Do you use any strategies for writeback?  Compressed writeback
>>> is supposed to be used for apps for which latency is not critical
>>> or sensitive, because of on-demand decompression costs.
>>>
>>
>> Hi Sergey,
>>
>> Sorry for the delayed reply — I had some urgent matters come up and only
>> got back to this now ;)
> 
> No worries, you replied within a perfectly reasonable time frame.
> 
>> Yes, we do use writeback strategies on our side. The current implementation
>> focuses on batched writeback of compressed data from
>> zram, managed on a per-app / per-memcg basis. We track and control how
>> much data from each app is written back to the backing storage, with the
>> same assumption you mentioned: compressed writeback is primarily
>> intended for workloads where latency is not critical.
>>
>> Accurate prefetching on swap-in is still an open problem for us. As you
>> pointed out, both the I/O itself and on-demand decompression introduce
>> additional latency on the readback path, and minimizing their impact
>> remains challenging.
>>
>> Regarding the workqueue choice: initially we used system_dfl_wq for the
>> read/decompression path. Later, based on observed scheduling latency
>> under memory pressure, we switched to a dedicated workqueue created with
>> WQ_HIGHPRI | WQ_UNBOUND. This change helped reduce scheduling
>> interference, but it also reinforced our concern that deferring
>> decompression to a worker still adds an extra scheduling hop on the
>> swap-in path.
> 
> How bad (and often) is your memory pressure situation?  I just wonder
> if your case is an outlier, so to speak.
> 
> 
> Just thinking aloud:
> 
> I really don't see a path back to atomic zram read/write.  Those
> were very painful and problematic, and I am not considering the possibility
> of re-introducing them, especially if the reason is an optional
> feature (which comp-wb is).  If we want to improve latency, we need
> to find a way to do it without going back to atomic read/write,
> assuming that latency becomes unbearable.  But at the same time under
> memory pressure everything becomes janky at some point, so I don't
> know if comp-wb latency is the biggest problem in that case.
> 
> Dunno, *maybe* we can explore a possibility of grabbing both entry-lock
> and per-CPU compression stream before we queue async bio, so that in
> the bio completion we already *sort of* have everything we need.
> However, that comes with a bunch of issues:
> 
> - the number of per-CPU compression streams is limited, naturally,
>    to the number of CPUs.  So if we have a bunch of comp-wb reads we
>    can block all other activities: normal zram reads/writes, which
>    compete for the same per-CPU compressions streams.
> 
> - this still puts atomicity requirements on the compressors.  I haven't
>    looked into, for instance, zstd *de*-compression code, but I know for
>    sure that zstd compression code allocates memory internally when
>    configured to use pre-trained CD-dictionaries, effectively making zstd
>    use GFP_ATOMIC allocations internally, if called from atomic context.
>    Do we have anything like that in decompression - I don't know.  But in
>    general we cannot be sure that all compressors work in atomic context
>    in the same way as they do in non-atomic context.
> 
> I don't know if solving it on zram side alone is possible.  Maybe we
> can get some help from the block layer: some sort of two-stage bio
> submission.  First stage: submit chained bio-s, second stage: iterate
> over all submitted and completed bio-s and decompress the data.  Again,
> just thinking out loud.
> 

Hi Sergey,

My thinking is largely aligned with yours. I agree that relying on zram
alone is unlikely to fully solve this problem, especially without going
back to atomic read/write.

Our current mitigation approach is to introduce a hook at the swap layer
and move decompression there. By doing so, decompression happens in a
fully sleepable context, which avoids the atomic-context constraints
you outlined. This helps us sidestep the core issue rather than trying
to force decompression back into zram completion paths.

For reference, this is the change we are experimenting with:
https://android-review.googlesource.com/c/kernel/common/+/3724447

I also noticed that Richard proposed a similar optimization hook recently:
https://android-review.googlesource.com/c/kernel/common/+/3730147

Regarding your question about memory pressure: our current test case
runs on an 8 GB device, with around 50 apps being launched sequentially.
This creates fairly heavy memory pressure. In earlier tests using an
async kworker-based approach, we observed an average latency of about
1.3 ms, but with tail latencies occasionally reaching 30–100 ms.

If I recall correctly, this issue first became noticeable after a block
layer change was merged; I can try to dig that up and share more details
later.

Best regards,
dongdong



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-01-08 10:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-01  9:47 [PATCHv2 0/7] zram: introduce compressed data writeback Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 1/7] " Sergey Senozhatsky
2026-01-07  3:50   ` zhangdongdong
2026-01-07  4:28     ` Sergey Senozhatsky
2026-01-07  7:28       ` zhangdongdong
2026-01-07 10:14         ` Sergey Senozhatsky
2026-01-08  2:57           ` zhangdongdong
2026-01-08  3:39             ` Sergey Senozhatsky
2026-01-08 10:36               ` zhangdongdong
2025-12-01  9:47 ` [PATCHv2 2/7] zram: introduce writeback_compressed device attribute Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 3/7] zram: document writeback_batch_size Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 4/7] zram: move bd_stat to writeback section Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 5/7] zram: rename zram_free_page() Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 6/7] zram: switch to guard() for init_lock Sergey Senozhatsky
2025-12-01  9:47 ` [PATCHv2 7/7] zram: consolidate device-attr declarations Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox