* [PATCHv2 0/4] zram: introduce writeback bio batching
@ 2025-11-13 8:53 Sergey Senozhatsky
2025-11-13 8:53 ` [PATCHv2 1/4] zram: introduce writeback bio batching support Sergey Senozhatsky
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-13 8:53 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
This is a rework of Yuwen's patch [1] that adds writeback bio
batching support to zram, which can improve the throughput of
writeback operations.
v1->v2:
- moved pp-slot ownership to the req before submission (pp_ctl no longer
  owns it once it's assigned to the req)
- do not take spin-lock in the batch_size store handler
- dropped wb_limit_lock
- moved wb_limit accounting to pre-submit and completion stages
- introduced simple wb_limit helpers
- more comments and cleanups
[1] https://lore.kernel.org/linux-block/tencent_0FBBFC8AE0B97BC63B5D47CE1FF2BABFDA09@qq.com/
Sergey Senozhatsky (3):
zram: add writeback batch size device attr
zram: take write lock in wb limit store handlers
zram: drop wb_limit_lock
Yuwen Chen (1):
zram: introduce writeback bio batching support
drivers/block/zram/zram_drv.c | 410 +++++++++++++++++++++++++++-------
drivers/block/zram/zram_drv.h | 2 +-
2 files changed, 326 insertions(+), 86 deletions(-)
--
2.51.2.1041.gc1ab5b90ca-goog
* [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-13 8:53 [PATCHv2 0/4] zram: introduce writeback bio batching Sergey Senozhatsky
@ 2025-11-13 8:53 ` Sergey Senozhatsky
2025-11-13 23:45 ` Minchan Kim
2025-11-13 8:54 ` [PATCHv2 2/4] zram: add writeback batch size device attr Sergey Senozhatsky
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-13 8:53 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky, Minchan Kim
From: Yuwen Chen <ywen.chen@foxmail.com>
Currently, zram writeback supports only a single bio writeback
operation, waiting for bio completion before post-processing the
next pp-slot. This works, in general, but has certain throughput
limitations. Implement batched (multiple) bio writeback support
to take advantage of parallel request processing and better
request scheduling.
For the time being the writeback batch size (maximum number of
in-flight bio requests) is set to 1, so the behavior is the
same as with the previous single-bio writeback. This is addressed
in a follow-up patch, which adds a writeback_batch_size device
attribute.
Please refer to [1] and [2] for benchmarks.
[1] https://lore.kernel.org/linux-block/tencent_B2DC37E3A2AED0E7F179365FCB5D82455B08@qq.com
[2] https://lore.kernel.org/linux-block/tencent_0FBBFC8AE0B97BC63B5D47CE1FF2BABFDA09@qq.com
[senozhatsky: significantly reworked the initial patch so that the
approach and implementation resemble current zram post-processing
code]
Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Co-developed-by: Richard Chang <richardycc@google.com>
Suggested-by: Minchan Kim <minchan@google.com>
---
drivers/block/zram/zram_drv.c | 343 +++++++++++++++++++++++++++-------
1 file changed, 278 insertions(+), 65 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a43074657531..a0a939fd9d31 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -734,20 +734,226 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
submit_bio(bio);
}
-static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
-{
- unsigned long blk_idx = 0;
- struct page *page = NULL;
+struct zram_wb_ctl {
+ struct list_head idle_reqs;
+ struct list_head inflight_reqs;
+
+ atomic_t num_inflight;
+ struct completion done;
+ struct blk_plug plug;
+};
+
+struct zram_wb_req {
+ unsigned long blk_idx;
+ struct page *page;
struct zram_pp_slot *pps;
struct bio_vec bio_vec;
struct bio bio;
- int ret = 0, err;
+
+ struct list_head entry;
+};
+
+static void release_wb_req(struct zram_wb_req *req)
+{
+ __free_page(req->page);
+ kfree(req);
+}
+
+static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+{
+ /* We should never have inflight requests at this point */
+ WARN_ON(!list_empty(&wb_ctl->inflight_reqs));
+
+ while (!list_empty(&wb_ctl->idle_reqs)) {
+ struct zram_wb_req *req;
+
+ req = list_first_entry(&wb_ctl->idle_reqs,
+ struct zram_wb_req, entry);
+ list_del(&req->entry);
+ release_wb_req(req);
+ }
+
+ kfree(wb_ctl);
+}
+
+/* XXX: should be a per-device sysfs attr */
+#define ZRAM_WB_REQ_CNT 1
+
+static struct zram_wb_ctl *init_wb_ctl(void)
+{
+ struct zram_wb_ctl *wb_ctl;
+ int i;
+
+ wb_ctl = kmalloc(sizeof(*wb_ctl), GFP_KERNEL);
+ if (!wb_ctl)
+ return NULL;
+
+ INIT_LIST_HEAD(&wb_ctl->idle_reqs);
+ INIT_LIST_HEAD(&wb_ctl->inflight_reqs);
+ atomic_set(&wb_ctl->num_inflight, 0);
+ init_completion(&wb_ctl->done);
+
+ for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
+ struct zram_wb_req *req;
+
+ /*
+ * This is fatal condition only if we couldn't allocate
+ * any requests at all. Otherwise we just work with the
+ * requests that we have successfully allocated, so that
+ * writeback can still proceed, even if there is only one
+ * request on the idle list.
+ */
+ req = kzalloc(sizeof(*req), GFP_NOIO | __GFP_NOWARN);
+ if (!req)
+ break;
+
+ req->page = alloc_page(GFP_NOIO | __GFP_NOWARN);
+ if (!req->page) {
+ kfree(req);
+ break;
+ }
+
+ INIT_LIST_HEAD(&req->entry);
+ list_add(&req->entry, &wb_ctl->idle_reqs);
+ }
+
+ /* We couldn't allocate any requests, so writeback is not possible */
+ if (list_empty(&wb_ctl->idle_reqs))
+ goto release_wb_ctl;
+
+ return wb_ctl;
+
+release_wb_ctl:
+ release_wb_ctl(wb_ctl);
+ return NULL;
+}
+
+static void zram_account_writeback_rollback(struct zram *zram)
+{
+ spin_lock(&zram->wb_limit_lock);
+ if (zram->wb_limit_enable)
+ zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
+ spin_unlock(&zram->wb_limit_lock);
+}
+
+static void zram_account_writeback_submit(struct zram *zram)
+{
+ spin_lock(&zram->wb_limit_lock);
+ if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
+ zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
+ spin_unlock(&zram->wb_limit_lock);
+}
+
+static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
+{
u32 index;
+ int err;
- page = alloc_page(GFP_KERNEL);
- if (!page)
- return -ENOMEM;
+ index = req->pps->index;
+ release_pp_slot(zram, req->pps);
+ req->pps = NULL;
+
+ err = blk_status_to_errno(req->bio.bi_status);
+ if (err) {
+ /*
+ * Failed wb requests should not be accounted in wb_limit
+ * (if enabled).
+ */
+ zram_account_writeback_rollback(zram);
+ return err;
+ }
+ atomic64_inc(&zram->stats.bd_writes);
+ zram_slot_lock(zram, index);
+ /*
+ * We release slot lock during writeback so slot can change under us:
+ * slot_free() or slot_free() and zram_write_page(). In both cases
+ * slot loses ZRAM_PP_SLOT flag. No concurrent post-processing can
+ * set ZRAM_PP_SLOT on such slots until current post-processing
+ * finishes.
+ */
+ if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
+ goto out;
+
+ zram_free_page(zram, index);
+ zram_set_flag(zram, index, ZRAM_WB);
+ zram_set_handle(zram, index, req->blk_idx);
+ atomic64_inc(&zram->stats.pages_stored);
+
+out:
+ zram_slot_unlock(zram, index);
+ return 0;
+}
+
+static void zram_writeback_endio(struct bio *bio)
+{
+ struct zram_wb_ctl *wb_ctl = bio->bi_private;
+
+ if (atomic_dec_return(&wb_ctl->num_inflight) == 0)
+ complete(&wb_ctl->done);
+}
+
+static void zram_submit_wb_request(struct zram *zram,
+ struct zram_wb_ctl *wb_ctl,
+ struct zram_wb_req *req)
+{
+ /*
+ * wb_limit (if enabled) should be adjusted before submission,
+ * so that we don't over-submit.
+ */
+ zram_account_writeback_submit(zram);
+ atomic_inc(&wb_ctl->num_inflight);
+ list_add_tail(&req->entry, &wb_ctl->inflight_reqs);
+ submit_bio(&req->bio);
+}
+
+static struct zram_wb_req *select_idle_req(struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req;
+
+ req = list_first_entry_or_null(&wb_ctl->idle_reqs,
+ struct zram_wb_req, entry);
+ if (req)
+ list_del(&req->entry);
+ return req;
+}
+
+static int zram_wb_wait_for_completion(struct zram *zram,
+ struct zram_wb_ctl *wb_ctl)
+{
+ int ret = 0;
+
+ if (atomic_read(&wb_ctl->num_inflight))
+ wait_for_completion_io(&wb_ctl->done);
+
+ reinit_completion(&wb_ctl->done);
+ while (!list_empty(&wb_ctl->inflight_reqs)) {
+ struct zram_wb_req *req;
+ int err;
+
+ req = list_first_entry(&wb_ctl->inflight_reqs,
+ struct zram_wb_req, entry);
+ list_move(&req->entry, &wb_ctl->idle_reqs);
+
+ err = zram_writeback_complete(zram, req);
+ if (err)
+ ret = err;
+ }
+
+ return ret;
+}
+
+static int zram_writeback_slots(struct zram *zram,
+ struct zram_pp_ctl *ctl,
+ struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req = NULL;
+ unsigned long blk_idx = 0;
+ struct zram_pp_slot *pps;
+ int ret = 0, err;
+ u32 index = 0;
+
+ blk_start_plug(&wb_ctl->plug);
while ((pps = select_pp_slot(ctl))) {
spin_lock(&zram->wb_limit_lock);
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
@@ -757,6 +963,26 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
}
spin_unlock(&zram->wb_limit_lock);
+ while (!req) {
+ req = select_idle_req(wb_ctl);
+ if (req)
+ break;
+
+ blk_finish_plug(&wb_ctl->plug);
+ err = zram_wb_wait_for_completion(zram, wb_ctl);
+ blk_start_plug(&wb_ctl->plug);
+ /*
+ * BIO errors are not fatal, we continue and simply
+ * attempt to writeback the remaining objects (pages).
+ * At the same time we need to signal user-space that
+ * some writes (at least one, but also could be all of
+ * them) were not successful and we do so by returning
+ * the most recent BIO error.
+ */
+ if (err)
+ ret = err;
+ }
+
if (!blk_idx) {
blk_idx = alloc_block_bdev(zram);
if (!blk_idx) {
@@ -765,7 +991,6 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
}
}
- index = pps->index;
zram_slot_lock(zram, index);
/*
* scan_slots() sets ZRAM_PP_SLOT and releases slot lock, so
@@ -775,67 +1000,47 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
*/
if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
goto next;
- if (zram_read_from_zspool(zram, page, index))
+ if (zram_read_from_zspool(zram, req->page, index))
goto next;
zram_slot_unlock(zram, index);
- bio_init(&bio, zram->bdev, &bio_vec, 1,
- REQ_OP_WRITE | REQ_SYNC);
- bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
- __bio_add_page(&bio, page, PAGE_SIZE, 0);
-
/*
- * XXX: A single page IO would be inefficient for write
- * but it would be not bad as starter.
+ * From now on pp-slot is owned by the req, remove it from
+ * its pps bucket.
*/
- err = submit_bio_wait(&bio);
- if (err) {
- release_pp_slot(zram, pps);
- /*
- * BIO errors are not fatal, we continue and simply
- * attempt to writeback the remaining objects (pages).
- * At the same time we need to signal user-space that
- * some writes (at least one, but also could be all of
- * them) were not successful and we do so by returning
- * the most recent BIO error.
- */
- ret = err;
- continue;
- }
+ list_del_init(&pps->entry);
- atomic64_inc(&zram->stats.bd_writes);
- zram_slot_lock(zram, index);
- /*
- * Same as above, we release slot lock during writeback so
- * slot can change under us: slot_free() or slot_free() and
- * reallocation (zram_write_page()). In both cases slot loses
- * ZRAM_PP_SLOT flag. No concurrent post-processing can set
- * ZRAM_PP_SLOT on such slots until current post-processing
- * finishes.
- */
- if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
- goto next;
+ req->blk_idx = blk_idx;
+ req->pps = pps;
+ bio_init(&req->bio, zram->bdev, &req->bio_vec, 1,
+ REQ_OP_WRITE | REQ_SYNC);
+ req->bio.bi_iter.bi_sector = req->blk_idx * (PAGE_SIZE >> 9);
+ req->bio.bi_end_io = zram_writeback_endio;
+ req->bio.bi_private = wb_ctl;
+ __bio_add_page(&req->bio, req->page, PAGE_SIZE, 0);
- zram_free_page(zram, index);
- zram_set_flag(zram, index, ZRAM_WB);
- zram_set_handle(zram, index, blk_idx);
+ zram_submit_wb_request(zram, wb_ctl, req);
blk_idx = 0;
- atomic64_inc(&zram->stats.pages_stored);
- spin_lock(&zram->wb_limit_lock);
- if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
- zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
+ req = NULL;
+ continue;
+
next:
zram_slot_unlock(zram, index);
release_pp_slot(zram, pps);
-
cond_resched();
}
- if (blk_idx)
- free_block_bdev(zram, blk_idx);
- if (page)
- __free_page(page);
+ /*
+ * Selected idle req, but never submitted it due to some error or
+ * wb limit.
+ */
+ if (req)
+ release_wb_req(req);
+
+ blk_finish_plug(&wb_ctl->plug);
+ err = zram_wb_wait_for_completion(zram, wb_ctl);
+ if (err)
+ ret = err;
return ret;
}
@@ -948,7 +1153,8 @@ static ssize_t writeback_store(struct device *dev,
struct zram *zram = dev_to_zram(dev);
u64 nr_pages = zram->disksize >> PAGE_SHIFT;
unsigned long lo = 0, hi = nr_pages;
- struct zram_pp_ctl *ctl = NULL;
+ struct zram_pp_ctl *pp_ctl = NULL;
+ struct zram_wb_ctl *wb_ctl = NULL;
char *args, *param, *val;
ssize_t ret = len;
int err, mode = 0;
@@ -970,8 +1176,14 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- ctl = init_pp_ctl();
- if (!ctl) {
+ pp_ctl = init_pp_ctl();
+ if (!pp_ctl) {
+ ret = -ENOMEM;
+ goto release_init_lock;
+ }
+
+ wb_ctl = init_wb_ctl();
+ if (!wb_ctl) {
ret = -ENOMEM;
goto release_init_lock;
}
@@ -1000,7 +1212,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
break;
}
@@ -1011,7 +1223,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
break;
}
@@ -1022,7 +1234,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
continue;
}
@@ -1033,17 +1245,18 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
continue;
}
}
- err = zram_writeback_slots(zram, ctl);
+ err = zram_writeback_slots(zram, pp_ctl, wb_ctl);
if (err)
ret = err;
release_init_lock:
- release_pp_ctl(zram, ctl);
+ release_pp_ctl(zram, pp_ctl);
+ release_wb_ctl(wb_ctl);
atomic_set(&zram->pp_in_progress, 0);
up_read(&zram->init_lock);
--
2.51.2.1041.gc1ab5b90ca-goog
* [PATCHv2 2/4] zram: add writeback batch size device attr
2025-11-13 8:53 [PATCHv2 0/4] zram: introduce writeback bio batching Sergey Senozhatsky
2025-11-13 8:53 ` [PATCHv2 1/4] zram: introduce writeback bio batching support Sergey Senozhatsky
@ 2025-11-13 8:54 ` Sergey Senozhatsky
2025-11-13 8:54 ` [PATCHv2 3/4] zram: take write lock in wb limit store handlers Sergey Senozhatsky
2025-11-13 8:54 ` [PATCHv2 4/4] zram: drop wb_limit_lock Sergey Senozhatsky
3 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-13 8:54 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
Introduce a writeback_batch_size device attribute so that
the maximum number of in-flight writeback bio requests
can be configured per device at run-time. This effectively
enables batched bio writeback.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 48 ++++++++++++++++++++++++++++++-----
drivers/block/zram/zram_drv.h | 1 +
2 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a0a939fd9d31..238b997f6891 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -570,6 +570,42 @@ static ssize_t writeback_limit_show(struct device *dev,
return sysfs_emit(buf, "%llu\n", val);
}
+static ssize_t writeback_batch_size_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ u32 val;
+ ssize_t ret = -EINVAL;
+
+ if (kstrtouint(buf, 10, &val))
+ return ret;
+
+ if (!val)
+ val = 1;
+
+ down_read(&zram->init_lock);
+ zram->wb_batch_size = val;
+ up_read(&zram->init_lock);
+ ret = len;
+
+ return ret;
+}
+
+static ssize_t writeback_batch_size_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u32 val;
+ struct zram *zram = dev_to_zram(dev);
+
+ down_read(&zram->init_lock);
+ val = zram->wb_batch_size;
+ up_read(&zram->init_lock);
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
static void reset_bdev(struct zram *zram)
{
if (!zram->backing_dev)
@@ -776,10 +812,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
kfree(wb_ctl);
}
-/* XXX: should be a per-device sysfs attr */
-#define ZRAM_WB_REQ_CNT 1
-
-static struct zram_wb_ctl *init_wb_ctl(void)
+static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
{
struct zram_wb_ctl *wb_ctl;
int i;
@@ -793,7 +826,7 @@ static struct zram_wb_ctl *init_wb_ctl(void)
atomic_set(&wb_ctl->num_inflight, 0);
init_completion(&wb_ctl->done);
- for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
+ for (i = 0; i < zram->wb_batch_size; i++) {
struct zram_wb_req *req;
/*
@@ -1182,7 +1215,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- wb_ctl = init_wb_ctl();
+ wb_ctl = init_wb_ctl(zram);
if (!wb_ctl) {
ret = -ENOMEM;
goto release_init_lock;
@@ -2823,6 +2856,7 @@ static DEVICE_ATTR_RW(backing_dev);
static DEVICE_ATTR_WO(writeback);
static DEVICE_ATTR_RW(writeback_limit);
static DEVICE_ATTR_RW(writeback_limit_enable);
+static DEVICE_ATTR_RW(writeback_batch_size);
#endif
#ifdef CONFIG_ZRAM_MULTI_COMP
static DEVICE_ATTR_RW(recomp_algorithm);
@@ -2844,6 +2878,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_writeback.attr,
&dev_attr_writeback_limit.attr,
&dev_attr_writeback_limit_enable.attr,
+ &dev_attr_writeback_batch_size.attr,
#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
@@ -2905,6 +2940,7 @@ static int zram_add(void)
init_rwsem(&zram->init_lock);
#ifdef CONFIG_ZRAM_WRITEBACK
+ zram->wb_batch_size = 1;
spin_lock_init(&zram->wb_limit_lock);
#endif
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 6cee93f9c0d0..1a647f42c1a4 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -129,6 +129,7 @@ struct zram {
struct file *backing_dev;
spinlock_t wb_limit_lock;
bool wb_limit_enable;
+ u32 wb_batch_size;
u64 bd_wb_limit;
struct block_device *bdev;
unsigned long *bitmap;
--
2.51.2.1041.gc1ab5b90ca-goog
* [PATCHv2 3/4] zram: take write lock in wb limit store handlers
2025-11-13 8:53 [PATCHv2 0/4] zram: introduce writeback bio batching Sergey Senozhatsky
2025-11-13 8:53 ` [PATCHv2 1/4] zram: introduce writeback bio batching support Sergey Senozhatsky
2025-11-13 8:54 ` [PATCHv2 2/4] zram: add writeback batch size device attr Sergey Senozhatsky
@ 2025-11-13 8:54 ` Sergey Senozhatsky
2025-11-13 8:54 ` [PATCHv2 4/4] zram: drop wb_limit_lock Sergey Senozhatsky
3 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-13 8:54 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
Device attribute store handlers should take zram init_lock for writing.
While at it, fix up coding style.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 238b997f6891..6312b0437618 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -501,7 +501,8 @@ static ssize_t idle_store(struct device *dev,
#ifdef CONFIG_ZRAM_WRITEBACK
static ssize_t writeback_limit_enable_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
+ struct device_attribute *attr,
+ const char *buf, size_t len)
{
struct zram *zram = dev_to_zram(dev);
u64 val;
@@ -510,18 +511,19 @@ static ssize_t writeback_limit_enable_store(struct device *dev,
if (kstrtoull(buf, 10, &val))
return ret;
- down_read(&zram->init_lock);
+ down_write(&zram->init_lock);
spin_lock(&zram->wb_limit_lock);
zram->wb_limit_enable = val;
spin_unlock(&zram->wb_limit_lock);
- up_read(&zram->init_lock);
+ up_write(&zram->init_lock);
ret = len;
return ret;
}
static ssize_t writeback_limit_enable_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+ struct device_attribute *attr,
+ char *buf)
{
bool val;
struct zram *zram = dev_to_zram(dev);
@@ -536,7 +538,8 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
}
static ssize_t writeback_limit_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
+ struct device_attribute *attr,
+ const char *buf, size_t len)
{
struct zram *zram = dev_to_zram(dev);
u64 val;
@@ -545,11 +548,11 @@ static ssize_t writeback_limit_store(struct device *dev,
if (kstrtoull(buf, 10, &val))
return ret;
- down_read(&zram->init_lock);
+ down_write(&zram->init_lock);
spin_lock(&zram->wb_limit_lock);
zram->bd_wb_limit = val;
spin_unlock(&zram->wb_limit_lock);
- up_read(&zram->init_lock);
+ up_write(&zram->init_lock);
ret = len;
return ret;
--
2.51.2.1041.gc1ab5b90ca-goog
* [PATCHv2 4/4] zram: drop wb_limit_lock
2025-11-13 8:53 [PATCHv2 0/4] zram: introduce writeback bio batching Sergey Senozhatsky
` (2 preceding siblings ...)
2025-11-13 8:54 ` [PATCHv2 3/4] zram: take write lock in wb limit store handlers Sergey Senozhatsky
@ 2025-11-13 8:54 ` Sergey Senozhatsky
3 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-13 8:54 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
We don't need wb_limit_lock. Writeback limit setters take zram
init_lock for writing (exclusive), while wb_limit modifications
happen only from a single task and under zram init_lock for
reading. No concurrent wb_limit modifications are possible (we
permit only one post-processing task at a time). Add lockdep
assertions to wb_limit mutators.
While at it, fix up coding style.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 22 +++++-----------------
drivers/block/zram/zram_drv.h | 1 -
2 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 6312b0437618..28afb010307d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -512,9 +512,7 @@ static ssize_t writeback_limit_enable_store(struct device *dev,
return ret;
down_write(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
zram->wb_limit_enable = val;
- spin_unlock(&zram->wb_limit_lock);
up_write(&zram->init_lock);
ret = len;
@@ -529,9 +527,7 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
down_read(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
val = zram->wb_limit_enable;
- spin_unlock(&zram->wb_limit_lock);
up_read(&zram->init_lock);
return sysfs_emit(buf, "%d\n", val);
@@ -549,9 +545,7 @@ static ssize_t writeback_limit_store(struct device *dev,
return ret;
down_write(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
zram->bd_wb_limit = val;
- spin_unlock(&zram->wb_limit_lock);
up_write(&zram->init_lock);
ret = len;
@@ -559,15 +553,13 @@ static ssize_t writeback_limit_store(struct device *dev,
}
static ssize_t writeback_limit_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+ struct device_attribute *attr, char *buf)
{
u64 val;
struct zram *zram = dev_to_zram(dev);
down_read(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
val = zram->bd_wb_limit;
- spin_unlock(&zram->wb_limit_lock);
up_read(&zram->init_lock);
return sysfs_emit(buf, "%llu\n", val);
@@ -866,18 +858,18 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
static void zram_account_writeback_rollback(struct zram *zram)
{
- spin_lock(&zram->wb_limit_lock);
+ lockdep_assert_held_read(&zram->init_lock);
+
if (zram->wb_limit_enable)
zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
}
static void zram_account_writeback_submit(struct zram *zram)
{
- spin_lock(&zram->wb_limit_lock);
+ lockdep_assert_held_read(&zram->init_lock);
+
if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
}
static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
@@ -991,13 +983,10 @@ static int zram_writeback_slots(struct zram *zram,
blk_start_plug(&wb_ctl->plug);
while ((pps = select_pp_slot(ctl))) {
- spin_lock(&zram->wb_limit_lock);
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
- spin_unlock(&zram->wb_limit_lock);
ret = -EIO;
break;
}
- spin_unlock(&zram->wb_limit_lock);
while (!req) {
req = select_idle_req(wb_ctl);
@@ -2944,7 +2933,6 @@ static int zram_add(void)
init_rwsem(&zram->init_lock);
#ifdef CONFIG_ZRAM_WRITEBACK
zram->wb_batch_size = 1;
- spin_lock_init(&zram->wb_limit_lock);
#endif
/* gendisk structure */
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 1a647f42c1a4..c6d94501376c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -127,7 +127,6 @@ struct zram {
bool claim; /* Protected by disk->open_mutex */
#ifdef CONFIG_ZRAM_WRITEBACK
struct file *backing_dev;
- spinlock_t wb_limit_lock;
bool wb_limit_enable;
u32 wb_batch_size;
u64 bd_wb_limit;
--
2.51.2.1041.gc1ab5b90ca-goog
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-13 8:53 ` [PATCHv2 1/4] zram: introduce writeback bio batching support Sergey Senozhatsky
@ 2025-11-13 23:45 ` Minchan Kim
2025-11-14 1:53 ` Sergey Senozhatsky
0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2025-11-13 23:45 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Yuwen Chen, Richard Chang, Brian Geffon,
Fengyu Lian, linux-kernel, linux-mm, linux-block
On Thu, Nov 13, 2025 at 05:53:59PM +0900, Sergey Senozhatsky wrote:
> From: Yuwen Chen <ywen.chen@foxmail.com>
>
> Currently, zram writeback supports only a single bio writeback
> operation, waiting for bio completion before post-processing
> next pp-slot. This works, in general, but has certain throughput
> limitations. Implement batched (multiple) bio writeback support
> to take advantage of parallel requests processing and better
> requests scheduling.
>
> For the time being the writeback batch size (maximum number of
> in-flight bio requests) is set to 1, so the behaviors is the
> same as the previous single-bio writeback. This is addressed
> in a follow up patch, which adds a writeback_batch_size device
> attribute.
>
> Please refer to [1] and [2] for benchmarks.
>
> [1] https://lore.kernel.org/linux-block/tencent_B2DC37E3A2AED0E7F179365FCB5D82455B08@qq.com
> [2] https://lore.kernel.org/linux-block/tencent_0FBBFC8AE0B97BC63B5D47CE1FF2BABFDA09@qq.com
>
> [senozhatsky: significantly reworked the initial patch so that the
> approach and implementation resemble current zram post-processing
> code]
This version is much clearer than the previous series.
Most of the comments below are nits.
>
> Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> Co-developed-by: Richard Chang <richardycc@google.com>
> Suggested-by: Minchan Kim <minchan@google.com>
> ---
> drivers/block/zram/zram_drv.c | 343 +++++++++++++++++++++++++++-------
> 1 file changed, 278 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a43074657531..a0a939fd9d31 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -734,20 +734,226 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
> submit_bio(bio);
> }
>
> -static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
> -{
> - unsigned long blk_idx = 0;
> - struct page *page = NULL;
> +struct zram_wb_ctl {
> + struct list_head idle_reqs;
> + struct list_head inflight_reqs;
> +
> + atomic_t num_inflight;
> + struct completion done;
> + struct blk_plug plug;
> +};
> +
> +struct zram_wb_req {
> + unsigned long blk_idx;
> + struct page *page;
> struct zram_pp_slot *pps;
> struct bio_vec bio_vec;
> struct bio bio;
> - int ret = 0, err;
> +
> + struct list_head entry;
> +};
How about moving the structure definitions to the upper part of the C file?
Not only does it improve readability by keeping data types together, it also
gives a better diff so reviewers can see what this patch changes.
> +
> +static void release_wb_req(struct zram_wb_req *req)
> +{
> + __free_page(req->page);
> + kfree(req);
> +}
> +
> +static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
> +{
> + /* We should never have inflight requests at this point */
> + WARN_ON(!list_empty(&wb_ctl->inflight_reqs));
> +
> + while (!list_empty(&wb_ctl->idle_reqs)) {
> + struct zram_wb_req *req;
> +
> + req = list_first_entry(&wb_ctl->idle_reqs,
> + struct zram_wb_req, entry);
> + list_del(&req->entry);
> + release_wb_req(req);
> + }
> +
> + kfree(wb_ctl);
> +}
> +
> +/* XXX: should be a per-device sysfs attr */
> +#define ZRAM_WB_REQ_CNT 1
I understand you will create a knob for tuning this, but let's at least
introduce a default number for it here.
How about 32, since it's a common queue depth for modern storage?
> +
> +static struct zram_wb_ctl *init_wb_ctl(void)
> +{
> + struct zram_wb_ctl *wb_ctl;
> + int i;
> +
> + wb_ctl = kmalloc(sizeof(*wb_ctl), GFP_KERNEL);
> + if (!wb_ctl)
> + return NULL;
> +
> + INIT_LIST_HEAD(&wb_ctl->idle_reqs);
> + INIT_LIST_HEAD(&wb_ctl->inflight_reqs);
> + atomic_set(&wb_ctl->num_inflight, 0);
> + init_completion(&wb_ctl->done);
> +
> + for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
> + struct zram_wb_req *req;
> +
> + /*
> + * This is fatal condition only if we couldn't allocate
> + * any requests at all. Otherwise we just work with the
> + * requests that we have successfully allocated, so that
> + * writeback can still proceed, even if there is only one
> + * request on the idle list.
> + */
> + req = kzalloc(sizeof(*req), GFP_NOIO | __GFP_NOWARN);
Why GFP_NOIO?
> + if (!req)
> + break;
> +
> + req->page = alloc_page(GFP_NOIO | __GFP_NOWARN);
Ditto
> + if (!req->page) {
> + kfree(req);
> + break;
> + }
> +
> + INIT_LIST_HEAD(&req->entry);
Do we need this reset?
> + list_add(&req->entry, &wb_ctl->idle_reqs);
> + }
> +
> + /* We couldn't allocate any requests, so writeabck is not possible */
> + if (list_empty(&wb_ctl->idle_reqs))
> + goto release_wb_ctl;
> +
> + return wb_ctl;
> +
> +release_wb_ctl:
> + release_wb_ctl(wb_ctl);
> + return NULL;
> +}
> +
> +static void zram_account_writeback_rollback(struct zram *zram)
> +{
> + spin_lock(&zram->wb_limit_lock);
> + if (zram->wb_limit_enable)
> + zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
> + spin_unlock(&zram->wb_limit_lock);
> +}
> +
> +static void zram_account_writeback_submit(struct zram *zram)
> +{
> + spin_lock(&zram->wb_limit_lock);
> + if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
> + zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
> + spin_unlock(&zram->wb_limit_lock);
> +}
I haven't thought much about this part yet, whether we really need to be
this accurate. Maybe next time, after coffee.
> +
> +static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
> +{
> u32 index;
> + int err;
>
> - page = alloc_page(GFP_KERNEL);
> - if (!page)
> - return -ENOMEM;
> + index = req->pps->index;
> + release_pp_slot(zram, req->pps);
> + req->pps = NULL;
> +
> + err = blk_status_to_errno(req->bio.bi_status);
> + if (err) {
> + /*
> + * Failed wb requests should not be accounted in wb_limit
> + * (if enabled).
> + */
> + zram_account_writeback_rollback(zram);
> + return err;
> + }
>
> + atomic64_inc(&zram->stats.bd_writes);
> + zram_slot_lock(zram, index);
> + /*
> + * We release slot lock during writeback so slot can change under us:
> + * slot_free() or slot_free() and zram_write_page(). In both cases
> + * slot loses ZRAM_PP_SLOT flag. No concurrent post-processing can
> + * set ZRAM_PP_SLOT on such slots until current post-processing
> + * finishes.
> + */
> + if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
> + goto out;
> +
> + zram_free_page(zram, index);
> + zram_set_flag(zram, index, ZRAM_WB);
> + zram_set_handle(zram, index, req->blk_idx);
> + atomic64_inc(&zram->stats.pages_stored);
> +
> +out:
> + zram_slot_unlock(zram, index);
> + return 0;
> +}
> +
> +static void zram_writeback_endio(struct bio *bio)
> +{
> + struct zram_wb_ctl *wb_ctl = bio->bi_private;
> +
> + if (atomic_dec_return(&wb_ctl->num_inflight) == 0)
> + complete(&wb_ctl->done);
> +}
> +
> +static void zram_submit_wb_request(struct zram *zram,
> + struct zram_wb_ctl *wb_ctl,
> + struct zram_wb_req *req)
> +{
> + /*
> + * wb_limit (if enabled) should be adjusted before submission,
> + * so that we don't over-submit.
> + */
> + zram_account_writeback_submit(zram);
> + atomic_inc(&wb_ctl->num_inflight);
> + list_add_tail(&req->entry, &wb_ctl->inflight_reqs);
> + submit_bio(&req->bio);
> +}
> +
> +static struct zram_wb_req *select_idle_req(struct zram_wb_ctl *wb_ctl)
> +{
> + struct zram_wb_req *req;
> +
> + req = list_first_entry_or_null(&wb_ctl->idle_reqs,
> + struct zram_wb_req, entry);
> + if (req)
> + list_del(&req->entry);
> + return req;
> +}
> +
> +static int zram_wb_wait_for_completion(struct zram *zram,
> + struct zram_wb_ctl *wb_ctl)
> +{
> + int ret = 0;
> +
> + if (atomic_read(&wb_ctl->num_inflight))
> + wait_for_completion_io(&wb_ctl->done);
> +
> + reinit_completion(&wb_ctl->done);
> + while (!list_empty(&wb_ctl->inflight_reqs)) {
> + struct zram_wb_req *req;
> + int err;
> +
> + req = list_first_entry(&wb_ctl->inflight_reqs,
> + struct zram_wb_req, entry);
> + list_move(&req->entry, &wb_ctl->idle_reqs);
> +
> + err = zram_writeback_complete(zram, req);
> + if (err)
> + ret = err;
> + }
> +
> + return ret;
> +}
> +
> +static int zram_writeback_slots(struct zram *zram,
> + struct zram_pp_ctl *ctl,
> + struct zram_wb_ctl *wb_ctl)
> +{
> + struct zram_wb_req *req = NULL;
> + unsigned long blk_idx = 0;
> + struct zram_pp_slot *pps;
> + int ret = 0, err;
> + u32 index = 0;
> +
> + blk_start_plug(&wb_ctl->plug);
Why is the plug part of wb_ctl?
The scope of the plug is this function, and its purpose is the writeback
batch handled in this function, so the plug can be a local variable here.
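A minimal sketch of what that could look like (hypothetical, loop body
elided, just to illustrate the suggestion):

static int zram_writeback_slots(struct zram *zram,
				struct zram_pp_ctl *ctl,
				struct zram_wb_ctl *wb_ctl)
{
	struct blk_plug plug;	/* on-stack, scoped to this function */
	int ret = 0;

	blk_start_plug(&plug);
	/* ... select pp-slots, fill and submit wb requests ... */
	blk_finish_plug(&plug);

	return ret;
}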
> while ((pps = select_pp_slot(ctl))) {
> spin_lock(&zram->wb_limit_lock);
> if (zram->wb_limit_enable && !zram->bd_wb_limit) {
> @@ -757,6 +963,26 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
> }
> spin_unlock(&zram->wb_limit_lock);
>
> + while (!req) {
> + req = select_idle_req(wb_ctl);
> + if (req)
> + break;
> +
> + blk_finish_plug(&wb_ctl->plug);
> + err = zram_wb_wait_for_completion(zram, wb_ctl);
> + blk_start_plug(&wb_ctl->plug);
> + /*
> + * BIO errors are not fatal, we continue and simply
> + * attempt to writeback the remaining objects (pages).
> + * At the same time we need to signal user-space that
> + * some writes (at least one, but also could be all of
> + * them) were not successful and we do so by returning
> + * the most recent BIO error.
> + */
> + if (err)
> + ret = err;
> + }
> +
> if (!blk_idx) {
> blk_idx = alloc_block_bdev(zram);
> if (!blk_idx) {
> @@ -765,7 +991,6 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
> }
> }
>
> - index = pps->index;
> zram_slot_lock(zram, index);
> /*
> * scan_slots() sets ZRAM_PP_SLOT and relases slot lock, so
> @@ -775,67 +1000,47 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
> */
> if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
> goto next;
> - if (zram_read_from_zspool(zram, page, index))
> + if (zram_read_from_zspool(zram, req->page, index))
> goto next;
> zram_slot_unlock(zram, index);
>
> - bio_init(&bio, zram->bdev, &bio_vec, 1,
> - REQ_OP_WRITE | REQ_SYNC);
> - bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
> - __bio_add_page(&bio, page, PAGE_SIZE, 0);
> -
> /*
> - * XXX: A single page IO would be inefficient for write
> - * but it would be not bad as starter.
> + * From now on pp-slot is owned by the req, remove it from
> + * its pps bucket.
> */
> - err = submit_bio_wait(&bio);
Yay, finally we remove this submit_bio_wait.
> - if (err) {
> - release_pp_slot(zram, pps);
> - /*
> - * BIO errors are not fatal, we continue and simply
> - * attempt to writeback the remaining objects (pages).
> - * At the same time we need to signal user-space that
> - * some writes (at least one, but also could be all of
> - * them) were not successful and we do so by returning
> - * the most recent BIO error.
> - */
> - ret = err;
> - continue;
> - }
> + list_del_init(&pps->entry);
>
> - atomic64_inc(&zram->stats.bd_writes);
> - zram_slot_lock(zram, index);
> - /*
> - * Same as above, we release slot lock during writeback so
> - * slot can change under us: slot_free() or slot_free() and
> - * reallocation (zram_write_page()). In both cases slot loses
> - * ZRAM_PP_SLOT flag. No concurrent post-processing can set
> - * ZRAM_PP_SLOT on such slots until current post-processing
> - * finishes.
> - */
> - if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
> - goto next;
> + req->blk_idx = blk_idx;
> + req->pps = pps;
> + bio_init(&req->bio, zram->bdev, &req->bio_vec, 1,
> + REQ_OP_WRITE | REQ_SYNC);
Can't we drop the REQ_SYNC now?
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-13 23:45 ` Minchan Kim
@ 2025-11-14 1:53 ` Sergey Senozhatsky
2025-11-14 3:08 ` Sergey Senozhatsky
2025-11-14 19:14 ` Minchan Kim
0 siblings, 2 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-14 1:53 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Andrew Morton, Yuwen Chen, Richard Chang,
Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block
On (25/11/13 15:45), Minchan Kim wrote:
[..]
> > +struct zram_wb_req {
> > + unsigned long blk_idx;
> > + struct page *page;
> > struct zram_pp_slot *pps;
> > struct bio_vec bio_vec;
> > struct bio bio;
> > - int ret = 0, err;
> > +
> > + struct list_head entry;
> > +};
>
> How about moving structure definition to the upper part of the C file?
> Not only readability to put together data types but also better diff
> for reviewer to know what we changed in this patch.
This still needs to be under #ifdef CONFIG_ZRAM_WRITEBACK so readability
is not significantly better. Do you still prefer moving it up?
[..]
> > +/* XXX: should be a per-device sysfs attr */
> > +#define ZRAM_WB_REQ_CNT 1
>
> Understand you will create the knob for the tune but at least,
> let's introduce default number for that here.
>
> How about 32 since it's general queue depth for modern storage?
So this is tricky. I don't know what number is a good default for
everyone, given the variety of devices out there, with all kinds of specs
and hardware on both ends of the price range. I don't know if 32 is safe
wrt performance/throughput (I may be wrong and 32 is safe for
everyone). On the other hand, 1 has been our baseline for ages, so I
wanted to minimize the risks and just keep the baseline behavior.
Do you still prefer 32 as the default? (here and in the next patch)
[..]
> > + for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
> > + struct zram_wb_req *req;
> > +
> > + /*
> > + * This is fatal condition only if we couldn't allocate
> > + * any requests at all. Otherwise we just work with the
> > + * requests that we have successfully allocated, so that
> > + * writeback can still proceed, even if there is only one
> > + * request on the idle list.
> > + */
> > + req = kzalloc(sizeof(*req), GFP_NOIO | __GFP_NOWARN);
>
> Why GFP_NOIO?
>
> > + if (!req)
> > + break;
> > +
> > + req->page = alloc_page(GFP_NOIO | __GFP_NOWARN);
>
> Ditto
So we do this for post-processing, which allocates a bunch of memory
(not only request lists with physical pages, but also candidate slot
buckets). The thing is that post-processing can be called under memory
pressure and we don't really want to block and reclaim memory from a
path that is called to relieve memory pressure (by doing writeback or
recompression).
> > + if (!req->page) {
> > + kfree(req);
> > + break;
> > + }
> > +
> > + INIT_LIST_HEAD(&req->entry);
>
> Do we need this reset?
Let me take a look.
> > +static void zram_account_writeback_rollback(struct zram *zram)
> > +{
> > + spin_lock(&zram->wb_limit_lock);
> > + if (zram->wb_limit_enable)
> > + zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
> > + spin_unlock(&zram->wb_limit_lock);
> > +}
> > +
> > +static void zram_account_writeback_submit(struct zram *zram)
> > +{
> > + spin_lock(&zram->wb_limit_lock);
> > + if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
> > + zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
> > + spin_unlock(&zram->wb_limit_lock);
> > +}
>
> I didn't think about much about this that we really need to be
> accurate like this. Maybe, next time after coffee.
Sorry, not sure I understand this comment.
[..]
> > +static int zram_writeback_slots(struct zram *zram,
> > + struct zram_pp_ctl *ctl,
> > + struct zram_wb_ctl *wb_ctl)
> > +{
> > + struct zram_wb_req *req = NULL;
> > + unsigned long blk_idx = 0;
> > + struct zram_pp_slot *pps;
> > + int ret = 0, err;
> > + u32 index = 0;
> > +
> > + blk_start_plug(&wb_ctl->plug);
>
> Why is the plug part of wb_ctl?
>
> The scope of plug is in this function and the purpose is for
> this writeback batch in this function so the plug can be local
> variable in this function.
ACK, that's a leftover from when I manipulated plugs outside of this
function. Now it's completely local.
[..]
> > + bio_init(&req->bio, zram->bdev, &req->bio_vec, 1,
> > + REQ_OP_WRITE | REQ_SYNC);
>
> Can't we drop the REQ_SYNC now?
Good catch, I suppose we can.
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-14 1:53 ` Sergey Senozhatsky
@ 2025-11-14 3:08 ` Sergey Senozhatsky
2025-11-14 19:14 ` Minchan Kim
1 sibling, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-14 3:08 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, Yuwen Chen, Richard Chang, Brian Geffon,
Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
On (25/11/14 10:53), Sergey Senozhatsky wrote:
> [..]
> > > +struct zram_wb_req {
> > > + unsigned long blk_idx;
> > > + struct page *page;
> > > struct zram_pp_slot *pps;
> > > struct bio_vec bio_vec;
> > > struct bio bio;
> > > - int ret = 0, err;
> > > +
> > > + struct list_head entry;
> > > +};
> >
> > How about moving structure definition to the upper part of the C file?
> > Not only readability to put together data types but also better diff
> > for reviewer to know what we changed in this patch.
>
> This still needs to be under #ifdef CONFIG_ZRAM_WRITEBACK so readability
> is not significantly better. Do you still prefer moving it up?
My intention was to keep the struct definitions together with the static
functions that use them (which are under the big #ifdef CONFIG_ZRAM_WRITEBACK
block), so that the CONFIG_ZRAM_WRITEBACK parts stay in one place and are not
scattered across the file.
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-14 1:53 ` Sergey Senozhatsky
2025-11-14 3:08 ` Sergey Senozhatsky
@ 2025-11-14 19:14 ` Minchan Kim
2025-11-15 2:25 ` Sergey Senozhatsky
2025-11-15 3:42 ` Sergey Senozhatsky
1 sibling, 2 replies; 11+ messages in thread
From: Minchan Kim @ 2025-11-14 19:14 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Yuwen Chen, Richard Chang, Brian Geffon,
Fengyu Lian, linux-kernel, linux-mm, linux-block
On Fri, Nov 14, 2025 at 10:53:23AM +0900, Sergey Senozhatsky wrote:
> On (25/11/13 15:45), Minchan Kim wrote:
> [..]
> > > +struct zram_wb_req {
> > > + unsigned long blk_idx;
> > > + struct page *page;
> > > struct zram_pp_slot *pps;
> > > struct bio_vec bio_vec;
> > > struct bio bio;
> > > - int ret = 0, err;
> > > +
> > > + struct list_head entry;
> > > +};
> >
> > How about moving structure definition to the upper part of the C file?
> > Not only readability to put together data types but also better diff
> > for reviewer to know what we changed in this patch.
>
> This still needs to be under #ifdef CONFIG_ZRAM_WRITEBACK so readability
> is not significantly better. Do you still prefer moving it up?
Let's move them to the top of the #ifdef CONFIG_ZRAM_WRITEBACK block, then.
IOW, above writeback_limit_enable_store().
>
> [..]
>
> > > +/* XXX: should be a per-device sysfs attr */
> > > +#define ZRAM_WB_REQ_CNT 1
> >
> > Understand you will create the knob for the tune but at least,
> > let's introduce default number for that here.
> >
> > How about 32 since it's general queue depth for modern storage?
>
> So this is tricky. I don't know what number is a good default for
> all, given the variety of devices out there, variety of specs and
> hardware, on both sides of price range. I don't know if 32 is safe
> wrt to performance/throughput (I may be wrong and 32 is safe for
> everyone). On the other hand, 1 was our baseline for ages, so I
> wanted to minimize the risks and just keep the baseline behavior.
>
> Do you still prefer 32 as default? (here and in the next patch)
Yes, we can't get a perfect number that everyone would be happy
with since we don't know their configurations, but the value is the
typical queue depth of UFS 3.1 (which is even a little old, since newer
UFS has a higher queue depth). Another good thing is that 32 is aligned
with SWAP_CLUSTER_MAX, which is the unit of batching in the traditional
split LRU reclaim.
Assuming we don't encounter any significant regressions, I'd like to
move forward with a queue depth of 32 so that all users can benefit from
this speedup.
>
> [..]
> > > + for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
> > > + struct zram_wb_req *req;
> > > +
> > > + /*
> > > + * This is fatal condition only if we couldn't allocate
> > > + * any requests at all. Otherwise we just work with the
> > > + * requests that we have successfully allocated, so that
> > > + * writeback can still proceed, even if there is only one
> > > + * request on the idle list.
> > > + */
> > > + req = kzalloc(sizeof(*req), GFP_NOIO | __GFP_NOWARN);
> >
> > Why GFP_NOIO?
> >
> > > + if (!req)
> > > + break;
> > > +
> > > + req->page = alloc_page(GFP_NOIO | __GFP_NOWARN);
> >
> > Ditto
>
> So we do this for post-processing, which allocates a bunch of memory
> for post-processing (not only requests lists with physical pages, but
> also candidate slots buckets). The thing is that post-processing can
> be called under memory pressure and we don't really want to block and
> reclaim memory from the path that is called to relive memory pressure
> (by doing writeback or recompression).
Sorry, I didn't understand what post-processing means here.
First, this writeback_store path is not a critical path. The typical use case
is to trigger writeback_store during system idle time to save zram memory.
Second, if you used the flag to relieve memory pressure, that's not
the right flag. GFP_NOIO is meant to prevent deadlocks in IO context,
but writeback_store is just process context, so there is no reason to use
GFP_NOIO. (If we really want to relieve memory pressure, we
should use __GFP_NORETRY with ~__GFP_RECLAIM, but I doubt it.)
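Just to illustrate (a hypothetical helper, not the actual v3 change; the
zram_alloc_wb_req() name is made up here), the allocation with plain
GFP_KERNEL would look something like:

static struct zram_wb_req *zram_alloc_wb_req(void)
{
	struct zram_wb_req *req;

	/* plain GFP_KERNEL (process context), __GFP_NOWARN kept */
	req = kzalloc(sizeof(*req), GFP_KERNEL | __GFP_NOWARN);
	if (!req)
		return NULL;

	req->page = alloc_page(GFP_KERNEL | __GFP_NOWARN);
	if (!req->page) {
		kfree(req);
		return NULL;
	}

	return req;
}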
>
> > > + if (!req->page) {
> > > + kfree(req);
> > > + break;
> > > + }
> > > +
> > > + INIT_LIST_HEAD(&req->entry);
> >
> > Do we need this reset?
>
> Let me take a look.
>
> > > +static void zram_account_writeback_rollback(struct zram *zram)
> > > +{
> > > + spin_lock(&zram->wb_limit_lock);
> > > + if (zram->wb_limit_enable)
> > > + zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
> > > + spin_unlock(&zram->wb_limit_lock);
> > > +}
> > > +
> > > +static void zram_account_writeback_submit(struct zram *zram)
> > > +{
> > > + spin_lock(&zram->wb_limit_lock);
> > > + if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
> > > + zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
> > > + spin_unlock(&zram->wb_limit_lock);
> > > +}
> >
> > I didn't think about much about this that we really need to be
> > accurate like this. Maybe, next time after coffee.
>
> Sorry, not sure I understand this comment.
I meant I hadn't taken a close look at that part yet. :)
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-14 19:14 ` Minchan Kim
@ 2025-11-15 2:25 ` Sergey Senozhatsky
2025-11-15 3:42 ` Sergey Senozhatsky
1 sibling, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-15 2:25 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Andrew Morton, Yuwen Chen, Richard Chang,
Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block
On (25/11/14 11:14), Minchan Kim wrote:
> > > How about moving structure definition to the upper part of the C file?
> > > Not only readability to put together data types but also better diff
> > > for reviewer to know what we changed in this patch.
> >
> > This still needs to be under #ifdef CONFIG_ZRAM_WRITEBACK so readability
> > is not significantly better. Do you still prefer moving it up?
>
> Let's move them on top of ifdef CONFIG_ZRAM_WRITEBACK, then.
> IOW, above of writeback_limit_enable_store.
Done.
> > > How about 32 since it's general queue depth for modern storage?
> >
> > So this is tricky. I don't know what number is a good default for
> > all, given the variety of devices out there, variety of specs and
> > hardware, on both sides of price range. I don't know if 32 is safe
> > wrt to performance/throughput (I may be wrong and 32 is safe for
> > everyone). On the other hand, 1 was our baseline for ages, so I
> > wanted to minimize the risks and just keep the baseline behavior.
> >
> > Do you still prefer 32 as default? (here and in the next patch)
>
> Yes, we couldn't get the perfect number everyone would be happpy
> since we don't know their configuration but the value is the
> typical UFS 3.1(even, it's little old sice UFS has higher queue depth)'s
> queue depth. More good thing with the 32 is aligned with SWAP_CLUSTER_MAX
> which is the unit of batching in the traditional split LRU reclaim.
>
> Assuming we don't encounter any significant regressions, I'd like to
> move forward with a queue depth of 32 so that all users can benefit from
> this speedup.
Done.
> > So we do this for post-processing, which allocates a bunch of memory
> > for post-processing (not only requests lists with physical pages, but
> > also candidate slots buckets). The thing is that post-processing can
> > be called under memory pressure and we don't really want to block and
> > reclaim memory from the path that is called to relive memory pressure
> > (by doing writeback or recompression).
>
> Sorry, I didn't understand what's the post-processing means.
>
> First, this writeback_store path is not critical path. Typical usecase
> is trigger the writeback store on system idle time to save zram memory.
>
> Second, If you used the flag to relieve memory pressure, that's not
> the right flag. GFP_NOIO aimed to prevent deadlock with IO context
> but the writeback_store is just process context so no reason to use
> the GFP_NOIO. (If we really want to releieve memory presure, we
> should use __GFP_NORETRY with ~__GFP_RECLAIM but I doubt)
Done.
I wouldn't necessarily call it "wrong"; we do re-enter zram:
user-space wb -> zram writeback -> reclaim IO -> zram write page
It's not deadlock-ish, for sure, but it still looked to me important enough
to avoid, so that writeback would be more robust and make faster forward
progress (by actually saving memory) in various situations, including
possible memory pressure. Changed in v3.
> > > I didn't think about much about this that we really need to be
> > > accurate like this. Maybe, next time after coffee.
> >
> > Sorry, not sure I understand this comment.
>
> I meant I didn't took close look the part, yet. :)
Ah, I see :)
* Re: [PATCHv2 1/4] zram: introduce writeback bio batching support
2025-11-14 19:14 ` Minchan Kim
2025-11-15 2:25 ` Sergey Senozhatsky
@ 2025-11-15 3:42 ` Sergey Senozhatsky
1 sibling, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2025-11-15 3:42 UTC (permalink / raw)
To: Minchan Kim
Cc: Sergey Senozhatsky, Andrew Morton, Yuwen Chen, Richard Chang,
Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block
On (25/11/14 11:14), Minchan Kim wrote:
[..]
> First, this writeback_store path is not critical path. Typical usecase
> is trigger the writeback store on system idle time to save zram memory.
One thing to note here is that this is likely the case for embedded devices
(smartphones, laptops, etc.), where zram is often used as a swap device.
However, on servers (or desktops), zram can be used as a general-purpose
block device, and writeback can be executed under very different conditions
there. Writeback is guaranteed to save memory, while on servers there
might be no "idle time", so I can see scenarios where writeback is executed
under load.