* [PATCHv6 0/6] zram: introduce writeback bio batching
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
As writeback is becoming more and more common, the longstanding
limitations of zram writeback throughput are becoming more
visible. Introduce writeback bio batching so that multiple
writeback bios can be processed simultaneously.
v5 -> v6:
- added some comments to make code clearer
- use write lock for batch size limit store (Andrew)
- err on 0 batch size (Brian)
- pick up Reviewed-by tags (Brian)
Sergey Senozhatsky (6):
zram: introduce writeback bio batching
zram: add writeback batch size device attr
zram: take write lock in wb limit store handlers
zram: drop wb_limit_lock
zram: rework bdev block allocation
zram: read slot block idx under slot lock
drivers/block/zram/zram_drv.c | 471 ++++++++++++++++++++++++++--------
drivers/block/zram/zram_drv.h | 2 +-
2 files changed, 365 insertions(+), 108 deletions(-)
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 1/6] zram: introduce writeback bio batching
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky, Minchan Kim
As was stated in a comment [1], single-page writeback IO is not
efficient, but it works. It's time to address this throughput
limitation, as writeback is being used more and more often.
Introduce batched (multiple bio) writeback support to take
advantage of parallel request processing and better request
scheduling.
The approach used in this patch doesn't use a dedicated kthread
as in [2], or blk-plug as in [3]. A dedicated kthread adds
complexity, which can be avoided. Apart from that, not all
zram setups use writeback, so having numerous per-device
kthreads (on systems that create multiple zram devices)
hanging around is not the most optimal thing to do. blk-plug,
on the other hand, works best when requests are sequential,
which doesn't particularly fit zram writeback IO patterns: zram
writeback IO is expected to be random, due to how bdev block
reservation/release is handled. The blk-plug approach also
works in cycles: idle IO, while zram sets up requests in a
batch, is followed by a burst of IO, when zram submits the
entire batch.
Instead, we keep a pool of requests and submit a new bio as soon
as one of the in-flight requests completes (a condensed sketch of
this loop follows the links below).
For the time being, the writeback batch size (the maximum number
of in-flight bio requests) is set to 32 for all devices. A follow-up
patch adds a writeback_batch_size device attribute, so the
batch size becomes run-time configurable.
[1] https://lore.kernel.org/all/20181203024045.153534-6-minchan@kernel.org/
[2] https://lore.kernel.org/all/20250731064949.1690732-1-richardycc@google.com/
[3] https://lore.kernel.org/all/tencent_78FC2C4FE16BA1EBAF0897DB60FCD675ED05@qq.com/
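For reviewers skimming the diff, the submission loop reduces to the
following condensed sketch (illustrative only; error handling, slot
bookkeeping and bdev block reservation are omitted, see
zram_writeback_slots() below for the real code):

	while ((pps = select_pp_slot(ctl))) {
		/* Wait until one of the (at most batch-size) reqs is idle. */
		while (!(req = zram_select_idle_req(wb_ctl))) {
			wait_event(wb_ctl->done_wait,
				   !list_empty(&wb_ctl->done_reqs));
			/* Completions move reqs from done back to idle. */
			zram_complete_done_reqs(zram, wb_ctl);
		}
		/* Fill req->bio from the slot and submit asynchronously. */
		zram_submit_wb_request(zram, wb_ctl, req);
	}
	/* Drain whatever is still in flight. */
	while (atomic_read(&wb_ctl->num_inflight) > 0) {
		wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
		zram_complete_done_reqs(zram, wb_ctl);
	}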
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Co-developed-by: Yuwen Chen <ywen.chen@foxmail.com>
Co-developed-by: Richard Chang <richardycc@google.com>
Suggested-by: Minchan Kim <minchan@google.com>
---
drivers/block/zram/zram_drv.c | 369 +++++++++++++++++++++++++++-------
1 file changed, 301 insertions(+), 68 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index a43074657531..06ea56f0a00f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -500,6 +500,26 @@ static ssize_t idle_store(struct device *dev,
}
#ifdef CONFIG_ZRAM_WRITEBACK
+struct zram_wb_ctl {
+	/* idle list is accessed only by the writeback task, no concurrency */
+	struct list_head idle_reqs;
+	/* done list is accessed concurrently, protected by done_lock */
+ struct list_head done_reqs;
+ wait_queue_head_t done_wait;
+ spinlock_t done_lock;
+ atomic_t num_inflight;
+};
+
+struct zram_wb_req {
+ unsigned long blk_idx;
+ struct page *page;
+ struct zram_pp_slot *pps;
+ struct bio_vec bio_vec;
+ struct bio bio;
+
+ struct list_head entry;
+};
+
static ssize_t writeback_limit_enable_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
@@ -734,19 +754,221 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
submit_bio(bio);
}
-static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
+static void release_wb_req(struct zram_wb_req *req)
{
- unsigned long blk_idx = 0;
- struct page *page = NULL;
- struct zram_pp_slot *pps;
- struct bio_vec bio_vec;
- struct bio bio;
+ __free_page(req->page);
+ kfree(req);
+}
+
+static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
+{
+ if (!wb_ctl)
+ return;
+
+ /* We should never have inflight requests at this point */
+ WARN_ON(atomic_read(&wb_ctl->num_inflight));
+ WARN_ON(!list_empty(&wb_ctl->done_reqs));
+
+ while (!list_empty(&wb_ctl->idle_reqs)) {
+ struct zram_wb_req *req;
+
+ req = list_first_entry(&wb_ctl->idle_reqs,
+ struct zram_wb_req, entry);
+ list_del(&req->entry);
+ release_wb_req(req);
+ }
+
+ kfree(wb_ctl);
+}
+
+/* XXX: should be a per-device sysfs attr */
+#define ZRAM_WB_REQ_CNT 32
+
+static struct zram_wb_ctl *init_wb_ctl(void)
+{
+ struct zram_wb_ctl *wb_ctl;
+ int i;
+
+ wb_ctl = kmalloc(sizeof(*wb_ctl), GFP_KERNEL);
+ if (!wb_ctl)
+ return NULL;
+
+ INIT_LIST_HEAD(&wb_ctl->idle_reqs);
+ INIT_LIST_HEAD(&wb_ctl->done_reqs);
+ atomic_set(&wb_ctl->num_inflight, 0);
+ init_waitqueue_head(&wb_ctl->done_wait);
+ spin_lock_init(&wb_ctl->done_lock);
+
+ for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
+ struct zram_wb_req *req;
+
+ /*
+	 * This is a fatal condition only if we couldn't allocate
+ * any requests at all. Otherwise we just work with the
+ * requests that we have successfully allocated, so that
+ * writeback can still proceed, even if there is only one
+ * request on the idle list.
+ */
+ req = kzalloc(sizeof(*req), GFP_KERNEL | __GFP_NOWARN);
+ if (!req)
+ break;
+
+ req->page = alloc_page(GFP_KERNEL | __GFP_NOWARN);
+ if (!req->page) {
+ kfree(req);
+ break;
+ }
+
+ list_add(&req->entry, &wb_ctl->idle_reqs);
+ }
+
+	/* We couldn't allocate any requests, so writeback is not possible */
+ if (list_empty(&wb_ctl->idle_reqs))
+ goto release_wb_ctl;
+
+ return wb_ctl;
+
+release_wb_ctl:
+ release_wb_ctl(wb_ctl);
+ return NULL;
+}
+
+static void zram_account_writeback_rollback(struct zram *zram)
+{
+ spin_lock(&zram->wb_limit_lock);
+ if (zram->wb_limit_enable)
+ zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
+ spin_unlock(&zram->wb_limit_lock);
+}
+
+static void zram_account_writeback_submit(struct zram *zram)
+{
+ spin_lock(&zram->wb_limit_lock);
+ if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
+ zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
+ spin_unlock(&zram->wb_limit_lock);
+}
+
+static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
+{
+ u32 index = req->pps->index;
+ int err;
+
+ err = blk_status_to_errno(req->bio.bi_status);
+ if (err) {
+ /*
+ * Failed wb requests should not be accounted in wb_limit
+ * (if enabled).
+ */
+ zram_account_writeback_rollback(zram);
+ free_block_bdev(zram, req->blk_idx);
+ return err;
+ }
+
+ atomic64_inc(&zram->stats.bd_writes);
+ zram_slot_lock(zram, index);
+ /*
+	 * We release the slot lock during writeback, so the slot can change
+	 * under us: slot_free(), or slot_free() followed by reallocation
+	 * (zram_write_page()). In both cases the slot loses its ZRAM_PP_SLOT
+	 * flag. No concurrent post-processing can set ZRAM_PP_SLOT on such
+	 * slots until the current post-processing finishes.
+ */
+ if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) {
+ free_block_bdev(zram, req->blk_idx);
+ goto out;
+ }
+
+ zram_free_page(zram, index);
+ zram_set_flag(zram, index, ZRAM_WB);
+ zram_set_handle(zram, index, req->blk_idx);
+ atomic64_inc(&zram->stats.pages_stored);
+
+out:
+ zram_slot_unlock(zram, index);
+ return 0;
+}
+
+static void zram_writeback_endio(struct bio *bio)
+{
+ struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
+ struct zram_wb_ctl *wb_ctl = bio->bi_private;
+ unsigned long flags;
+
+ spin_lock_irqsave(&wb_ctl->done_lock, flags);
+ list_add(&req->entry, &wb_ctl->done_reqs);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
+
+ wake_up(&wb_ctl->done_wait);
+}
+
+static void zram_submit_wb_request(struct zram *zram,
+ struct zram_wb_ctl *wb_ctl,
+ struct zram_wb_req *req)
+{
+ /*
+ * wb_limit (if enabled) should be adjusted before submission,
+ * so that we don't over-submit.
+ */
+ zram_account_writeback_submit(zram);
+ atomic_inc(&wb_ctl->num_inflight);
+ req->bio.bi_private = wb_ctl;
+ submit_bio(&req->bio);
+}
+
+static int zram_complete_done_reqs(struct zram *zram,
+ struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req;
+ unsigned long flags;
int ret = 0, err;
- u32 index;
- page = alloc_page(GFP_KERNEL);
- if (!page)
- return -ENOMEM;
+ while (atomic_read(&wb_ctl->num_inflight) > 0) {
+ spin_lock_irqsave(&wb_ctl->done_lock, flags);
+ req = list_first_entry_or_null(&wb_ctl->done_reqs,
+ struct zram_wb_req, entry);
+ if (req)
+ list_del(&req->entry);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
+
+ /* ->num_inflight > 0 doesn't mean we have done requests */
+ if (!req)
+ break;
+
+ err = zram_writeback_complete(zram, req);
+ if (err)
+ ret = err;
+
+ atomic_dec(&wb_ctl->num_inflight);
+ release_pp_slot(zram, req->pps);
+ req->pps = NULL;
+
+ list_add(&req->entry, &wb_ctl->idle_reqs);
+ }
+
+ return ret;
+}
+
+static struct zram_wb_req *zram_select_idle_req(struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req;
+
+ req = list_first_entry_or_null(&wb_ctl->idle_reqs,
+ struct zram_wb_req, entry);
+ if (req)
+ list_del(&req->entry);
+ return req;
+}
+
+static int zram_writeback_slots(struct zram *zram,
+ struct zram_pp_ctl *ctl,
+ struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req = NULL;
+ unsigned long blk_idx = 0;
+ struct zram_pp_slot *pps;
+ int ret = 0, err = 0;
+ u32 index = 0;
while ((pps = select_pp_slot(ctl))) {
spin_lock(&zram->wb_limit_lock);
@@ -757,6 +979,27 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
}
spin_unlock(&zram->wb_limit_lock);
+ while (!req) {
+ req = zram_select_idle_req(wb_ctl);
+ if (req)
+ break;
+
+ wait_event(wb_ctl->done_wait,
+ !list_empty(&wb_ctl->done_reqs));
+
+ err = zram_complete_done_reqs(zram, wb_ctl);
+ /*
+ * BIO errors are not fatal, we continue and simply
+ * attempt to writeback the remaining objects (pages).
+ * At the same time we need to signal user-space that
+ * some writes (at least one, but also could be all of
+ * them) were not successful and we do so by returning
+ * the most recent BIO error.
+ */
+ if (err)
+ ret = err;
+ }
+
if (!blk_idx) {
blk_idx = alloc_block_bdev(zram);
if (!blk_idx) {
@@ -775,67 +1018,47 @@ static int zram_writeback_slots(struct zram *zram, struct zram_pp_ctl *ctl)
*/
if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
goto next;
- if (zram_read_from_zspool(zram, page, index))
+ if (zram_read_from_zspool(zram, req->page, index))
goto next;
zram_slot_unlock(zram, index);
- bio_init(&bio, zram->bdev, &bio_vec, 1,
- REQ_OP_WRITE | REQ_SYNC);
- bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
- __bio_add_page(&bio, page, PAGE_SIZE, 0);
-
/*
- * XXX: A single page IO would be inefficient for write
- * but it would be not bad as starter.
+ * From now on pp-slot is owned by the req, remove it from
+ * its pp bucket.
*/
- err = submit_bio_wait(&bio);
- if (err) {
- release_pp_slot(zram, pps);
- /*
- * BIO errors are not fatal, we continue and simply
- * attempt to writeback the remaining objects (pages).
- * At the same time we need to signal user-space that
- * some writes (at least one, but also could be all of
- * them) were not successful and we do so by returning
- * the most recent BIO error.
- */
- ret = err;
- continue;
- }
+ list_del_init(&pps->entry);
- atomic64_inc(&zram->stats.bd_writes);
- zram_slot_lock(zram, index);
- /*
- * Same as above, we release slot lock during writeback so
- * slot can change under us: slot_free() or slot_free() and
- * reallocation (zram_write_page()). In both cases slot loses
- * ZRAM_PP_SLOT flag. No concurrent post-processing can set
- * ZRAM_PP_SLOT on such slots until current post-processing
- * finishes.
- */
- if (!zram_test_flag(zram, index, ZRAM_PP_SLOT))
- goto next;
+ req->blk_idx = blk_idx;
+ req->pps = pps;
+ bio_init(&req->bio, zram->bdev, &req->bio_vec, 1, REQ_OP_WRITE);
+ req->bio.bi_iter.bi_sector = req->blk_idx * (PAGE_SIZE >> 9);
+ req->bio.bi_end_io = zram_writeback_endio;
+ __bio_add_page(&req->bio, req->page, PAGE_SIZE, 0);
- zram_free_page(zram, index);
- zram_set_flag(zram, index, ZRAM_WB);
- zram_set_handle(zram, index, blk_idx);
+ zram_submit_wb_request(zram, wb_ctl, req);
blk_idx = 0;
- atomic64_inc(&zram->stats.pages_stored);
- spin_lock(&zram->wb_limit_lock);
- if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
- zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
+ req = NULL;
+ cond_resched();
+ continue;
+
next:
zram_slot_unlock(zram, index);
release_pp_slot(zram, pps);
-
- cond_resched();
}
- if (blk_idx)
- free_block_bdev(zram, blk_idx);
- if (page)
- __free_page(page);
+ /*
+	 * We selected an idle req but never submitted it, due to an error
+	 * or the wb limit.
+ */
+ if (req)
+ release_wb_req(req);
+
+ while (atomic_read(&wb_ctl->num_inflight) > 0) {
+ wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
+ err = zram_complete_done_reqs(zram, wb_ctl);
+ if (err)
+ ret = err;
+ }
return ret;
}
@@ -948,7 +1171,8 @@ static ssize_t writeback_store(struct device *dev,
struct zram *zram = dev_to_zram(dev);
u64 nr_pages = zram->disksize >> PAGE_SHIFT;
unsigned long lo = 0, hi = nr_pages;
- struct zram_pp_ctl *ctl = NULL;
+ struct zram_pp_ctl *pp_ctl = NULL;
+ struct zram_wb_ctl *wb_ctl = NULL;
char *args, *param, *val;
ssize_t ret = len;
int err, mode = 0;
@@ -970,8 +1194,14 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- ctl = init_pp_ctl();
- if (!ctl) {
+ pp_ctl = init_pp_ctl();
+ if (!pp_ctl) {
+ ret = -ENOMEM;
+ goto release_init_lock;
+ }
+
+ wb_ctl = init_wb_ctl();
+ if (!wb_ctl) {
ret = -ENOMEM;
goto release_init_lock;
}
@@ -1000,7 +1230,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
break;
}
@@ -1011,7 +1241,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
break;
}
@@ -1022,7 +1252,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
continue;
}
@@ -1033,17 +1263,18 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- scan_slots_for_writeback(zram, mode, lo, hi, ctl);
+ scan_slots_for_writeback(zram, mode, lo, hi, pp_ctl);
continue;
}
}
- err = zram_writeback_slots(zram, ctl);
+ err = zram_writeback_slots(zram, pp_ctl, wb_ctl);
if (err)
ret = err;
release_init_lock:
- release_pp_ctl(zram, ctl);
+ release_pp_ctl(zram, pp_ctl);
+ release_wb_ctl(wb_ctl);
atomic_set(&zram->pp_in_progress, 0);
up_read(&zram->init_lock);
@@ -1112,7 +1343,9 @@ static int read_from_bdev(struct zram *zram, struct page *page,
return -EIO;
}
-static void free_block_bdev(struct zram *zram, unsigned long blk_idx) {};
+static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
+{
+}
#endif
#ifdef CONFIG_ZRAM_MEMORY_TRACKING
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 2/6] zram: add writeback batch size device attr
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
Introduce a writeback_batch_size device attribute so that the
maximum number of in-flight writeback bio requests can be
configured at run time, per device. This makes the bio batching
introduced in the previous patch tunable.
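Typical usage would look like this (sketch; zram device attrs live
under /sys/block/zram<id>/, and the default of 32 comes from
zram_add() below):

	# allow up to 64 in-flight writeback bios on zram0
	echo 64 > /sys/block/zram0/writeback_batch_size
	cat /sys/block/zram0/writeback_batch_size

Note that init_wb_ctl() runs from writeback_store(), so a new value
takes effect on the next writeback request.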
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 46 ++++++++++++++++++++++++++++++-----
drivers/block/zram/zram_drv.h | 1 +
2 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 06ea56f0a00f..5906ba061165 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -590,6 +590,40 @@ static ssize_t writeback_limit_show(struct device *dev,
return sysfs_emit(buf, "%llu\n", val);
}
+static ssize_t writeback_batch_size_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ u32 val;
+
+ if (kstrtouint(buf, 10, &val))
+ return -EINVAL;
+
+ if (!val)
+ return -EINVAL;
+
+ down_write(&zram->init_lock);
+ zram->wb_batch_size = val;
+ up_write(&zram->init_lock);
+
+ return len;
+}
+
+static ssize_t writeback_batch_size_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u32 val;
+ struct zram *zram = dev_to_zram(dev);
+
+ down_read(&zram->init_lock);
+ val = zram->wb_batch_size;
+ up_read(&zram->init_lock);
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
static void reset_bdev(struct zram *zram)
{
if (!zram->backing_dev)
@@ -781,10 +815,7 @@ static void release_wb_ctl(struct zram_wb_ctl *wb_ctl)
kfree(wb_ctl);
}
-/* XXX: should be a per-device sysfs attr */
-#define ZRAM_WB_REQ_CNT 32
-
-static struct zram_wb_ctl *init_wb_ctl(void)
+static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
{
struct zram_wb_ctl *wb_ctl;
int i;
@@ -799,7 +830,7 @@ static struct zram_wb_ctl *init_wb_ctl(void)
init_waitqueue_head(&wb_ctl->done_wait);
spin_lock_init(&wb_ctl->done_lock);
- for (i = 0; i < ZRAM_WB_REQ_CNT; i++) {
+ for (i = 0; i < zram->wb_batch_size; i++) {
struct zram_wb_req *req;
/*
@@ -1200,7 +1231,7 @@ static ssize_t writeback_store(struct device *dev,
goto release_init_lock;
}
- wb_ctl = init_wb_ctl();
+ wb_ctl = init_wb_ctl(zram);
if (!wb_ctl) {
ret = -ENOMEM;
goto release_init_lock;
@@ -2843,6 +2874,7 @@ static DEVICE_ATTR_RW(backing_dev);
static DEVICE_ATTR_WO(writeback);
static DEVICE_ATTR_RW(writeback_limit);
static DEVICE_ATTR_RW(writeback_limit_enable);
+static DEVICE_ATTR_RW(writeback_batch_size);
#endif
#ifdef CONFIG_ZRAM_MULTI_COMP
static DEVICE_ATTR_RW(recomp_algorithm);
@@ -2864,6 +2896,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_writeback.attr,
&dev_attr_writeback_limit.attr,
&dev_attr_writeback_limit_enable.attr,
+ &dev_attr_writeback_batch_size.attr,
#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
@@ -2925,6 +2958,7 @@ static int zram_add(void)
init_rwsem(&zram->init_lock);
#ifdef CONFIG_ZRAM_WRITEBACK
+ zram->wb_batch_size = 32;
spin_lock_init(&zram->wb_limit_lock);
#endif
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 6cee93f9c0d0..1a647f42c1a4 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -129,6 +129,7 @@ struct zram {
struct file *backing_dev;
spinlock_t wb_limit_lock;
bool wb_limit_enable;
+ u32 wb_batch_size;
u64 bd_wb_limit;
struct block_device *bdev;
unsigned long *bitmap;
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 3/6] zram: take write lock in wb limit store handlers
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
Store handlers for writable device attrs should take the zram
init_lock for writing. While at it, fix up coding style.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
---
drivers/block/zram/zram_drv.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5906ba061165..8dd733707a40 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -521,7 +521,8 @@ struct zram_wb_req {
};
static ssize_t writeback_limit_enable_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
+ struct device_attribute *attr,
+ const char *buf, size_t len)
{
struct zram *zram = dev_to_zram(dev);
u64 val;
@@ -530,18 +531,19 @@ static ssize_t writeback_limit_enable_store(struct device *dev,
if (kstrtoull(buf, 10, &val))
return ret;
- down_read(&zram->init_lock);
+ down_write(&zram->init_lock);
spin_lock(&zram->wb_limit_lock);
zram->wb_limit_enable = val;
spin_unlock(&zram->wb_limit_lock);
- up_read(&zram->init_lock);
+ up_write(&zram->init_lock);
ret = len;
return ret;
}
static ssize_t writeback_limit_enable_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+ struct device_attribute *attr,
+ char *buf)
{
bool val;
struct zram *zram = dev_to_zram(dev);
@@ -556,7 +558,8 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
}
static ssize_t writeback_limit_store(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
+ struct device_attribute *attr,
+ const char *buf, size_t len)
{
struct zram *zram = dev_to_zram(dev);
u64 val;
@@ -565,11 +568,11 @@ static ssize_t writeback_limit_store(struct device *dev,
if (kstrtoull(buf, 10, &val))
return ret;
- down_read(&zram->init_lock);
+ down_write(&zram->init_lock);
spin_lock(&zram->wb_limit_lock);
zram->bd_wb_limit = val;
spin_unlock(&zram->wb_limit_lock);
- up_read(&zram->init_lock);
+ up_write(&zram->init_lock);
ret = len;
return ret;
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 4/6] zram: drop wb_limit_lock
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
We don't need wb_limit_lock. Writeback limit setters take the
zram init_lock for writing (exclusive access), while wb_limit
modifications during writeback happen only from a single task
and under the zram init_lock for reading. No concurrent wb_limit
modifications are possible (we permit only one post-processing
task at a time). Add lockdep assertions to the wb_limit mutators.
While at it, fix up coding style.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
---
drivers/block/zram/zram_drv.c | 22 +++++-----------------
drivers/block/zram/zram_drv.h | 1 -
2 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8dd733707a40..806497225603 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -532,9 +532,7 @@ static ssize_t writeback_limit_enable_store(struct device *dev,
return ret;
down_write(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
zram->wb_limit_enable = val;
- spin_unlock(&zram->wb_limit_lock);
up_write(&zram->init_lock);
ret = len;
@@ -549,9 +547,7 @@ static ssize_t writeback_limit_enable_show(struct device *dev,
struct zram *zram = dev_to_zram(dev);
down_read(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
val = zram->wb_limit_enable;
- spin_unlock(&zram->wb_limit_lock);
up_read(&zram->init_lock);
return sysfs_emit(buf, "%d\n", val);
@@ -569,9 +565,7 @@ static ssize_t writeback_limit_store(struct device *dev,
return ret;
down_write(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
zram->bd_wb_limit = val;
- spin_unlock(&zram->wb_limit_lock);
up_write(&zram->init_lock);
ret = len;
@@ -579,15 +573,13 @@ static ssize_t writeback_limit_store(struct device *dev,
}
static ssize_t writeback_limit_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+ struct device_attribute *attr, char *buf)
{
u64 val;
struct zram *zram = dev_to_zram(dev);
down_read(&zram->init_lock);
- spin_lock(&zram->wb_limit_lock);
val = zram->bd_wb_limit;
- spin_unlock(&zram->wb_limit_lock);
up_read(&zram->init_lock);
return sysfs_emit(buf, "%llu\n", val);
@@ -869,18 +861,18 @@ static struct zram_wb_ctl *init_wb_ctl(struct zram *zram)
static void zram_account_writeback_rollback(struct zram *zram)
{
- spin_lock(&zram->wb_limit_lock);
+ lockdep_assert_held_read(&zram->init_lock);
+
if (zram->wb_limit_enable)
zram->bd_wb_limit += 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
}
static void zram_account_writeback_submit(struct zram *zram)
{
- spin_lock(&zram->wb_limit_lock);
+ lockdep_assert_held_read(&zram->init_lock);
+
if (zram->wb_limit_enable && zram->bd_wb_limit > 0)
zram->bd_wb_limit -= 1UL << (PAGE_SHIFT - 12);
- spin_unlock(&zram->wb_limit_lock);
}
static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
@@ -1005,13 +997,10 @@ static int zram_writeback_slots(struct zram *zram,
u32 index = 0;
while ((pps = select_pp_slot(ctl))) {
- spin_lock(&zram->wb_limit_lock);
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
- spin_unlock(&zram->wb_limit_lock);
ret = -EIO;
break;
}
- spin_unlock(&zram->wb_limit_lock);
while (!req) {
req = zram_select_idle_req(wb_ctl);
@@ -2962,7 +2951,6 @@ static int zram_add(void)
init_rwsem(&zram->init_lock);
#ifdef CONFIG_ZRAM_WRITEBACK
zram->wb_batch_size = 32;
- spin_lock_init(&zram->wb_limit_lock);
#endif
/* gendisk structure */
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 1a647f42c1a4..c6d94501376c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -127,7 +127,6 @@ struct zram {
bool claim; /* Protected by disk->open_mutex */
#ifdef CONFIG_ZRAM_WRITEBACK
struct file *backing_dev;
- spinlock_t wb_limit_lock;
bool wb_limit_enable;
u32 wb_batch_size;
u64 bd_wb_limit;
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 5/6] zram: rework bdev block allocation
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
First, writeback bdev ->bitmap bits are set only from one
context, as we can have only a single task performing
writeback, so we cannot race with anything else. Remove
the retry path.
Second, we always check the ZRAM_WB flag to distinguish
written-back slots, so we cannot confuse a 0 bdev block index
with a 0 handle. We can use the first bdev block (bit 0) for
writeback as well.
While at it, give the functions slightly more accurate names:
we don't alloc/free anything there, we reserve a block for
async writeback or release that block.
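Condensed, the reservation path reduces to the following sketch
of the hunk below (not new code):

	blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, 0);
	if (blk_idx == zram->nr_pages)
		return INVALID_BDEV_BLOCK;	/* ~0UL, since 0 is now a valid block */
	set_bit(blk_idx, zram->bitmap);		/* single writer, no test_and_set retry */

Callers accordingly test against INVALID_BDEV_BLOCK instead of 0.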
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
---
drivers/block/zram/zram_drv.c | 37 +++++++++++++++++------------------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 806497225603..1f7e9e914d34 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -500,6 +500,8 @@ static ssize_t idle_store(struct device *dev,
}
#ifdef CONFIG_ZRAM_WRITEBACK
+#define INVALID_BDEV_BLOCK (~0UL)
+
struct zram_wb_ctl {
/* idle list is accessed only by the writeback task, no concurrency */
struct list_head idle_reqs;
@@ -746,23 +748,20 @@ static ssize_t backing_dev_store(struct device *dev,
return err;
}
-static unsigned long alloc_block_bdev(struct zram *zram)
+static unsigned long zram_reserve_bdev_block(struct zram *zram)
{
- unsigned long blk_idx = 1;
-retry:
- /* skip 0 bit to confuse zram.handle = 0 */
- blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, blk_idx);
- if (blk_idx == zram->nr_pages)
- return 0;
+ unsigned long blk_idx;
- if (test_and_set_bit(blk_idx, zram->bitmap))
- goto retry;
+ blk_idx = find_next_zero_bit(zram->bitmap, zram->nr_pages, 0);
+ if (blk_idx == zram->nr_pages)
+ return INVALID_BDEV_BLOCK;
+ set_bit(blk_idx, zram->bitmap);
atomic64_inc(&zram->stats.bd_count);
return blk_idx;
}
-static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
+static void zram_release_bdev_block(struct zram *zram, unsigned long blk_idx)
{
int was_set;
@@ -887,7 +886,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
* (if enabled).
*/
zram_account_writeback_rollback(zram);
- free_block_bdev(zram, req->blk_idx);
+ zram_release_bdev_block(zram, req->blk_idx);
return err;
}
@@ -901,7 +900,7 @@ static int zram_writeback_complete(struct zram *zram, struct zram_wb_req *req)
* finishes.
*/
if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) {
- free_block_bdev(zram, req->blk_idx);
+ zram_release_bdev_block(zram, req->blk_idx);
goto out;
}
@@ -990,8 +989,8 @@ static int zram_writeback_slots(struct zram *zram,
struct zram_pp_ctl *ctl,
struct zram_wb_ctl *wb_ctl)
{
+ unsigned long blk_idx = INVALID_BDEV_BLOCK;
struct zram_wb_req *req = NULL;
- unsigned long blk_idx = 0;
struct zram_pp_slot *pps;
int ret = 0, err = 0;
u32 index = 0;
@@ -1023,9 +1022,9 @@ static int zram_writeback_slots(struct zram *zram,
ret = err;
}
- if (!blk_idx) {
- blk_idx = alloc_block_bdev(zram);
- if (!blk_idx) {
+ if (blk_idx == INVALID_BDEV_BLOCK) {
+ blk_idx = zram_reserve_bdev_block(zram);
+ if (blk_idx == INVALID_BDEV_BLOCK) {
ret = -ENOSPC;
break;
}
@@ -1059,7 +1058,7 @@ static int zram_writeback_slots(struct zram *zram,
__bio_add_page(&req->bio, req->page, PAGE_SIZE, 0);
zram_submit_wb_request(zram, wb_ctl, req);
- blk_idx = 0;
+ blk_idx = INVALID_BDEV_BLOCK;
req = NULL;
cond_resched();
continue;
@@ -1366,7 +1365,7 @@ static int read_from_bdev(struct zram *zram, struct page *page,
return -EIO;
}
-static void free_block_bdev(struct zram *zram, unsigned long blk_idx)
+static void zram_release_bdev_block(struct zram *zram, unsigned long blk_idx)
{
}
#endif
@@ -1890,7 +1889,7 @@ static void zram_free_page(struct zram *zram, size_t index)
if (zram_test_flag(zram, index, ZRAM_WB)) {
zram_clear_flag(zram, index, ZRAM_WB);
- free_block_bdev(zram, zram_get_handle(zram, index));
+ zram_release_bdev_block(zram, zram_get_handle(zram, index));
goto out;
}
--
2.52.0.460.gd25c4c69ec-goog
* [PATCHv6 6/6] zram: read slot block idx under slot lock
From: Sergey Senozhatsky @ 2025-11-22 7:40 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang
Cc: Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block,
Sergey Senozhatsky
Read the slot's block index under the slot lock. We release the
slot lock for the bdev read so, technically, the slot can still
get freed in the meantime, but at least we will read the bdev
block (page) that held the previously known slot data, not
whatever slot->handle points to by then, which can be anything
at that point.
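The window this narrows can be pictured as follows (illustrative
interleaving, not taken from the patch):

	CPU A (zram_read_page)              CPU B
	zram_slot_lock(index)
	blk_idx = zram_get_handle(index)
	zram_slot_unlock(index)
	                                    zram_free_page(index)
	                                    /* handle/block may be reused */
	read_from_bdev(..., blk_idx, ...)   /* stale, but previously valid */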
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 1f7e9e914d34..3428f647d0a7 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1995,14 +1995,14 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
ret = zram_read_from_zspool(zram, page, index);
zram_slot_unlock(zram, index);
} else {
+ unsigned long blk_idx = zram_get_handle(zram, index);
+
/*
* The slot should be unlocked before reading from the backing
* device.
*/
zram_slot_unlock(zram, index);
-
- ret = read_from_bdev(zram, page, zram_get_handle(zram, index),
- parent);
+ ret = read_from_bdev(zram, page, blk_idx, parent);
}
/* Should NEVER happen. Return bio error if it does. */
--
2.52.0.460.gd25c4c69ec-goog
* Re: [PATCHv6 0/6] zram: introduce writeback bio batching
From: Andrew Morton @ 2025-11-22 21:54 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Minchan Kim, Yuwen Chen, Richard Chang, Brian Geffon,
Fengyu Lian, linux-kernel, linux-mm, linux-block
On Sat, 22 Nov 2025 16:40:23 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> As writeback is becoming more and more common, the longstanding
> limitations of zram writeback throughput are becoming more
> visible. Introduce writeback bio batching so that multiple
> writeback bios can be processed simultaneously.
Thanks, I updated mm.git's mm-unstable branch to this version.
> v5 -> v6:
> - added some comments to make code clearer
> - use write lock for batch size limit store (Andrew)
> - err on 0 batch size (Brian)
> - pick up Reviewed-by tags (Brian)
Here's how this v6 series altered mm-unstable:
drivers/block/zram/zram_drv.c | 112 +++++++++++++++++---------------
1 file changed, 62 insertions(+), 50 deletions(-)
--- a/drivers/block/zram/zram_drv.c~b
+++ a/drivers/block/zram/zram_drv.c
@@ -503,11 +503,13 @@ out:
#define INVALID_BDEV_BLOCK (~0UL)
struct zram_wb_ctl {
+	/* idle list is accessed only by the writeback task, no concurrency */
struct list_head idle_reqs;
- struct list_head inflight_reqs;
-
+	/* done list is accessed concurrently, protected by done_lock */
+ struct list_head done_reqs;
+ wait_queue_head_t done_wait;
+ spinlock_t done_lock;
atomic_t num_inflight;
- struct completion done;
};
struct zram_wb_req {
@@ -591,20 +593,18 @@ static ssize_t writeback_batch_size_stor
{
struct zram *zram = dev_to_zram(dev);
u32 val;
- ssize_t ret = -EINVAL;
if (kstrtouint(buf, 10, &val))
- return ret;
+ return -EINVAL;
if (!val)
- val = 1;
+ return -EINVAL;
down_write(&zram->init_lock);
zram->wb_batch_size = val;
up_write(&zram->init_lock);
- ret = len;
- return ret;
+ return len;
}
static ssize_t writeback_batch_size_show(struct device *dev,
@@ -794,7 +794,8 @@ static void release_wb_ctl(struct zram_w
return;
/* We should never have inflight requests at this point */
- WARN_ON(!list_empty(&wb_ctl->inflight_reqs));
+ WARN_ON(atomic_read(&wb_ctl->num_inflight));
+ WARN_ON(!list_empty(&wb_ctl->done_reqs));
while (!list_empty(&wb_ctl->idle_reqs)) {
struct zram_wb_req *req;
@@ -818,9 +819,10 @@ static struct zram_wb_ctl *init_wb_ctl(s
return NULL;
INIT_LIST_HEAD(&wb_ctl->idle_reqs);
- INIT_LIST_HEAD(&wb_ctl->inflight_reqs);
+ INIT_LIST_HEAD(&wb_ctl->done_reqs);
atomic_set(&wb_ctl->num_inflight, 0);
- init_completion(&wb_ctl->done);
+ init_waitqueue_head(&wb_ctl->done_wait);
+ spin_lock_init(&wb_ctl->done_lock);
for (i = 0; i < zram->wb_batch_size; i++) {
struct zram_wb_req *req;
@@ -914,10 +916,15 @@ out:
static void zram_writeback_endio(struct bio *bio)
{
+ struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
struct zram_wb_ctl *wb_ctl = bio->bi_private;
+ unsigned long flags;
- if (atomic_dec_return(&wb_ctl->num_inflight) == 0)
- complete(&wb_ctl->done);
+ spin_lock_irqsave(&wb_ctl->done_lock, flags);
+ list_add(&req->entry, &wb_ctl->done_reqs);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
+
+ wake_up(&wb_ctl->done_wait);
}
static void zram_submit_wb_request(struct zram *zram,
@@ -930,49 +937,54 @@ static void zram_submit_wb_request(struc
*/
zram_account_writeback_submit(zram);
atomic_inc(&wb_ctl->num_inflight);
- list_add_tail(&req->entry, &wb_ctl->inflight_reqs);
+ req->bio.bi_private = wb_ctl;
submit_bio(&req->bio);
}
-static struct zram_wb_req *select_idle_req(struct zram_wb_ctl *wb_ctl)
+static int zram_complete_done_reqs(struct zram *zram,
+ struct zram_wb_ctl *wb_ctl)
{
struct zram_wb_req *req;
+ unsigned long flags;
+ int ret = 0, err;
- req = list_first_entry_or_null(&wb_ctl->idle_reqs,
- struct zram_wb_req, entry);
- if (req)
- list_del(&req->entry);
- return req;
-}
-
-static int zram_wb_wait_for_completion(struct zram *zram,
- struct zram_wb_ctl *wb_ctl)
-{
- int ret = 0;
-
- if (atomic_read(&wb_ctl->num_inflight))
- wait_for_completion_io(&wb_ctl->done);
-
- reinit_completion(&wb_ctl->done);
- while (!list_empty(&wb_ctl->inflight_reqs)) {
- struct zram_wb_req *req;
- int err;
+ while (atomic_read(&wb_ctl->num_inflight) > 0) {
+ spin_lock_irqsave(&wb_ctl->done_lock, flags);
+ req = list_first_entry_or_null(&wb_ctl->done_reqs,
+ struct zram_wb_req, entry);
+ if (req)
+ list_del(&req->entry);
+ spin_unlock_irqrestore(&wb_ctl->done_lock, flags);
- req = list_first_entry(&wb_ctl->inflight_reqs,
- struct zram_wb_req, entry);
- list_move(&req->entry, &wb_ctl->idle_reqs);
+ /* ->num_inflight > 0 doesn't mean we have done requests */
+ if (!req)
+ break;
err = zram_writeback_complete(zram, req);
if (err)
ret = err;
+ atomic_dec(&wb_ctl->num_inflight);
release_pp_slot(zram, req->pps);
req->pps = NULL;
+
+ list_add(&req->entry, &wb_ctl->idle_reqs);
}
return ret;
}
+static struct zram_wb_req *zram_select_idle_req(struct zram_wb_ctl *wb_ctl)
+{
+ struct zram_wb_req *req;
+
+ req = list_first_entry_or_null(&wb_ctl->idle_reqs,
+ struct zram_wb_req, entry);
+ if (req)
+ list_del(&req->entry);
+ return req;
+}
+
static int zram_writeback_slots(struct zram *zram,
struct zram_pp_ctl *ctl,
struct zram_wb_ctl *wb_ctl)
@@ -980,11 +992,9 @@ static int zram_writeback_slots(struct z
unsigned long blk_idx = INVALID_BDEV_BLOCK;
struct zram_wb_req *req = NULL;
struct zram_pp_slot *pps;
- struct blk_plug io_plug;
- int ret = 0, err;
+ int ret = 0, err = 0;
u32 index = 0;
- blk_start_plug(&io_plug);
while ((pps = select_pp_slot(ctl))) {
if (zram->wb_limit_enable && !zram->bd_wb_limit) {
ret = -EIO;
@@ -992,13 +1002,14 @@ static int zram_writeback_slots(struct z
}
while (!req) {
- req = select_idle_req(wb_ctl);
+ req = zram_select_idle_req(wb_ctl);
if (req)
break;
- blk_finish_plug(&io_plug);
- err = zram_wb_wait_for_completion(zram, wb_ctl);
- blk_start_plug(&io_plug);
+ wait_event(wb_ctl->done_wait,
+ !list_empty(&wb_ctl->done_reqs));
+
+ err = zram_complete_done_reqs(zram, wb_ctl);
/*
* BIO errors are not fatal, we continue and simply
* attempt to writeback the remaining objects (pages).
@@ -1044,18 +1055,17 @@ static int zram_writeback_slots(struct z
bio_init(&req->bio, zram->bdev, &req->bio_vec, 1, REQ_OP_WRITE);
req->bio.bi_iter.bi_sector = req->blk_idx * (PAGE_SIZE >> 9);
req->bio.bi_end_io = zram_writeback_endio;
- req->bio.bi_private = wb_ctl;
__bio_add_page(&req->bio, req->page, PAGE_SIZE, 0);
zram_submit_wb_request(zram, wb_ctl, req);
blk_idx = INVALID_BDEV_BLOCK;
req = NULL;
+ cond_resched();
continue;
next:
zram_slot_unlock(zram, index);
release_pp_slot(zram, pps);
- cond_resched();
}
/*
@@ -1065,10 +1075,12 @@ next:
if (req)
release_wb_req(req);
- blk_finish_plug(&io_plug);
- err = zram_wb_wait_for_completion(zram, wb_ctl);
- if (err)
- ret = err;
+ while (atomic_read(&wb_ctl->num_inflight) > 0) {
+ wait_event(wb_ctl->done_wait, !list_empty(&wb_ctl->done_reqs));
+ err = zram_complete_done_reqs(zram, wb_ctl);
+ if (err)
+ ret = err;
+ }
return ret;
}
_
* Re: [PATCHv6 0/6] zram: introduce writeback bio batching
From: Sergey Senozhatsky @ 2025-11-22 23:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Sergey Senozhatsky, Minchan Kim, Yuwen Chen, Richard Chang,
Brian Geffon, Fengyu Lian, linux-kernel, linux-mm, linux-block
On (25/11/22 13:54), Andrew Morton wrote:
> On Sat, 22 Nov 2025 16:40:23 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
>
> > As writeback is becoming more and more common, the longstanding
> > limitations of zram writeback throughput are becoming more
> > visible. Introduce writeback bio batching so that multiple
> > writeback bios can be processed simultaneously.
>
> Thanks, I updated mm.git's mm-unstable branch to this version.
>
> > v5 -> v6:
> > - added some comments to make code clearer
> > - use write lock for batch size limit store (Andrew)
> > - err on 0 batch size (Brian)
> > - pick up Reviewed-by tags (Brian)
>
> Here's how this v6 series altered mm-unstable:
Looks right. Thanks!
* Re: [PATCHv6 2/6] zram: add writeback batch size device attr
From: Brian Geffon @ 2025-11-24 15:50 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Minchan Kim, Yuwen Chen, Richard Chang,
Fengyu Lian, linux-kernel, linux-mm, linux-block
On Sat, Nov 22, 2025 at 2:40 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> Introduce a writeback_batch_size device attribute so that the
> maximum number of in-flight writeback bio requests can be
> configured at run time, per device. This makes the bio batching
> introduced in the previous patch tunable.
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Brian Geffon <bgeffon@google.com>
> [snip]