* [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: zhaoyang.huang @ 2025-09-22 3:29 UTC
To: Jens Axboe, Ming Lei, linux-mm, linux-block, linux-kernel,
Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Currently, my Android system with a per-PID memcg v2 setup suffers
high block_rq_issue to block_rq_complete latency, which is actually
introduced by the scheduling latency of too many kworker threads. On
further investigation, we found that the EAS scheduler, which packs
small-load tasks onto one CPU core, makes this scenario worse. This
commit introduces a synchronized read path to help in this scenario.
Under an fio test, the issue-to-complete (I2C) latency of the loop
device's requests dropped from 14ms to 2.1ms.
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
drivers/block/Kconfig | 10 ++++++++++
drivers/block/loop.c | 22 +++++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index df38fb364904..a30d6c5f466e 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -383,4 +383,14 @@ config BLK_DEV_ZONED_LOOP
If unsure, say N.
+config LOOP_SYNC_READ
+ bool "enable synchronized read for loop device"
+ default n
+ help
+ Provide a synchronized read path for the loop device, which can be
+ helpful when scheduling latency affects loop device requests,
+ especially when plenty of blkcgs are set up within the system. The
+ loop device should be configured with LO_FLAGS_DIRECT_IO when this
+ option is enabled.
+
endif # BLK_DEV
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 053a086d547e..1e18abe48d2b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1884,7 +1884,27 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
#endif
}
#endif
- loop_queue_work(lo, cmd);
+#ifdef CONFIG_LOOP_SYNC_READ
+ if (req_op(rq) == REQ_OP_READ && cmd->use_aio && current->plug) {
+ struct blk_plug *plug = current->plug;
+
+ current->plug = NULL;
+ /* iterate through plug->mq_list and issue the requests to the real device */
+ while (rq) {
+ loff_t pos;
+
+ cmd = blk_mq_rq_to_pdu(rq);
+ pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
+ lo_rw_aio(lo, cmd, pos, ITER_DEST);
+ rq = rq_list_pop(&plug->mq_list);
+ }
+ plug->rq_count = 0;
+ current->plug = plug;
+ } else
+ loop_queue_work(lo, cmd);
+#else
+ loop_queue_work(lo, cmd);
+#endif
return BLK_STS_OK;
}
--
2.25.1
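The Kconfig help above requires the loop device to run with
LO_FLAGS_DIRECT_IO. For context, userspace enables that flag through
the LOOP_SET_DIRECT_IO ioctl (losetup --direct-io=on does the same); a
minimal sketch, with the device path purely illustrative:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/loop.h>   /* LOOP_SET_DIRECT_IO */

int main(void)
{
	/* /dev/loop0 is illustrative; use whichever device the backing
	 * file was attached to */
	int fd = open("/dev/loop0", O_RDWR);

	if (fd < 0 || ioctl(fd, LOOP_SET_DIRECT_IO, 1) < 0) {
		perror("enabling LO_FLAGS_DIRECT_IO");
		return 1;
	}
	return 0;
}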
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Christoph Hellwig @ 2025-09-22 18:09 UTC
To: zhaoyang.huang
Cc: Jens Axboe, Ming Lei, linux-mm, linux-block, linux-kernel,
Zhaoyang Huang, steve.kang
On Mon, Sep 22, 2025 at 11:29:15AM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Currently, my Android system with a per-PID memcg v2 setup suffers
> high block_rq_issue to block_rq_complete latency, which is actually
> introduced by the scheduling latency of too many kworker threads. On
> further investigation, we found that the EAS scheduler, which packs
> small-load tasks onto one CPU core, makes this scenario worse. This
> commit introduces a synchronized read path to help in this scenario.
> Under an fio test, the issue-to-complete (I2C) latency of the loop
> device's requests dropped from 14ms to 2.1ms.
So fix the scheduler, or create fewer helper threads; this workaround
really looks like fixing the symptoms instead of even trying to
address the root cause.
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Zhaoyang Huang @ 2025-09-23 3:50 UTC
To: Christoph Hellwig
Cc: zhaoyang.huang, Jens Axboe, Ming Lei, linux-mm, linux-block,
linux-kernel, steve.kang
On Tue, Sep 23, 2025 at 2:09 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Sep 22, 2025 at 11:29:15AM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Currently, my Android system with a per-PID memcg v2 setup suffers
> > high block_rq_issue to block_rq_complete latency, which is actually
> > introduced by the scheduling latency of too many kworker threads.
> > On further investigation, we found that the EAS scheduler, which
> > packs small-load tasks onto one CPU core, makes this scenario
> > worse. This commit introduces a synchronized read path to help in
> > this scenario. Under an fio test, the issue-to-complete (I2C)
> > latency of the loop device's requests dropped from 14ms to 2.1ms.
>
> So fix the scheduler, or create fewer helper threads; this workaround
> really looks like fixing the symptoms instead of even trying to
> address the root cause.
Yes, we have tried to solve this case from those perspectives. As to
the scheduler, packing small tasks onto one core (a big core on ARM)
instead of spreading them is desired for power-saving reasons. As to
the number of kworker threads, the current design creates a new work
item for each blkcg. Under Android's current approach, each PID gets
its own cgroup and, correspondingly, its own kworker thread, which is
what actually induces this scenario.
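For context on that design: a heavily simplified sketch of the
per-blkcg worker model in drivers/block/loop.c, assuming the recent
upstream shape (field names are abridged, find_or_create_worker stands
in for the real rbtree lookup, and locking plus idle-worker cleanup
are omitted):

struct loop_worker {
	struct rb_node rb_node;          /* keyed by blkcg css in the device's worker tree */
	struct work_struct work;         /* executes queued commands on lo->workqueue */
	struct list_head cmd_list;       /* pending loop_cmds for this blkcg */
	struct cgroup_subsys_state *blkcg_css;
};

static void queue_to_per_blkcg_worker(struct loop_device *lo,
				      struct loop_cmd *cmd)
{
	/* one worker per distinct blkcg: with one cgroup per PID, every
	 * process doing loop I/O ends up with its own kworker-run work item */
	struct loop_worker *worker = find_or_create_worker(lo, cmd->blkcg_css);

	list_add_tail(&cmd->list_entry, &worker->cmd_list);
	queue_work(lo->workqueue, &worker->work);
}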
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Bart Van Assche @ 2025-09-23 16:30 UTC
To: Zhaoyang Huang, Christoph Hellwig
Cc: zhaoyang.huang, Jens Axboe, Ming Lei, linux-mm, linux-block,
linux-kernel, steve.kang, Minchan Kim
On 9/22/25 8:50 PM, Zhaoyang Huang wrote:
> Yes, we have tried to solve this case from those perspectives. As to
> the scheduler, packing small tasks onto one core (a big core on ARM)
> instead of spreading them is desired for power-saving reasons. As to
> the number of kworker threads, the current design creates a new work
> item for each blkcg. Under Android's current approach, each PID gets
> its own cgroup and, correspondingly, its own kworker thread, which is
> what actually induces this scenario.
More cgroups means more overhead from cgroup-internal tasks, e.g.
accumulating statistics. How about asking the Android core team to
review the approach of associating one cgroup with each PID? I'm
wondering whether one cgroup per aggregate profile
(SCHED_SP_BACKGROUND, SCHED_SP_FOREGROUND, ...) would work.
Thanks,
Bart.
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Zhaoyang Huang @ 2025-09-24 9:13 UTC
To: Bart Van Assche, Suren Baghdasaryan, Todd Kjos
Cc: Christoph Hellwig, zhaoyang.huang, Jens Axboe, Ming Lei,
linux-mm, linux-block, linux-kernel, steve.kang, Minchan Kim
Looping in the Google kernel team. When cgroup v2's active_depth is
set to 3, the loop device's request issue-to-complete latency is
affected by the scheduling latency introduced by the huge number of
kworker threads, one per blkcg. What's your opinion on this RFC patch?
On Wed, Sep 24, 2025 at 12:30 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 9/22/25 8:50 PM, Zhaoyang Huang wrote:
> > Yes, we have tried to solve this case from those perspectives. As
> > to the scheduler, packing small tasks onto one core (a big core on
> > ARM) instead of spreading them is desired for power-saving reasons.
> > As to the number of kworker threads, the current design creates a
> > new work item for each blkcg. Under Android's current approach,
> > each PID gets its own cgroup and, correspondingly, its own kworker
> > thread, which is what actually induces this scenario.
>
> More cgroups means more overhead from cgroup-internal tasks, e.g.
> accumulating statistics. How about asking the Android core team to
> review the approach of associating one cgroup with each PID? I'm
> wondering whether one cgroup per aggregate profile
> (SCHED_SP_BACKGROUND, SCHED_SP_FOREGROUND, ...) would work.
>
> Thanks,
>
> Bart.
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Ming Lei @ 2025-09-24 10:04 UTC
To: Zhaoyang Huang
Cc: Bart Van Assche, Suren Baghdasaryan, Todd Kjos,
Christoph Hellwig, zhaoyang.huang, Jens Axboe, linux-mm,
linux-block, linux-kernel, steve.kang, Minchan Kim, Ming Lei
On Wed, Sep 24, 2025 at 5:13 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> Looping in the Google kernel team. When cgroup v2's active_depth is
> set to 3, the loop device's request issue-to-complete latency is
> affected by the scheduling latency introduced by the huge number of
> kworker threads, one per blkcg. What's your opinion on this RFC patch?
There are some issues with this RFC patch:
- current->plug can't be touched by the driver, because there can be
requests from other devices on it
- you can't sleep in loop_queue_rq()
The following patchset should address your issue, and I can rebase and
resend if no one objects.
https://lore.kernel.org/linux-block/20250322012617.354222-1-ming.lei@redhat.com/
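For context, the core idea of that patchset, as I read it, is to issue
the backing-file AIO directly from the submitter's context with
IOCB_NOWAIT and fall back to the per-blkcg worker only when the
backing file would block. A sketch under that assumption (the helper
below is illustrative, not the exact upstream code; it also assumes
lo_rw_aio propagates -EAGAIN rather than completing the request with
it, which is part of what the patchset changes):

static blk_status_t loop_try_direct_issue(struct loop_device *lo,
					  struct loop_cmd *cmd,
					  loff_t pos, int rw)
{
	int ret;

	cmd->iocb.ki_flags |= IOCB_NOWAIT;	/* ->queue_rq context must not sleep */
	ret = lo_rw_aio(lo, cmd, pos, rw);
	if (ret != -EAGAIN)
		return BLK_STS_OK;		/* completed, or completion is in flight */

	/* backing file would block: clear NOWAIT and defer to the kworker path */
	cmd->iocb.ki_flags &= ~IOCB_NOWAIT;
	loop_queue_work(lo, cmd);
	return BLK_STS_OK;
}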
Thanks,
>
> On Wed, Sep 24, 2025 at 12:30 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 9/22/25 8:50 PM, Zhaoyang Huang wrote:
> > > Yes, we have tried to solve this case from those perspectives.
> > > As to the scheduler, packing small tasks onto one core (a big
> > > core on ARM) instead of spreading them is desired for
> > > power-saving reasons. As to the number of kworker threads, the
> > > current design creates a new work item for each blkcg. Under
> > > Android's current approach, each PID gets its own cgroup and,
> > > correspondingly, its own kworker thread, which is what actually
> > > induces this scenario.
> >
> > More cgroups means more overhead from cgroup-internal tasks, e.g.
> > accumulating statistics. How about asking the Android core team to
> > review the approach of associating one cgroup with each PID? I'm
> > wondering whether one cgroup per aggregate profile
> > (SCHED_SP_BACKGROUND, SCHED_SP_FOREGROUND, ...) would work.
> >
> > Thanks,
> >
> > Bart.
* Re: [RFC PATCH] driver: loop: introduce synchronized read for loop driver
From: Zhaoyang Huang @ 2025-09-25 1:14 UTC
To: Ming Lei
Cc: Bart Van Assche, Suren Baghdasaryan, Todd Kjos,
Christoph Hellwig, zhaoyang.huang, Jens Axboe, linux-mm,
linux-block, linux-kernel, steve.kang, Minchan Kim
On Wed, Sep 24, 2025 at 6:05 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Sep 24, 2025 at 5:13 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > Looping in the Google kernel team. When cgroup v2's active_depth is
> > set to 3, the loop device's request issue-to-complete latency is
> > affected by the scheduling latency introduced by the huge number of
> > kworker threads, one per blkcg. What's your opinion on this RFC
> > patch?
>
> There are some issues with this RFC patch:
>
> - current->plug can't be touched by the driver, because there can be
> requests from other devices on it
>
> - you can't sleep in loop_queue_rq()
>
> The following patchset should address your issue, and I can rebase and
> resend if no one objects.
>
> https://lore.kernel.org/linux-block/20250322012617.354222-1-ming.lei@redhat.com/
Thanks for the patch; that is exactly what I want.
>
> Thanks,
>
>
> >
> > On Wed, Sep 24, 2025 at 12:30 AM Bart Van Assche <bvanassche@acm.org> wrote:
> > >
> > > On 9/22/25 8:50 PM, Zhaoyang Huang wrote:
> > > > Yes, we have tried to solve this case from those perspectives.
> > > > As to the scheduler, packing small tasks onto one core (a big
> > > > core on ARM) instead of spreading them is desired for
> > > > power-saving reasons. As to the number of kworker threads, the
> > > > current design creates a new work item for each blkcg. Under
> > > > Android's current approach, each PID gets its own cgroup and,
> > > > correspondingly, its own kworker thread, which is what actually
> > > > induces this scenario.
> > >
> > > More cgroups means more overhead from cgroup-internal tasks, e.g.
> > > accumulating statistics. How about asking the Android core team
> > > to review the approach of associating one cgroup with each PID?
> > > I'm wondering whether one cgroup per aggregate profile
> > > (SCHED_SP_BACKGROUND, SCHED_SP_FOREGROUND, ...) would work.
> > >
> > > Thanks,
> > >
> > > Bart.