linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tal Zussman <tz2294@columbia.edu>
To: Dave Chinner <dgc@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Carlos Maiolino <cem@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH RFC v4 1/3] block: add BIO_COMPLETE_IN_TASK for task-context completion
Date: Wed, 8 Apr 2026 14:50:43 -0400	[thread overview]
Message-ID: <9fa7a89e-2461-4c58-a2df-d61af60b1b42@columbia.edu> (raw)
In-Reply-To: <acRE8pp6HxjK7Z74@dread>



On 3/25/26 4:26 PM, Dave Chinner wrote:
> On Wed, Mar 25, 2026 at 02:43:00PM -0400, Tal Zussman wrote:
>> Some bio completion handlers need to run in task context but bio_endio()
>> can be called from IRQ context (e.g. buffer_head writeback). Add a
>> BIO_COMPLETE_IN_TASK flag that bio submitters can set to request
>> task-context completion of their bi_end_io callback.
>> 
>> When bio_endio() sees this flag and is running in non-task context, it
>> queues the bio to a per-cpu list and schedules a work item to call
>> bi_end_io() from task context. A CPU hotplug dead callback drains any
>> remaining bios from the departing CPU's batch.
>> 
>> This will be used to enable RWF_DONTCACHE for block devices, and could
>> be used for other subsystems like fscrypt that need task-context bio
>> completion.
>> 
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> Signed-off-by: Tal Zussman <tz2294@columbia.edu>
>> ---
>>  block/bio.c               | 84 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  include/linux/blk_types.h |  1 +
>>  2 files changed, 84 insertions(+), 1 deletion(-)
>> 
>> diff --git a/block/bio.c b/block/bio.c
>> index 8203bb7455a9..69ee0d93041f 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/highmem.h>
>>  #include <linux/blk-crypto.h>
>>  #include <linux/xarray.h>
>> +#include <linux/local_lock.h>
>>  
>>  #include <trace/events/block.h>
>>  #include "blk.h"
>> @@ -1714,6 +1715,60 @@ void bio_check_pages_dirty(struct bio *bio)
>>  }
>>  EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
>>  
>> +struct bio_complete_batch {
>> +	local_lock_t lock;
>> +	struct bio_list list;
>> +	struct work_struct work;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct bio_complete_batch, bio_complete_batch) = {
>> +	.lock = INIT_LOCAL_LOCK(lock),
>> +};
>> +
>> +static void bio_complete_work_fn(struct work_struct *w)
>> +{
>> +	struct bio_complete_batch *batch;
>> +	struct bio_list list;
>> +
>> +again:
>> +	local_lock_irq(&bio_complete_batch.lock);
>> +	batch = this_cpu_ptr(&bio_complete_batch);
>> +	list = batch->list;
>> +	bio_list_init(&batch->list);
>> +	local_unlock_irq(&bio_complete_batch.lock);
> 
> This is just a FIFO processing queue, and it is so wanting to be a
> struct llist for lockless queuing and dequeueing.
> 
> We do this lockless per-cpu queue + per-cpu workqueue in XFS for
> background inode GC processing. See struct xfs_inodegc and all the
> xfs_inodegc_*() functions - it may be useful to have a generic
> lockless per-cpu queue processing so we don't keep open coding this
> repeating pattern everywhere.
> 
>> +
>> +	while (!bio_list_empty(&list)) {
>> +		struct bio *bio = bio_list_pop(&list);
>> +		bio->bi_end_io(bio);
>> +	}
>> +
>> +	local_lock_irq(&bio_complete_batch.lock);
>> +	batch = this_cpu_ptr(&bio_complete_batch);
>> +	if (!bio_list_empty(&batch->list)) {
>> +		local_unlock_irq(&bio_complete_batch.lock);
>> +
>> +		if (!need_resched())
>> +			goto again;
>> +
>> +		schedule_work_on(smp_processor_id(), &batch->work);
> 
> We've learnt that immediately scheduling per-cpu batch
> processing work can cause context switch storms as the queue/dequeue
> steps one work item at a time.
> 
> Hence we use a delayed work with a scheduling delay of a singel
> jiffie to allow batches of queue work from a single context to
> complete before (potentially) being pre-empted by the per-cpu
> kworker task that will process the queue...
> 
>> +		return;
>> +	}
>> +	local_unlock_irq(&bio_complete_batch.lock);
>> +}
>> +
>> +static void bio_queue_completion(struct bio *bio)
>> +{
>> +	struct bio_complete_batch *batch;
>> +	unsigned long flags;
>> +
>> +	local_lock_irqsave(&bio_complete_batch.lock, flags);
>> +	batch = this_cpu_ptr(&bio_complete_batch);
>> +	bio_list_add(&batch->list, bio);
>> +	local_unlock_irqrestore(&bio_complete_batch.lock, flags);
>> +
>> +	schedule_work_on(smp_processor_id(), &batch->work);
>> +}
> 
> Yeah, we definitely want to queue all the pending bio completions
> the interrupt is delivering before we run the batch processing...
> 
>> +
>>  static inline bool bio_remaining_done(struct bio *bio)
>>  {
>>  	/*
>> @@ -1788,7 +1843,9 @@ void bio_endio(struct bio *bio)
>>  	}
>>  #endif
>>  
>> -	if (bio->bi_end_io)
>> +	if (!in_task() && bio_flagged(bio, BIO_COMPLETE_IN_TASK))
>> +		bio_queue_completion(bio);
>> +	else if (bio->bi_end_io)
>>  		bio->bi_end_io(bio);
>>  }
>>  EXPORT_SYMBOL(bio_endio);
>> @@ -1974,6 +2031,21 @@ int bioset_init(struct bio_set *bs,
>>  }
>>  EXPORT_SYMBOL(bioset_init);
>>  
>> +/*
>> + * Drain a dead CPU's deferred bio completions. The CPU is dead so no locking
>> + * is needed -- no new bios will be queued to it.
>> + */
>> +static int bio_complete_batch_cpu_dead(unsigned int cpu)
>> +{
>> +	struct bio_complete_batch *batch = per_cpu_ptr(&bio_complete_batch, cpu);
>> +	struct bio *bio;
>> +
>> +	while ((bio = bio_list_pop(&batch->list)))
>> +		bio->bi_end_io(bio);
>> +
>> +	return 0;
>> +}
> 
> If you use a llist for the queue, this code is no different to the
> normal processing work.
> 

Thanks Dave, these suggestions + the pointer to the XFS GC code were helpful.
I'll incorporate them into the next version.

- Tal



  parent reply	other threads:[~2026-04-08 18:50 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-25 18:42 [PATCH RFC v4 0/3] block: enable RWF_DONTCACHE for block devices Tal Zussman
2026-03-25 18:43 ` [PATCH RFC v4 1/3] block: add BIO_COMPLETE_IN_TASK for task-context completion Tal Zussman
2026-03-25 19:54   ` Matthew Wilcox
2026-03-25 20:14   ` Jens Axboe
2026-04-08 18:48     ` Tal Zussman
2026-04-08 19:51       ` Jens Axboe
2026-04-08 22:51         ` Tal Zussman
2026-03-25 20:26   ` Dave Chinner
2026-03-25 20:39     ` Matthew Wilcox
2026-03-26  2:44       ` Dave Chinner
2026-04-08 18:50     ` Tal Zussman [this message]
2026-03-25 21:03   ` Bart Van Assche
2026-03-26  3:18     ` Dave Chinner
2026-03-27  6:01   ` Christoph Hellwig
2026-04-08 19:35     ` Tal Zussman
2026-03-25 18:43 ` [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback Tal Zussman
2026-03-25 20:21   ` Matthew Wilcox
2026-03-27  6:03     ` Christoph Hellwig
2026-04-08 19:36     ` Tal Zussman
2026-04-08 19:44     ` Tal Zussman
2026-04-08 20:01       ` Matthew Wilcox
2026-04-08 20:10         ` Tal Zussman
2026-03-25 20:34   ` Dave Chinner
2026-03-27  6:08     ` Christoph Hellwig
2026-03-27  6:24       ` Gao Xiang
2026-03-27  6:27         ` Christoph Hellwig
2026-03-27  6:45           ` Gao Xiang
2026-03-25 18:43 ` [PATCH RFC v4 3/3] block: enable RWF_DONTCACHE for block devices Tal Zussman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fa7a89e-2461-4c58-a2df-d61af60b1b42@columbia.edu \
    --to=tz2294@columbia.edu \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=cem@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox