linux-mm.kvack.org archive mirror
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Jeff Layton <jlayton@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Mike Snitzer <snitzer@kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	Chuck Lever <chuck.lever@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 1/3] mm: kick writeback flusher instead of inline flush for IOCB_DONTCACHE
Date: Fri, 17 Apr 2026 08:25:52 +0530
Message-ID: <ik9qthvr.ritesh.list@gmail.com>
In-Reply-To: <52b81c4d1fb2ad0e07b3b3b4dfbd3d36e8ee3e7d.camel@kernel.org>

Jeff Layton <jlayton@kernel.org> writes:

> On Thu, 2026-04-09 at 07:10 +0530, Ritesh Harjani wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>> 
>> > The IOCB_DONTCACHE writeback path in generic_write_sync() calls
>> > filemap_flush_range() on every write, submitting writeback inline in
>> > the writer's context.  Perf lock contention profiling shows the
>> > performance problem is not lock contention but the writeback submission
>> > work itself — walking the page tree and submitting I/O blocks the
>> > writer for milliseconds, inflating p99.9 latency from 23ms (buffered)
>> > to 93ms (dontcache).
>> > 
>> > Replace the inline filemap_flush_range() call with a
>> > wakeup_flusher_threads_bdi() call that kicks the BDI's flusher thread
>> > to drain dirty pages in the background.  This moves writeback
>> > submission completely off the writer's hot path.  The flusher thread
>> > handles writeback asynchronously, naturally coalescing and rate-limiting
>> > I/O without any explicit skip-if-busy or dirty pressure checks.
>> > 
>> 
>> Thanks, Jeff, for explaining this. It makes sense now.
>> 
>> 
>> > Add WB_REASON_DONTCACHE as a new writeback reason for tracing
>> > visibility.
>> > 
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  fs/fs-writeback.c                | 14 ++++++++++++++
>> >  include/linux/backing-dev-defs.h |  1 +
>> >  include/linux/fs.h               |  6 ++----
>> >  include/trace/events/writeback.h |  3 ++-
>> >  4 files changed, 19 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> > index 3c75ee025bda..88dc31388a31 100644
>> > --- a/fs/fs-writeback.c
>> > +++ b/fs/fs-writeback.c
>> > @@ -2466,6 +2466,20 @@ void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
>> >  	rcu_read_unlock();
>> >  }
>> >  
>> > +/**
>> > + * filemap_dontcache_kick_writeback - kick flusher for IOCB_DONTCACHE writes
>> > + * @mapping:	address_space that was just written to
>> > + *
>> > + * Wake the BDI flusher thread to start writeback of dirty pages in the
>> > + * background.
>> > + */
>> > +void filemap_dontcache_kick_writeback(struct address_space *mapping)
>> 
>> This API gives the wrong impression that we are kicking writeback
>> only for the dirty pages that belong to this inode's address space
>> mapping. Instead, we are starting wb for everything on the
>> respective bdi.
>> 
>> So why not just export the symbol for wakeup_flusher_threads_bdi()
>> and use it instead?
>> 
>> If not, then IMO at least making it... 
>>    filemap_kick_writeback_all(mapping, enum wb_reason)
>> 
>> ... might be better.
>
> I did draft up a version of this -- adding a way to tell the flusher
> thread to only flush a single inode. The performance is better than
> today's DONTCACHE, but was worse than just kicking the flusher thread.
>
> I think we're probably better off not doing this because we lose some
> batching opportunities by trying to force out a single inode's pages
> rather than allowing the thread to do its thing.
>

So, if I understood correctly, Christoph might be talking about a
different approach here.
Instead of kicking the flusher thread to write back pages for a single
inode, if we can track the number of dontcache pages
(get_nr_dontcache_pages()), then we can kick the flusher for that many
target pages. I think this way we still reduce the dirty page cache
pressure, which is the problem RWF_DONTCACHE is supposed to solve.
But I guess that doesn't necessarily mean that only dontcache-marked
folios will get written.

If we implement that, then it should still help with the batching
problem you mentioned, and hopefully should not cause a major
regression for the workload Jan mentioned.
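
To make the idea concrete, here is a rough userspace toy model of it.
This is not kernel code and not a proposed patch. All toy_* names are
made up for illustration, and get_nr_dontcache_pages() itself is only a
hypothetical helper as far as I know. The model assumes the flusher
writes dirty pages oldest-first and only uses the dontcache count to
size the request:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model only; every name here is hypothetical. */
#define TOY_MAX_PAGES 64

struct toy_page {
	bool dirty;
	bool dontcache;
};

struct toy_bdi {
	struct toy_page pages[TOY_MAX_PAGES];	/* index order == age order */
	size_t nr_pages;			/* slots in use */
	size_t nr_dontcache;			/* dirty dontcache pages */
};

/* Dirty one new page; returns false if the model is full. */
static bool toy_write(struct toy_bdi *bdi, bool dontcache)
{
	if (bdi->nr_pages == TOY_MAX_PAGES)
		return false;
	bdi->pages[bdi->nr_pages].dirty = true;
	bdi->pages[bdi->nr_pages].dontcache = dontcache;
	bdi->nr_pages++;
	if (dontcache)
		bdi->nr_dontcache++;
	return true;
}

/* Stand-in for the get_nr_dontcache_pages() idea. */
static size_t toy_nr_dontcache_pages(const struct toy_bdi *bdi)
{
	return bdi->nr_dontcache;
}

/*
 * Write back up to @target dirty pages, oldest first, regardless of
 * whether they are dontcache. Returns the number of pages written.
 */
static size_t toy_flush(struct toy_bdi *bdi, size_t target)
{
	size_t written = 0;

	for (size_t i = 0; i < bdi->nr_pages && written < target; i++) {
		struct toy_page *p = &bdi->pages[i];

		if (!p->dirty)
			continue;
		p->dirty = false;
		if (p->dontcache) {
			assert(bdi->nr_dontcache > 0);
			bdi->nr_dontcache--;
		}
		written++;
	}
	return written;
}

/* Kick: size the writeback request by the dontcache page count. */
static size_t toy_kick_dontcache(struct toy_bdi *bdi)
{
	return toy_flush(bdi, toy_nr_dontcache_pages(bdi));
}
```

With one older buffered page followed by two dontcache pages, a kick in
this model writes two pages: the buffered one and the first dontcache
one. So the amount of dirty data drops by the dontcache count, but the
pages written are not necessarily the dontcache ones.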

Please feel free to correct my understanding here.

-ritesh

