Re: [RFC PATCH v2 2/5] iomap: Add initial support for buffered RWF_WRITETHROUGH

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	djwong@kernel.org, john.g.garry@oracle.com, willy@infradead.org,
	hch@lst.de, ritesh.list@gmail.com,
	Luis Chamberlain <mcgrof@kernel.org>,
	dgc@kernel.org, tytso@mit.edu, p.raghav@samsung.com,
	andres@anarazel.de, brauner@kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v2 2/5] iomap: Add initial support for buffered RWF_WRITETHROUGH
Date: Tue, 21 Apr 2026 23:37:01 +0530	[thread overview]
Message-ID: <aee8xaz_5x-aQ5BK@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com> (raw)
In-Reply-To: <laegs2jt5ifqmfmdlsl7gsrc64z4nrnwbxazewbr7jk27emdej@l6vhj7pjh3dy>

On Mon, Apr 20, 2026 at 01:28:18PM +0200, Jan Kara wrote:
> On Sat 18-04-26 01:12:22, Ojaswin Mujoo wrote:
> > On Thu, Apr 16, 2026 at 02:34:15PM +0200, Jan Kara wrote:
> > > > @@ -1096,6 +1097,276 @@ static bool iomap_write_end(struct iomap_iter *iter, size_t len, size_t copied,
> > > > +static int iomap_writethrough_iter(struct iomap_writethrough_ctx *wt_ctx,
> > > > +				   struct iomap_iter *iter, struct iov_iter *i,
> > > > +				   const struct iomap_writethrough_ops *wt_ops)
> > > > +
> > > > +{

<...>

> > your comment but) after this email, I started diggin a bit more into why
> > it is needed.  As per my understanding, it tackles 2 things:
> > 
> > Problem 1. mkclean's the old EOF folio so that the FS can fault again. This
> > allows us to allocate new blocks which previously might not be allocated
> > if bs < ps.
> > 
> > Problem 2. Since mmap writes can dirty data beyond EOF, we zero the range from
> > old EOF to end of that folio so that readers dont read junk data after
> > isize extension.
> 
> Correct.
> 
> > Another thing I noticed is that most users of
> > iomap_file_buffered_write() do their own eof zeroing in the FS layer
> > (eg, xfs_file_write_zero_eof(), ext4's new changes,
> > ntfs_extend_initialized_size() etc). 
> > I think this FS level zerooing should take care of mkcleaning the eof
> > folio (problem 1), as they call iomap_zero_range() which would flush the
> > eof range anyways. So am I right in assuming that for FSes that do their
> > own zeroing, 1. is already taken care of?
> 
> Well, I don't see anything that would writeprotect the old tail page in
> iomap_zero_range(). I think iomap_zero_range() calls are there mostly to
> address 2. Not only due to mmap but also possibly to clear whatever junk
> there can be in the blocks after EOF.

Well I was thinking more like if the EOF page was mmap'd it would be
dirty and blocks beyond EOF would be unmapped, so iomap_zero_range() will
write it back which shall mkclean() the folio.

But I think the same race we discussed for problem 2 can also occur
here.

Thread 1 (extending write)             Thread 2 (mmap writer)

iomap_zero_range()
filemap_write_and_wait_range()
                                       // mmaps & writes EOF range
iomap_write_iter()
isize = new_size
// pagecache_isize_extended() is
   needed to mkclean() old EOF page.

>
> > As for 2, I think after the EOF zeroing of the FS, there might be a
> > window before iomap_write_iter() where an mmap writer can still dirty
> > EOF blocks, hence the pagecache_isize_extended() would be needed here.
> > But doesn't that then make the eof zeroing in the FS layer redundant? Am
> > I missing something here?
> 
> Hmm, I agree the zeroing looks duplicit (for some users of
> pagecache_isize_extended()). And yes, doing the zeroing from
> xfs_file_write_zero_eof() is somewhat racy (mmap writer can still come and
> write non-zeros before we update i_size) but I'd have hard time to argue it
> really practically matters - you are racing mmap writes with buffered
> writes so any kind of write atomicity guarantees are not there. 

Yeah, seems like it is not enough to take care of either 1 or 2 and
pagecache_isize_extended() should maybe be enough. I was just wondering
if we could optimize it away even for normal extend path (no racing mmap),
we can avoid the expensive folio_zero_range() calls.

Regardless, Ive not looked at this more closely and its a separate issue
so we can revisit it later. For now I wanted some clarity around
pagecache_isize_extended() so thanks for that.

> 
> > Regardless, for our case I think we will also need to do the
> > pagecache_isize_extended(), mainly to take care of problem 2, but where
> > exactly should we do it now?  We currently change the isize in endio()
> > but for aio, it can run outside inode or folio lock. I think this
> > function needs to be called under inode lock(). Hmm.. its a bit late here so
> > I'll revisit this tomorrow with a fresh mind :)
> 
> I think mainly to take care of problem 1... You are correct about
> inode_lock but since we are updating i_size, we should be better holding
> it, shouldn't we?

Yes you are correct. In the aio writethrough codepath, the inode update
is happening without the inode lock which is wrong. I overlooked the
fact that even aio dio uses IOMAP_DIO_FORCE_WAIT to force isize update
under inode lock, and we should do something similar as well.

So in v3, I make the change that for extending writes we shall always
finish them in "sync" fashion so ->endio() runs under inode lock. Then,
after ->endio() in iomap_dio_complete(), I will call
pagecache_isize_extended() to take care of this. Just like isize update
right now, the isize_extension only runs when the IO was successful
otherwise we return an error to the user. This gives us semantics like
dio while handling extension properly.

Does that sound okay?

Regards,
ojaswin

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

next prev parent reply	other threads:[~2026-04-21 18:07 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08 18:45 [RFC PATCH v2 0/5] Add buffered write-through support to iomap & xfs Ojaswin Mujoo
2026-04-08 18:45 ` [RFC PATCH v2 1/5] mm: Refactor folio_clear_dirty_for_io() Ojaswin Mujoo
2026-04-15  6:14   ` Christoph Hellwig
2026-04-08 18:45 ` [RFC PATCH v2 2/5] iomap: Add initial support for buffered RWF_WRITETHROUGH Ojaswin Mujoo
2026-04-16 12:05   ` Jan Kara
2026-04-16 12:34   ` Jan Kara
2026-04-17 19:42     ` Ojaswin Mujoo
2026-04-20 11:28       ` Jan Kara
2026-04-21 18:07         ` Ojaswin Mujoo [this message]
2026-04-17  4:13   ` Pankaj Raghav (Samsung)
2026-04-18  7:33     ` Ojaswin Mujoo
2026-04-20 11:56   ` Pankaj Raghav (Samsung)
2026-04-21 18:15     ` Ojaswin Mujoo
2026-04-08 18:45 ` [RFC PATCH v2 3/5] xfs: Add RWF_WRITETHROUGH support to xfs Ojaswin Mujoo
2026-04-08 18:45 ` [RFC PATCH v2 4/5] iomap: Add aio support to RWF_WRITETHROUGH Ojaswin Mujoo
2026-04-08 18:45 ` [RFC PATCH v2 5/5] iomap: Add DSYNC support to writethrough Ojaswin Mujoo
2026-04-17  3:54 ` [RFC PATCH v2 0/5] Add buffered write-through support to iomap & xfs Pankaj Raghav (Samsung)
2026-04-18  7:26   ` Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aee8xaz_5x-aQ5BK@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com \
    --to=ojaswin@linux.ibm.com \
    --cc=andres@anarazel.de \
    --cc=brauner@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox