* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write [not found] ` <20230601145904.1385409-4-hch@lst.de> @ 2023-08-27 19:41 ` Al Viro 2023-08-27 21:45 ` Al Viro ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Al Viro @ 2023-08-27 19:41 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote: > All callers of generic_perform_write need to updated ki_pos, move it into > common code. > @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > endbyte = pos + status - 1; > err = filemap_write_and_wait_range(mapping, pos, endbyte); > if (err == 0) { > - iocb->ki_pos = endbyte + 1; > written += status; > invalidate_mapping_pages(mapping, > pos >> PAGE_SHIFT, > @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > } > } else { > written = generic_perform_write(iocb, from); > - if (likely(written > 0)) > - iocb->ki_pos += written; > } > out: > return written ? written : err; [another late reply, sorry] That part is somewhat fishy - there's a case where you return a positive value and advance ->ki_pos by more than that amount. I really wonder if all callers of ->write_iter() are OK with that. Consider e.g. this: ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) { struct fd f = fdget_pos(fd); ssize_t ret = -EBADF; if (f.file) { loff_t pos, *ppos = file_ppos(f.file); if (ppos) { pos = *ppos; ppos = &pos; } ret = vfs_write(f.file, buf, count, ppos); if (ret >= 0 && ppos) f.file->f_pos = pos; fdput_pos(f); } return ret; } ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { ssize_t ret; if (!(file->f_mode & FMODE_WRITE)) return -EBADF; if (!(file->f_mode & FMODE_CAN_WRITE)) return -EINVAL; if (unlikely(!access_ok(buf, count))) return -EFAULT; ret = rw_verify_area(WRITE, file, pos, count); if (ret) return ret; if (count > MAX_RW_COUNT) count = MAX_RW_COUNT; file_start_write(file); if (file->f_op->write) ret = file->f_op->write(file, buf, count, pos); else if (file->f_op->write_iter) ret = new_sync_write(file, buf, count, pos); else ret = -EINVAL; if (ret > 0) { fsnotify_modify(file); add_wchar(current, ret); } inc_syscw(current); file_end_write(file); return ret; } static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos) { struct kiocb kiocb; struct iov_iter iter; ssize_t ret; init_sync_kiocb(&kiocb, filp); kiocb.ki_pos = (ppos ? *ppos : 0); iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len); ret = call_write_iter(filp, &kiocb, &iter); BUG_ON(ret == -EIOCBQUEUED); if (ret > 0 && ppos) *ppos = kiocb.ki_pos; return ret; } Suppose ->write_iter() ends up doing returning a positive value smaller than the increment of kiocb.ki_pos. What do we get? ret is positive, so kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there we copy it into file->f_pos. Is it really OK to have write() return 4096 and advance the file position by 16K? AFAICS, userland wouldn't get any indication of something odd going on - just a short write to a regular file, with followup write of remaining 12K getting quietly written in the range 16K..28K. I don't remember what POSIX says about that, but it would qualify as nasty surprise for any userland program - sure, one can check fsync() results before closing the sucker and see if everything looks fine, but the way it's usually discussed could easily lead to assumption that (synchronous) O_DIRECT writes would not be affected by anything of that sort. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-27 19:41 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Al Viro @ 2023-08-27 21:45 ` Al Viro 2023-08-28 12:32 ` Christoph Hellwig 2023-09-13 11:00 ` Christoph Hellwig 2023-08-28 1:04 ` Al Viro 2023-08-28 12:30 ` Christoph Hellwig 2 siblings, 2 replies; 11+ messages in thread From: Al Viro @ 2023-08-27 21:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote: > On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote: > > All callers of generic_perform_write need to updated ki_pos, move it into > > common code. > > > @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > > endbyte = pos + status - 1; > > err = filemap_write_and_wait_range(mapping, pos, endbyte); > > if (err == 0) { > > - iocb->ki_pos = endbyte + 1; > > written += status; > > invalidate_mapping_pages(mapping, > > pos >> PAGE_SHIFT, > > @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > > } > > } else { > > written = generic_perform_write(iocb, from); > > - if (likely(written > 0)) > > - iocb->ki_pos += written; > > } > > out: > > return written ? written : err; > > [another late reply, sorry] > > That part is somewhat fishy - there's a case where you return a positive value > and advance ->ki_pos by more than that amount. I really wonder if all callers > of ->write_iter() are OK with that. Consider e.g. this: > > ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count) > { > struct fd f = fdget_pos(fd); > ssize_t ret = -EBADF; > > if (f.file) { > loff_t pos, *ppos = file_ppos(f.file); > if (ppos) { > pos = *ppos; > ppos = &pos; > } > ret = vfs_write(f.file, buf, count, ppos); > if (ret >= 0 && ppos) > f.file->f_pos = pos; > fdput_pos(f); > } > > return ret; > } > > ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) > { > ssize_t ret; > > if (!(file->f_mode & FMODE_WRITE)) > return -EBADF; > if (!(file->f_mode & FMODE_CAN_WRITE)) > return -EINVAL; > if (unlikely(!access_ok(buf, count))) > return -EFAULT; > > ret = rw_verify_area(WRITE, file, pos, count); > if (ret) > return ret; > if (count > MAX_RW_COUNT) > count = MAX_RW_COUNT; > file_start_write(file); > if (file->f_op->write) > ret = file->f_op->write(file, buf, count, pos); > else if (file->f_op->write_iter) > ret = new_sync_write(file, buf, count, pos); > else > ret = -EINVAL; > if (ret > 0) { > fsnotify_modify(file); > add_wchar(current, ret); > } > inc_syscw(current); > file_end_write(file); > return ret; > } > > static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos) > { > struct kiocb kiocb; > struct iov_iter iter; > ssize_t ret; > > init_sync_kiocb(&kiocb, filp); > kiocb.ki_pos = (ppos ? *ppos : 0); > iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len); > > ret = call_write_iter(filp, &kiocb, &iter); > BUG_ON(ret == -EIOCBQUEUED); > if (ret > 0 && ppos) > *ppos = kiocb.ki_pos; > return ret; > } > > Suppose ->write_iter() ends up doing returning a positive value smaller than > the increment of kiocb.ki_pos. What do we get? ret is positive, so > kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there > we copy it into file->f_pos. > > Is it really OK to have write() return 4096 and advance the file position > by 16K? AFAICS, userland wouldn't get any indication of something > odd going on - just a short write to a regular file, with followup write > of remaining 12K getting quietly written in the range 16K..28K. > > I don't remember what POSIX says about that, but it would qualify as > nasty surprise for any userland program - sure, one can check fsync() > results before closing the sucker and see if everything looks fine, > but the way it's usually discussed could easily lead to assumption that > (synchronous) O_DIRECT writes would not be affected by anything of that > sort. IOW, I suspect that the right thing to do would be something along the lines of direct_write_fallback(): on error revert the ->ki_pos update from buffered write If we fail filemap_write_and_wait_range() on the range the buffered write went into, we only report the "number of bytes which we direct-written", to quote the comment in there. Which is fine, but buffered write has already advanced iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g. write(2) advancing position by more than the amount it reports having written. Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- diff --git a/fs/libfs.c b/fs/libfs.c index 5b851315eeed..712c57828c0e 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1646,6 +1646,7 @@ ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter, * We don't know how much we wrote, so just return the number of * bytes which were direct-written */ + iocb->ki_pos -= buffered_written; if (direct_written) return direct_written; return err; ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-27 21:45 ` Al Viro @ 2023-08-28 12:32 ` Christoph Hellwig 2023-08-28 13:56 ` Al Viro 2023-09-13 11:00 ` Christoph Hellwig 1 sibling, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2023-08-28 12:32 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote: > IOW, I suspect that the right thing to do would be something along the lines > of The idea looks sensible to me, but we'll also need to do it for the filemap_write_and_wait_range failure case. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-28 12:32 ` Christoph Hellwig @ 2023-08-28 13:56 ` Al Viro 2023-08-28 14:15 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Al Viro @ 2023-08-28 13:56 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Mon, Aug 28, 2023 at 02:32:59PM +0200, Christoph Hellwig wrote: > On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote: > > IOW, I suspect that the right thing to do would be something along the lines > > of > > The idea looks sensible to me, but we'll also need to do it for the > filemap_write_and_wait_range failure case. Huh? That's precisely where this patch is doing that... That function in mainline is if (unlikely(buffered_written < 0)) { if (direct_written) return direct_written; return buffered_written; } /* * We need to ensure that the page cache pages are written to disk and * invalidated to preserve the expected O_DIRECT semantics. */ err = filemap_write_and_wait_range(mapping, pos, end); if (err < 0) { /* * We don't know how much we wrote, so just return the number of * bytes which were direct-written */ if (direct_written) return direct_written; return err; } invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT); return direct_written + buffered_written; The first failure exit does not need any work - the caller had not bumped ->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's where the patch upthread adds iocb->ki_pos -= buffered_written. Or am I completely misparsing what you've written? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-28 13:56 ` Al Viro @ 2023-08-28 14:15 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2023-08-28 14:15 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Mon, Aug 28, 2023 at 02:56:15PM +0100, Al Viro wrote: > The first failure exit does not need any work - the caller had not bumped > ->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's > where the patch upthread adds iocb->ki_pos -= buffered_written. > > Or am I completely misparsing what you've written? No, I misread the patch. Looks good: Acked-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-27 21:45 ` Al Viro 2023-08-28 12:32 ` Christoph Hellwig @ 2023-09-13 11:00 ` Christoph Hellwig 2023-09-13 16:33 ` Christian Brauner 1 sibling, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2023-09-13 11:00 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke > direct_write_fallback(): on error revert the ->ki_pos update from buffered write Al, Christian: can you send this fix on top Linus? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-09-13 11:00 ` Christoph Hellwig @ 2023-09-13 16:33 ` Christian Brauner 0 siblings, 0 replies; 11+ messages in thread From: Christian Brauner @ 2023-09-13 16:33 UTC (permalink / raw) To: Christoph Hellwig Cc: Al Viro, Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Wed, Sep 13, 2023 at 01:00:10PM +0200, Christoph Hellwig wrote: > > direct_write_fallback(): on error revert the ->ki_pos update from buffered write > > Al, Christian: can you send this fix on top Linus? Wasn't aware of this, sorry. I've picked it up and placed it with another set of small fixes I already have. I'm happy to have Al take it ofc. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-27 19:41 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Al Viro 2023-08-27 21:45 ` Al Viro @ 2023-08-28 1:04 ` Al Viro 2023-08-28 12:30 ` Christoph Hellwig 2 siblings, 0 replies; 11+ messages in thread From: Al Viro @ 2023-08-28 1:04 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote: > That part is somewhat fishy - there's a case where you return a positive value > and advance ->ki_pos by more than that amount. I really wonder if all callers > of ->write_iter() are OK with that. Speaking of which, in case of negative return value we'd better *not* use ->ki_pos; consider e.g. generic_file_write_iter() with O_DSYNC and vfs_fsync_range() failure. An error gets returned, but ->ki_pos is left advanced. Normal write(2) is fine - it will only update file->f_pos if ->write_iter() has returned a non-negative. However, io_uring kiocb_done() starts with if (req->flags & REQ_F_CUR_POS) req->file->f_pos = rw->kiocb.ki_pos; if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) { if (!__io_complete_rw_common(req, ret)) { /* * Safe to call io_end from here as we're inline * from the submission path. */ io_req_io_end(req); io_req_set_res(req, final_ret, io_put_kbuf(req, issue_flags)); return IOU_OK; } } else { io_rw_done(&rw->kiocb, ret); } Note that ->f_pos update is *NOT* conditional upon ret >= 0 - it happens no matter what, provided that original request had ->kiocb.ki_pos equal to -1 (on a non-FMODE_STREAM file). Jens, is there any reason for doing that unconditionally? IMO it's a bad idea - there's a wide scope for fuckups that way, especially since write(2) is not sensitive to that and this use of -1 ki_pos is not particularly encouraged on io_uring side either, AFAICT. Worse, it's handling of failure exits in the first place, which already gets little testing... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-27 19:41 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Al Viro 2023-08-27 21:45 ` Al Viro 2023-08-28 1:04 ` Al Viro @ 2023-08-28 12:30 ` Christoph Hellwig 2023-08-28 14:02 ` Al Viro 2 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2023-08-28 12:30 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote: > That part is somewhat fishy - there's a case where you return a positive value > and advance ->ki_pos by more than that amount. I really wonder if all callers > of ->write_iter() are OK with that. Consider e.g. this: This should not exist in the latest version merged by Jens. Can you check if you still see issues in the version in the block tree or linux-next. > Suppose ->write_iter() ends up doing returning a positive value smaller than > the increment of kiocb.ki_pos. What do we get? ret is positive, so > kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there > we copy it into file->f_pos. > > Is it really OK to have write() return 4096 and advance the file position > by 16K? AFAICS, userland wouldn't get any indication of something > odd going on - just a short write to a regular file, with followup write > of remaining 12K getting quietly written in the range 16K..28K. > > I don't remember what POSIX says about that, but it would qualify as > nasty surprise for any userland program - sure, one can check fsync() > results before closing the sucker and see if everything looks fine, > but the way it's usually discussed could easily lead to assumption that > (synchronous) O_DIRECT writes would not be affected by anything of that > sort. ki_pos should always be updated by the write return value. Everything else is a bug. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-08-28 12:30 ` Christoph Hellwig @ 2023-08-28 14:02 ` Al Viro 0 siblings, 0 replies; 11+ messages in thread From: Al Viro @ 2023-08-28 14:02 UTC (permalink / raw) To: Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Xiubo Li, Ilya Dryomov, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke On Mon, Aug 28, 2023 at 02:30:23PM +0200, Christoph Hellwig wrote: > On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote: > > That part is somewhat fishy - there's a case where you return a positive value > > and advance ->ki_pos by more than that amount. I really wonder if all callers > > of ->write_iter() are OK with that. Consider e.g. this: > > This should not exist in the latest version merged by Jens. Can you > check if you still see issues in the version in the block tree or > linux-next. Still does - the problem has migrated into direct_write_fallback(), but that hadn't changed the situation. We are still left with ->ki_pos bumped by generic_perform_write() (evaluated as an argument of direct_write_fallback() now) and *not* retraced in case when direct_write_fallback() decides to discard the buffered write result. Both in -next and in mainline (since 6.5-rc1). ^ permalink raw reply [flat|nested] 11+ messages in thread
* cleanup the filemap / direct I/O interaction v3 (full series now) @ 2023-05-31 7:50 Christoph Hellwig 2023-05-31 7:50 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw) To: Matthew Wilcox Cc: Jens Axboe, Xiubo Li, Ilya Dryomov, Alexander Viro, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm [Sorry for the previous attempt that stopped at patch 8] Hi all, this series cleans up some of the generic write helper calling conventions and the page cache writeback / invalidation for direct I/O. This is a spinoff from the no-bufferhead kernel project, for which we'll want to an use iomap based buffered write path in the block layer. Changes since v2: - stick to the existing behavior of returning a short write if the buffer fallback write or sync fails - bring back "fuse: use direct_write_fallback" which accidentally got lost in v2 Changes since v1: - remove current->backing_dev_info entirely - fix the pos/end calculation in direct_write_fallback - rename kiocb_invalidate_post_write to kiocb_invalidate_post_direct_write - typo fixes diffstat: block/fops.c | 18 +----- fs/btrfs/file.c | 6 -- fs/ceph/file.c | 6 -- fs/direct-io.c | 10 --- fs/ext4/file.c | 11 +--- fs/f2fs/file.c | 3 - fs/fuse/file.c | 4 - fs/gfs2/file.c | 6 -- fs/iomap/buffered-io.c | 9 ++- fs/iomap/direct-io.c | 88 ++++++++++++--------------------- fs/nfs/file.c | 6 -- fs/ntfs/file.c | 2 fs/ntfs3/file.c | 3 - fs/xfs/xfs_file.c | 6 -- fs/zonefs/file.c | 4 - include/linux/fs.h | 5 - include/linux/pagemap.h | 4 + include/linux/sched.h | 3 - mm/filemap.c | 126 ++++++++++++++++++++++++++---------------------- 19 files changed, 125 insertions(+), 195 deletions(-) ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 03/12] filemap: update ki_pos in generic_perform_write 2023-05-31 7:50 cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig @ 2023-05-31 7:50 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw) To: Matthew Wilcox Cc: Jens Axboe, Xiubo Li, Ilya Dryomov, Alexander Viro, Christian Brauner, Theodore Ts'o, Jaegeuk Kim, Chao Yu, Miklos Szeredi, Andreas Gruenbacher, Darrick J. Wong, Trond Myklebust, Anna Schumaker, Damien Le Moal, Andrew Morton, linux-block, ceph-devel, linux-fsdevel, linux-ext4, linux-f2fs-devel, cluster-devel, linux-xfs, linux-nfs, linux-mm, Hannes Reinecke All callers of generic_perform_write need to updated ki_pos, move it into common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> --- fs/ceph/file.c | 2 -- fs/ext4/file.c | 9 +++------ fs/f2fs/file.c | 1 - fs/nfs/file.c | 1 - mm/filemap.c | 8 ++++---- 5 files changed, 7 insertions(+), 14 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index c8ef72f723badd..767f4dfe7def64 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) * can not run at the same time */ written = generic_perform_write(iocb, from); - if (likely(written >= 0)) - iocb->ki_pos = pos + written; ceph_end_io_write(inode); } diff --git a/fs/ext4/file.c b/fs/ext4/file.c index bc430270c23c19..ea0ada3985cba2 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, out: inode_unlock(inode); - if (likely(ret > 0)) { - iocb->ki_pos += ret; - ret = generic_write_sync(iocb, ret); - } - - return ret; + if (unlikely(ret <= 0)) + return ret; + return generic_write_sync(iocb, ret); } static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset, diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 4f423d367a44b9..7134fe8bd008cb 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb, ret = generic_perform_write(iocb, from); if (ret > 0) { - iocb->ki_pos += ret; f2fs_update_iostat(F2FS_I_SB(inode), inode, APP_BUFFERED_IO, ret); } diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 665ce3fc62eaf4..e8bb4c48a3210a 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) goto out; written = result; - iocb->ki_pos += written; nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written); if (mntflags & NFS_MOUNT_WRITE_EAGER) { diff --git a/mm/filemap.c b/mm/filemap.c index 33b54660ad2b39..15907af4a57ff5 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) balance_dirty_pages_ratelimited(mapping); } while (iov_iter_count(i)); - return written ? written : status; + if (!written) + return status; + iocb->ki_pos += written; + return written; } EXPORT_SYMBOL(generic_perform_write); @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) endbyte = pos + status - 1; err = filemap_write_and_wait_range(mapping, pos, endbyte); if (err == 0) { - iocb->ki_pos = endbyte + 1; written += status; invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) } } else { written = generic_perform_write(iocb, from); - if (likely(written > 0)) - iocb->ki_pos += written; } out: return written ? written : err; -- 2.39.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-09-13 16:33 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20230601145904.1385409-1-hch@lst.de>
[not found] ` <20230601145904.1385409-4-hch@lst.de>
2023-08-27 19:41 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Al Viro
2023-08-27 21:45 ` Al Viro
2023-08-28 12:32 ` Christoph Hellwig
2023-08-28 13:56 ` Al Viro
2023-08-28 14:15 ` Christoph Hellwig
2023-09-13 11:00 ` Christoph Hellwig
2023-09-13 16:33 ` Christian Brauner
2023-08-28 1:04 ` Al Viro
2023-08-28 12:30 ` Christoph Hellwig
2023-08-28 14:02 ` Al Viro
2023-05-31 7:50 cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
2023-05-31 7:50 ` [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox