* [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
[not found] <20190409114922.30095-1-amir73il@gmail.com>
@ 2019-04-17 5:45 ` Amir Goldstein
2019-04-19 0:02 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Amir Goldstein @ 2019-04-17 5:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Jan Kara, Dave Chinner, Al Viro, linux-mm, linux-api, linux-fsdevel
Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.
This claim is only partially true. It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was
the reason for changing to WB_SYNC_NONE writeback.
However, that claim is not true for users that use that flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}.
Those users explicitly requested to wait for in-flight IO as well as to
writeback of dirty pages.
Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT
and use the helper filemap_write_and_wait_range(), that uses WB_SYNC_ALL
writeback, to perform the full range sync request.
Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
Andrew,
V2 of this patch is on your mmtotm queue.
However, I had already sent out V3 with a braino fix and Dave Chinner
just added more review comments which I had addressed in this version.
Thanks,
Amir.
Changes since v3:
- Remove unneeded change to VALID_FLAGS (Dave)
- Call file_fdatawait_range() before writeback (Dave)
Changes since v2:
- Return after filemap_write_and_wait_range()
Changes since v1:
- Remove non-guaranties of the API from commit message
- Added ACK by Jan
fs/sync.c | 20 +++++++++++++++-----
include/uapi/linux/fs.h | 3 +++
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..1836328f1ae8 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
}
/*
- * sys_sync_file_range() permits finely controlled syncing over a segment of
+ * ksys_sync_file_range() permits finely controlled syncing over a segment of
* a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is
- * zero then sys_sync_file_range() will operate from offset out to EOF.
+ * zero then ksys_sync_file_range() will operate from offset out to EOF.
*
* The flag bits are:
*
@@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
* Useful combinations of the flag bits are:
*
* SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
- * in the range which were dirty on entry to sys_sync_file_range() are placed
+ * in the range which were dirty on entry to ksys_sync_file_range() are placed
* under writeout. This is a start-write-for-data-integrity operation.
*
* SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
@@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
* earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
* for that operation to complete and to return the result.
*
- * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
+ * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
+ * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
* a traditional sync() operation. This is a write-for-data-integrity operation
* which will ensure that all pages in the range which were dirty on entry to
- * sys_sync_file_range() are committed to disk.
+ * ksys_sync_file_range() are written to disk. It should be noted that disk
+ * caches are not flushed by this call, so there are no guarantees here that the
+ * data will be available on disk after a crash.
*
*
* SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
@@ -344,6 +347,13 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
goto out_put;
}
+ if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+ SYNC_FILE_RANGE_WRITE_AND_WAIT) {
+ /* Unlike SYNC_FILE_RANGE_WRITE alone, uses WB_SYNC_ALL */
+ ret = filemap_write_and_wait_range(mapping, offset, endbyte);
+ goto out_put;
+ }
+
if (flags & SYNC_FILE_RANGE_WRITE) {
ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
WB_SYNC_NONE);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 121e82ce296b..59c71fa8c553 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -320,6 +320,9 @@ struct fscrypt_key {
#define SYNC_FILE_RANGE_WAIT_BEFORE 1
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4
+#define SYNC_FILE_RANGE_WRITE_AND_WAIT (SYNC_FILE_RANGE_WRITE | \
+ SYNC_FILE_RANGE_WAIT_BEFORE | \
+ SYNC_FILE_RANGE_WAIT_AFTER)
/*
* Flags for preadv2/pwritev2:
--
2.17.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
2019-04-17 5:45 ` [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback Amir Goldstein
@ 2019-04-19 0:02 ` Dave Chinner
2019-04-19 7:29 ` [PATCH v5] " Amir Goldstein
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2019-04-19 0:02 UTC (permalink / raw)
To: Amir Goldstein
Cc: Andrew Morton, Jan Kara, Al Viro, linux-mm, linux-api, linux-fsdevel
On Wed, Apr 17, 2019 at 08:45:59AM +0300, Amir Goldstein wrote:
> Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
> writeback") claims that sync_file_range(2) syscall was "created for
> userspace to be able to issue background writeout and so waiting for
> in-flight IO is undesirable there" and changes the writeback (back) to
> WB_SYNC_NONE.
>
> This claim is only partially true. It is true for users that use the flag
> SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was
> the reason for changing to WB_SYNC_NONE writeback.
>
> However, that claim is not true for users that use that flag combination
> SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}.
> Those users explicitly requested to wait for in-flight IO as well as to
> writeback of dirty pages.
>
> Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT
> and use the helper filemap_write_and_wait_range(), that uses WB_SYNC_ALL
> writeback, to perform the full range sync request.
>
> Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
> Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> Acked-by: Jan Kara <jack@suse.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> ---
>
> Andrew,
>
> V2 of this patch is on your mmtotm queue.
> However, I had already sent out V3 with a braino fix and Dave Chinner
> just added more review comments which I had addressed in this version.
>
> Thanks,
> Amir.
>
> Changes since v3:
> - Remove unneeded change to VALID_FLAGS (Dave)
> - Call file_fdatawait_range() before writeback (Dave)
>
> Changes since v2:
> - Return after filemap_write_and_wait_range()
>
> Changes since v1:
> - Remove non-guaranties of the API from commit message
> - Added ACK by Jan
>
> fs/sync.c | 20 +++++++++++++++-----
> include/uapi/linux/fs.h | 3 +++
> 2 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/sync.c b/fs/sync.c
> index b54e0541ad89..1836328f1ae8 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
> }
>
> /*
> - * sys_sync_file_range() permits finely controlled syncing over a segment of
> + * ksys_sync_file_range() permits finely controlled syncing over a segment of
> * a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is
> - * zero then sys_sync_file_range() will operate from offset out to EOF.
> + * zero then ksys_sync_file_range() will operate from offset out to EOF.
> *
> * The flag bits are:
> *
> @@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
> * Useful combinations of the flag bits are:
> *
> * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
> - * in the range which were dirty on entry to sys_sync_file_range() are placed
> + * in the range which were dirty on entry to ksys_sync_file_range() are placed
> * under writeout. This is a start-write-for-data-integrity operation.
> *
> * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
> @@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
> * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
> * for that operation to complete and to return the result.
> *
> - * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
> + * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
> + * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
> * a traditional sync() operation. This is a write-for-data-integrity operation
> * which will ensure that all pages in the range which were dirty on entry to
> - * sys_sync_file_range() are committed to disk.
> + * ksys_sync_file_range() are written to disk. It should be noted that disk
> + * caches are not flushed by this call, so there are no guarantees here that the
> + * data will be available on disk after a crash.
> *
> *
> * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
> @@ -344,6 +347,13 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
> goto out_put;
> }
>
> + if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
> + SYNC_FILE_RANGE_WRITE_AND_WAIT) {
> + /* Unlike SYNC_FILE_RANGE_WRITE alone, uses WB_SYNC_ALL */
> + ret = filemap_write_and_wait_range(mapping, offset, endbyte);
> + goto out_put;
> + }
Clunky, now that I look at it in context.
+ int sync_mode = WB_SYNC_NONE;
+
+ if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+ SYNC_FILE_RANGE_WRITE_AND_WAIT)
+ sync_mode = WB_SYNC_ALL;
.....
if (flags & SYNC_FILE_RANGE_WRITE) {
ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
- WB_SYNC_NONE);
+ sync_mode);
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v5] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
2019-04-19 0:02 ` Dave Chinner
@ 2019-04-19 7:29 ` Amir Goldstein
0 siblings, 0 replies; 3+ messages in thread
From: Amir Goldstein @ 2019-04-19 7:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Jan Kara, Dave Chinner, Al Viro, linux-mm, linux-api, linux-fsdevel
Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.
This claim is only partially true. It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was
the reason for changing to WB_SYNC_NONE writeback.
However, that claim is not true for users that use that flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}.
Those users explicitly requested to wait for in-flight IO as well as to
writeback of dirty pages.
Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT
and use WB_SYNC_ALL writeback to perform the full range sync request.
Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
Andrew,
One more version addressing another comment by Dave Chinner.
Thanks,
Amir.
Changes since v4:
- Don't use filemap_write_and_wait_range() helper (Dave)
Changes since v3:
- Remove unneeded change to VALID_FLAGS (Dave)
- Call file_fdatawait_range() before writeback (Dave)
Changes since v2:
- Return after filemap_write_and_wait_range()
Changes since v1:
- Remove non-guaranties of the API from commit message
- Added ACK by Jan
fs/sync.c | 21 +++++++++++++++------
include/uapi/linux/fs.h | 3 +++
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..9e8cd90e890f 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
}
/*
- * sys_sync_file_range() permits finely controlled syncing over a segment of
+ * ksys_sync_file_range() permits finely controlled syncing over a segment of
* a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is
- * zero then sys_sync_file_range() will operate from offset out to EOF.
+ * zero then ksys_sync_file_range() will operate from offset out to EOF.
*
* The flag bits are:
*
@@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
* Useful combinations of the flag bits are:
*
* SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
- * in the range which were dirty on entry to sys_sync_file_range() are placed
+ * in the range which were dirty on entry to ksys_sync_file_range() are placed
* under writeout. This is a start-write-for-data-integrity operation.
*
* SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
@@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
* earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
* for that operation to complete and to return the result.
*
- * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
+ * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
+ * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
* a traditional sync() operation. This is a write-for-data-integrity operation
* which will ensure that all pages in the range which were dirty on entry to
- * sys_sync_file_range() are committed to disk.
+ * ksys_sync_file_range() are written to disk. It should be noted that disk
+ * caches are not flushed by this call, so there are no guarantees here that the
+ * data will be available on disk after a crash.
*
*
* SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
@@ -345,8 +348,14 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
}
if (flags & SYNC_FILE_RANGE_WRITE) {
+ int sync_mode = WB_SYNC_NONE;
+
+ if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+ SYNC_FILE_RANGE_WRITE_AND_WAIT)
+ sync_mode = WB_SYNC_ALL;
+
ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
- WB_SYNC_NONE);
+ sync_mode);
if (ret < 0)
goto out_put;
}
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 121e82ce296b..59c71fa8c553 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -320,6 +320,9 @@ struct fscrypt_key {
#define SYNC_FILE_RANGE_WAIT_BEFORE 1
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4
+#define SYNC_FILE_RANGE_WRITE_AND_WAIT (SYNC_FILE_RANGE_WRITE | \
+ SYNC_FILE_RANGE_WAIT_BEFORE | \
+ SYNC_FILE_RANGE_WAIT_AFTER)
/*
* Flags for preadv2/pwritev2:
--
2.17.1
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-04-19 7:29 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20190409114922.30095-1-amir73il@gmail.com>
2019-04-17 5:45 ` [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback Amir Goldstein
2019-04-19 0:02 ` Dave Chinner
2019-04-19 7:29 ` [PATCH v5] " Amir Goldstein
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox