* [PATCH v2 1/2] filemap: skip write and wait if end offset precedes start
2022-11-28 15:56 [PATCH v2 0/2] filemap: skip write and wait if end offset precedes start Brian Foster
@ 2022-11-28 15:56 ` Brian Foster
2022-11-30 7:44 ` Christoph Hellwig
2022-11-28 15:56 ` [PATCH v2 2/2] mm/fadvise: use LLONG_MAX instead of -1 for eof Brian Foster
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2022-11-28 15:56 UTC (permalink / raw)
To: linux-mm; +Cc: Andrew Morton, Christoph Hellwig
A call to file[map]_write_and_wait_range() with an end offset that
precedes the start offset but happens to land in the same page can
trigger writeback submission but fails to wait on the submitted
page. Writeback submission occurs because
__filemap_fdatawrite_range() passes both offsets down into
write_cache_pages(), which rounds down to page indexes before it
starts processing writeback. However, __filemap_fdatawait_range()
immediately returns if the byte-granular end offset precedes the
start offset.
This behavior was observed in the form of unpredictable latency from
a frequent write and wait call with incorrect parameters. The
behavior gave the impression that the fdatawait path might
occasionally fail to wait on writeback, but further investigation
showed the latency was from write_cache_pages() waiting on writeback
state to clear for a page already under writeback. Therefore, this
indicated that fdatawait actually never waits on writeback in this
particular situation.
The byte granular check in __filemap_fdatawait_range() goes all the
way back to the old wait_on_page_writeback() helper. It originally
used page offsets and so would have waited in this problematic case.
That changed to byte granularity file offsets in commit 94004ed726f3
("kill wait_on_page_writeback_range"), which subtly changed this
behavior. The check itself has become somewhat redundant since the
error checking code that used to follow the wait loop (at the time
of the aforementioned commit) has now been removed and lifted into
the higher level callers.
Therefore, we can restore historical fdatawait behavior by simply
removing the check. Since the current fdatawait behavior has been in
place for quite some time and is consistent with other interfaces
that use file offsets, instead lift the check into the
file[map]_write_and_wait_range() helpers to provide consistent
behavior between the write and wait.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
mm/filemap.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 08341616ae7a..e7711b5a3f4c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -506,9 +506,6 @@ static void __filemap_fdatawait_range(struct address_space *mapping,
struct pagevec pvec;
int nr_pages;
- if (end_byte < start_byte)
- return;
-
pagevec_init(&pvec);
while (index <= end) {
unsigned i;
@@ -670,6 +667,9 @@ int filemap_write_and_wait_range(struct address_space *mapping,
{
int err = 0, err2;
+ if (lend < lstart)
+ return 0;
+
if (mapping_needs_writeback(mapping)) {
err = __filemap_fdatawrite_range(mapping, lstart, lend,
WB_SYNC_ALL);
@@ -770,6 +770,9 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
int err = 0, err2;
struct address_space *mapping = file->f_mapping;
+ if (lend < lstart)
+ return 0;
+
if (mapping_needs_writeback(mapping)) {
err = __filemap_fdatawrite_range(mapping, lstart, lend,
WB_SYNC_ALL);
--
2.37.3
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v2 2/2] mm/fadvise: use LLONG_MAX instead of -1 for eof
2022-11-28 15:56 [PATCH v2 0/2] filemap: skip write and wait if end offset precedes start Brian Foster
2022-11-28 15:56 ` [PATCH v2 1/2] " Brian Foster
@ 2022-11-28 15:56 ` Brian Foster
2022-11-30 7:46 ` Christoph Hellwig
1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2022-11-28 15:56 UTC (permalink / raw)
To: linux-mm; +Cc: Andrew Morton, Christoph Hellwig
generic_fadvise() sets endbyte = -1 to specify end of file (i.e. if
length == 0 is passed from userspace). Most other callers to
filemap_fdatawrite_range() use LLONG_MAX for this purpose,
particularly if they also call fdatawait_range() (which requires end
>= start). For example, sync_file_range(), vfs_fsync() (where the
range is passed down through per-fs ->fsync() callbacks),
filemap_flush(), etc. generic_fadvise() does not currently wait on
writeback, but fix the call up to be consistent with other callers.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
mm/fadvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/fadvise.c b/mm/fadvise.c
index c76ee665355a..bf04fec87f35 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -72,7 +72,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
*/
endbyte = (u64)offset + (u64)len;
if (!len || endbyte < len)
- endbyte = -1;
+ endbyte = LLONG_MAX;
else
endbyte--; /* inclusive */
--
2.37.3
^ permalink raw reply [flat|nested] 5+ messages in thread