* [PATCH 1/3] mm: Add a function to get a single tagged folio from a file
[not found] <20230302231638.521280-1-dhowells@redhat.com>
@ 2023-03-02 23:16 ` David Howells
2023-03-02 23:21 ` Matthew Wilcox
2023-03-02 23:16 ` [PATCH 2/3] afs: Partially revert and use filemap_get_folio_tag() David Howells
2023-03-02 23:16 ` [PATCH 3/3] cifs: " David Howells
2 siblings, 1 reply; 4+ messages in thread
From: David Howells @ 2023-03-02 23:16 UTC (permalink / raw)
To: Linus Torvalds, Steve French
Cc: David Howells, Vishal Moola, Shyam Prasad N, Rohith Surabattula,
Tom Talpey, Stefan Metzmacher, Paulo Alcantara, Jeff Layton,
Matthew Wilcox, Marc Dionne, linux-afs, linux-cifs,
linux-fsdevel, linux-kernel, Steve French, Andrew Morton,
linux-mm
Add a function to get a single tagged folio from a file rather than a batch
for use in afs and cifs where, in the common case, the batch is likely to
be rendered irrelevant by the {afs,cifs}_extend_writeback() function.
For filemap_get_folios_tag() to be of use, the batch has to be passed down,
and if it contains scattered, non-contiguous folios, these are likely to
end up being pinned by the batch for significant periods of time whilst I/O
is undertaken on earlier pages.
Further, for write_cache_pages() to be useful, it would need to wait on
PG_fscache, which is used to indicate that I/O is in progress from a folio
to the cache - but it can't do this unconditionally as some filesystems,
such as btrfs, use PG_private_2 for other purposes.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/2214157.1677250083@warthog.procyon.org.uk/
---
include/linux/pagemap.h | 2 ++
mm/filemap.c | 58 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0acb8e1fb7af..577535633006 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -741,6 +741,8 @@ unsigned filemap_get_folios_contig(struct address_space *mapping,
pgoff_t *start, pgoff_t end, struct folio_batch *fbatch);
unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
pgoff_t end, xa_mark_t tag, struct folio_batch *fbatch);
+struct folio *filemap_get_folio_tag(struct address_space *mapping, pgoff_t *start,
+ pgoff_t end, xa_mark_t tag);
struct page *grab_cache_page_write_begin(struct address_space *mapping,
pgoff_t index);
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..1b1e9c661018 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2339,6 +2339,64 @@ unsigned filemap_get_folios_tag(struct address_space *mapping, pgoff_t *start,
}
EXPORT_SYMBOL(filemap_get_folios_tag);
+/**
+ * filemap_get_folio_tag - Get the first folio matching @tag
+ * @mapping: The address_space to search
+ * @start: The starting page index
+ * @end: The final page index (inclusive)
+ * @tag: The tag index
+ *
+ * Search for and return the first folio in the mapping marked with @tag,
+ * starting at index @start and up to index @end (inclusive). The folio is
+ * returned with an elevated reference count.
+ *
+ * If a folio is returned, it may start before @start; if it does, it will
+ * contain @start. The folio may also extend beyond @end; if it does, it will
+ * contain @end. If folios are added to or removed from the page cache while
+ * this is running, they may or may not be found by this call.
+ *
+ * Return: The folio that was found or NULL. @start is also updated to index
+ * the next folio for the traversal or will be left pointing after @end.
+ */
+struct folio *filemap_get_folio_tag(struct address_space *mapping, pgoff_t *start,
+ pgoff_t end, xa_mark_t tag)
+{
+ XA_STATE(xas, &mapping->i_pages, *start);
+ struct folio *folio;
+
+ rcu_read_lock();
+ while ((folio = find_get_entry(&xas, end, tag)) != NULL) {
+ /*
+ * Shadow entries should never be tagged, but this iteration
+ * is lockless so there is a window for page reclaim to evict
+ * a page we saw tagged. Skip over it.
+ */
+ if (xa_is_value(folio))
+ continue;
+
+ if (folio_test_hugetlb(folio))
+ *start = folio->index + 1;
+ else
+ *start = folio_next_index(folio);
+ goto out;
+ }
+
+ /*
+ * We come here when there is no tagged folio at or before @end. We take
+ * care not to overflow the index @start as it confuses some of the
+ * callers. This breaks the iteration when there is a folio at index -1,
+ * but that is already broken anyway.
+ */
+ if (end == (pgoff_t)-1)
+ *start = (pgoff_t)-1;
+ else
+ *start = end + 1;
+out:
+ rcu_read_unlock();
+ return folio;
+}
+EXPORT_SYMBOL(filemap_get_folio_tag);
+
/*
* CD/DVDs are error prone. When a medium error occurs, the driver may fail
* a _large_ part of the i/o request. Imagine the worst scenario:
* [PATCH 2/3] afs: Partially revert and use filemap_get_folio_tag()
From: David Howells @ 2023-03-02 23:16 UTC (permalink / raw)
To: Linus Torvalds, Steve French
Cc: David Howells, Vishal Moola, Shyam Prasad N, Rohith Surabattula,
Tom Talpey, Stefan Metzmacher, Paulo Alcantara, Jeff Layton,
Matthew Wilcox, Marc Dionne, linux-afs, linux-cifs,
linux-fsdevel, linux-kernel, Steve French, Andrew Morton,
linux-mm
Partially revert the changes made by:
  acc8d8588cb7 ("afs: convert afs_writepages_region() to use filemap_get_folios_tag()")
The issue is that filemap_get_folios_tag() gets a batch of pages at a time,
and then afs_writepages_region() goes through them one at a time, extends
each into an operation with as many pages as will fit using the loop in
afs_extend_writeback() and submits it. In the common case, however, this
means that the other pages in the batch have already been annexed and
processed by afs_extend_writeback() and we end up doing duplicate
processing.
Switching to write_cache_pages() isn't an immediate substitute as that
doesn't take account of PG_fscache (and this bit is used in other ways by
other filesystems).
So go back to finding the next folio from the VM one at a time and then
extending the op onwards.
Fixes: acc8d8588cb7 ("afs: convert afs_writepages_region() to use filemap_get_folios_tag()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <sfrench@samba.org>
cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/2214157.1677250083@warthog.procyon.org.uk/
---
fs/afs/write.c | 118 ++++++++++++++++++++++++-------------------------
1 file changed, 57 insertions(+), 61 deletions(-)
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 571f3b9a417e..2ed76697be96 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -704,87 +704,83 @@ static int afs_writepages_region(struct address_space *mapping,
bool max_one_loop)
{
struct folio *folio;
- struct folio_batch fbatch;
ssize_t ret;
- unsigned int i;
- int n, skips = 0;
+ int skips = 0;
_enter("%llx,%llx,", start, end);
- folio_batch_init(&fbatch);
do {
pgoff_t index = start / PAGE_SIZE;
- n = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
- PAGECACHE_TAG_DIRTY, &fbatch);
-
- if (!n)
+ folio = filemap_get_folio_tag(mapping, &index, end / PAGE_SIZE,
+ PAGECACHE_TAG_DIRTY);
+ if (!folio)
break;
- for (i = 0; i < n; i++) {
- folio = fbatch.folios[i];
- start = folio_pos(folio); /* May regress with THPs */
- _debug("wback %lx", folio_index(folio));
+ start = folio_pos(folio); /* May regress with THPs */
- /* At this point we hold neither the i_pages lock nor the
- * page lock: the page may be truncated or invalidated
- * (changing page->mapping to NULL), or even swizzled
- * back from swapper_space to tmpfs file mapping
- */
- if (wbc->sync_mode != WB_SYNC_NONE) {
- ret = folio_lock_killable(folio);
- if (ret < 0) {
- folio_batch_release(&fbatch);
- return ret;
- }
- } else {
- if (!folio_trylock(folio))
- continue;
- }
+ _debug("wback %lx", folio_index(folio));
- if (folio->mapping != mapping ||
- !folio_test_dirty(folio)) {
- start += folio_size(folio);
- folio_unlock(folio);
- continue;
+ /* At this point we hold neither the i_pages lock nor the
+ * page lock: the page may be truncated or invalidated
+ * (changing page->mapping to NULL), or even swizzled
+ * back from swapper_space to tmpfs file mapping
+ */
+ if (wbc->sync_mode != WB_SYNC_NONE) {
+ ret = folio_lock_killable(folio);
+ if (ret < 0) {
+ folio_put(folio);
+ return ret;
+ }
+ } else {
+ if (!folio_trylock(folio)) {
+ folio_put(folio);
+ return 0;
}
+ }
- if (folio_test_writeback(folio) ||
- folio_test_fscache(folio)) {
- folio_unlock(folio);
- if (wbc->sync_mode != WB_SYNC_NONE) {
- folio_wait_writeback(folio);
+ if (folio_mapping(folio) != mapping ||
+ !folio_test_dirty(folio)) {
+ start += folio_size(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ continue;
+ }
+
+ if (folio_test_writeback(folio) ||
+ folio_test_fscache(folio)) {
+ folio_unlock(folio);
+ if (wbc->sync_mode != WB_SYNC_NONE) {
+ folio_wait_writeback(folio);
#ifdef CONFIG_AFS_FSCACHE
- folio_wait_fscache(folio);
+ folio_wait_fscache(folio);
#endif
- } else {
- start += folio_size(folio);
- }
- if (wbc->sync_mode == WB_SYNC_NONE) {
- if (skips >= 5 || need_resched()) {
- *_next = start;
- _leave(" = 0 [%llx]", *_next);
- return 0;
- }
- skips++;
- }
- continue;
+ } else {
+ start += folio_size(folio);
}
-
- if (!folio_clear_dirty_for_io(folio))
- BUG();
- ret = afs_write_back_from_locked_folio(mapping, wbc,
- folio, start, end);
- if (ret < 0) {
- _leave(" = %zd", ret);
- folio_batch_release(&fbatch);
- return ret;
+ folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
}
+ continue;
+ }
- start += ret;
+ if (!folio_clear_dirty_for_io(folio))
+ BUG();
+ ret = afs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
+ folio_put(folio);
+ if (ret < 0) {
+ _leave(" = %zd", ret);
+ return ret;
}
- folio_batch_release(&fbatch);
+ start += ret;
+
+ if (max_one_loop)
+ break;
+
cond_resched();
} while (wbc->nr_to_write > 0);
* [PATCH 3/3] cifs: Partially revert and use filemap_get_folio_tag()
From: David Howells @ 2023-03-02 23:16 UTC (permalink / raw)
To: Linus Torvalds, Steve French
Cc: David Howells, Vishal Moola, Shyam Prasad N, Rohith Surabattula,
Tom Talpey, Stefan Metzmacher, Paulo Alcantara, Jeff Layton,
Matthew Wilcox, Marc Dionne, linux-afs, linux-cifs,
linux-fsdevel, linux-kernel, Steve French, Andrew Morton,
linux-mm
Mirror the changes made to afs and partially revert the changes made by:
  acc8d8588cb7 ("afs: convert afs_writepages_region() to use filemap_get_folios_tag()")
which were then mirrored into cifs.
The issue is that filemap_get_folios_tag() gets a batch of pages at a time,
and then cifs_writepages_region() goes through them one at a time, extends
each into an operation with as many pages as will fit using the loop in
cifs_extend_writeback() and submits it. In the common case, however, this
means that the other pages in the batch have already been annexed and
processed by cifs_extend_writeback() and we end up doing duplicate
processing.
Switching to write_cache_pages() isn't an immediate substitute as that
doesn't take account of PG_fscache (and this bit is used in other ways by
other filesystems).
So go back to finding the next folio from the VM one at a time and then
extending the op onwards.
Fixes: 3822a7c40997 ("Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Paulo Alcantara <pc@cjr.nz>
cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/2214157.1677250083@warthog.procyon.org.uk/
---
fs/cifs/file.c | 115 +++++++++++++++++++++----------------------------
1 file changed, 49 insertions(+), 66 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 4d4a2d82636d..a3e89e741b42 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2864,93 +2864,76 @@ static int cifs_writepages_region(struct address_space *mapping,
struct writeback_control *wbc,
loff_t start, loff_t end, loff_t *_next)
{
- struct folio_batch fbatch;
+ struct folio *folio;
+ ssize_t ret;
int skips = 0;
- folio_batch_init(&fbatch);
do {
- int nr;
pgoff_t index = start / PAGE_SIZE;
- nr = filemap_get_folios_tag(mapping, &index, end / PAGE_SIZE,
- PAGECACHE_TAG_DIRTY, &fbatch);
- if (!nr)
+ folio = filemap_get_folio_tag(mapping, &index, end / PAGE_SIZE,
+ PAGECACHE_TAG_DIRTY);
+ if (!folio)
break;
- for (int i = 0; i < nr; i++) {
- ssize_t ret;
- struct folio *folio = fbatch.folios[i];
-
-redo_folio:
- start = folio_pos(folio); /* May regress with THPs */
+ start = folio_pos(folio); /* May regress with THPs */
- /* At this point we hold neither the i_pages lock nor the
- * page lock: the page may be truncated or invalidated
- * (changing page->mapping to NULL), or even swizzled
- * back from swapper_space to tmpfs file mapping
- */
- if (wbc->sync_mode != WB_SYNC_NONE) {
- ret = folio_lock_killable(folio);
- if (ret < 0)
- goto write_error;
- } else {
- if (!folio_trylock(folio))
- goto skip_write;
+ /* At this point we hold neither the i_pages lock nor the
+ * page lock: the page may be truncated or invalidated
+ * (changing page->mapping to NULL), or even swizzled
+ * back from swapper_space to tmpfs file mapping
+ */
+ if (wbc->sync_mode != WB_SYNC_NONE) {
+ ret = folio_lock_killable(folio);
+ if (ret < 0) {
+ folio_put(folio);
+ return ret;
}
-
- if (folio_mapping(folio) != mapping ||
- !folio_test_dirty(folio)) {
- start += folio_size(folio);
- folio_unlock(folio);
- continue;
+ } else {
+ if (!folio_trylock(folio)) {
+ folio_put(folio);
+ return 0;
}
+ }
- if (folio_test_writeback(folio) ||
- folio_test_fscache(folio)) {
- folio_unlock(folio);
- if (wbc->sync_mode == WB_SYNC_NONE)
- goto skip_write;
+ if (folio_mapping(folio) != mapping ||
+ !folio_test_dirty(folio)) {
+ start += folio_size(folio);
+ folio_unlock(folio);
+ folio_put(folio);
+ continue;
+ }
+ if (folio_test_writeback(folio) ||
+ folio_test_fscache(folio)) {
+ folio_unlock(folio);
+ if (wbc->sync_mode != WB_SYNC_NONE) {
folio_wait_writeback(folio);
#ifdef CONFIG_CIFS_FSCACHE
folio_wait_fscache(folio);
#endif
- goto redo_folio;
+ } else {
+ start += folio_size(folio);
}
-
- if (!folio_clear_dirty_for_io(folio))
- /* We hold the page lock - it should've been dirty. */
- WARN_ON(1);
-
- ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
- if (ret < 0)
- goto write_error;
-
- start += ret;
- continue;
-
-write_error:
- folio_batch_release(&fbatch);
- *_next = start;
- return ret;
-
-skip_write:
- /*
- * Too many skipped writes, or need to reschedule?
- * Treat it as a write error without an error code.
- */
- if (skips >= 5 || need_resched()) {
- ret = 0;
- goto write_error;
+ folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
}
-
- /* Otherwise, just skip that folio and go on to the next */
- skips++;
- start += folio_size(folio);
continue;
}
- folio_batch_release(&fbatch);
+ if (!folio_clear_dirty_for_io(folio))
+ /* We hold the page lock - it should've been dirty. */
+ WARN_ON(1);
+
+ ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
+ folio_put(folio);
+ if (ret < 0)
+ return ret;
+
+ start += ret;
cond_resched();
} while (wbc->nr_to_write > 0);
* Re: [PATCH 1/3] mm: Add a function to get a single tagged folio from a file
From: Matthew Wilcox @ 2023-03-02 23:21 UTC (permalink / raw)
To: David Howells
Cc: Linus Torvalds, Steve French, Vishal Moola, Shyam Prasad N,
Rohith Surabattula, Tom Talpey, Stefan Metzmacher,
Paulo Alcantara, Jeff Layton, Marc Dionne, linux-afs, linux-cifs,
linux-fsdevel, linux-kernel, Steve French, Andrew Morton,
linux-mm
On Thu, Mar 02, 2023 at 11:16:36PM +0000, David Howells wrote:
> Add a function to get a single tagged folio from a file rather than a batch
> for use in afs and cifs where, in the common case, the batch is likely to
> be rendered irrelevant by the {afs,cifs}_extend_writeback() function.
I think this is the wrong way to go. I'll work on a replacement once
I've got a couple of other things off my plate.