* [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-27 16:36 support large folios for NFS Christoph Hellwig
@ 2024-05-27 16:36 ` Christoph Hellwig
2024-05-27 18:17 ` Matthew Wilcox
` (2 more replies)
2024-05-27 16:36 ` [PATCH 2/2] nfs: add support for " Christoph Hellwig
` (3 subsequent siblings)
4 siblings, 3 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-05-27 16:36 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Matthew Wilcox
Cc: linux-nfs, linux-fsdevel, linux-mm
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Modelled after the loop in iomap_write_iter(), copy larger chunks from
userspace if the filesystem has created large folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[hch: use mapping_max_folio_size to keep supporting file systems that do
not support large folios]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
mm/filemap.c | 40 +++++++++++++++++++++++++---------------
1 file changed, 25 insertions(+), 15 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 382c3d06bfb10c..860728e26ccf32 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3981,21 +3981,24 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
loff_t pos = iocb->ki_pos;
struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
+ size_t chunk = mapping_max_folio_size(mapping);
long status = 0;
ssize_t written = 0;
do {
struct page *page;
- unsigned long offset; /* Offset into pagecache page */
- unsigned long bytes; /* Bytes to write to page */
+ struct folio *folio;
+ size_t offset; /* Offset into folio */
+ size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
void *fsdata = NULL;
- offset = (pos & (PAGE_SIZE - 1));
- bytes = min_t(unsigned long, PAGE_SIZE - offset,
- iov_iter_count(i));
+ bytes = iov_iter_count(i);
+retry:
+ offset = pos & (chunk - 1);
+ bytes = min(chunk - offset, bytes);
+ balance_dirty_pages_ratelimited(mapping);
-again:
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -4017,11 +4020,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
if (unlikely(status < 0))
break;
+ folio = page_folio(page);
+ offset = offset_in_folio(folio, pos);
+ if (bytes > folio_size(folio) - offset)
+ bytes = folio_size(folio) - offset;
+
if (mapping_writably_mapped(mapping))
- flush_dcache_page(page);
+ flush_dcache_folio(folio);
- copied = copy_page_from_iter_atomic(page, offset, bytes, i);
- flush_dcache_page(page);
+ copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
+ flush_dcache_folio(folio);
status = a_ops->write_end(file, mapping, pos, bytes, copied,
page, fsdata);
@@ -4039,14 +4047,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
* halfway through, might be a race with munmap,
* might be severe memory pressure.
*/
- if (copied)
+ if (chunk > PAGE_SIZE)
+ chunk /= 2;
+ if (copied) {
bytes = copied;
- goto again;
+ goto retry;
+ }
+ } else {
+ pos += status;
+ written += status;
}
- pos += status;
- written += status;
-
- balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
if (!written)
--
2.43.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-27 16:36 ` [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios Christoph Hellwig
@ 2024-05-27 18:17 ` Matthew Wilcox
2024-05-28 8:12 ` Christoph Hellwig
[not found] ` <CGME20240528152340eucas1p17ba2ad78d8ea869ef44cdeedb2601f80@eucas1p1.samsung.com>
2024-06-11 10:47 ` Shaun Tancheff
2 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2024-05-27 18:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-fsdevel, linux-mm
On Mon, May 27, 2024 at 06:36:08PM +0200, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Modelled after the loop in iomap_write_iter(), copy larger chunks from
> userspace if the filesystem has created large folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: use mapping_max_folio_size to keep supporting file systems that do
> not support large folios]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Yup, this still makes sense to me.
Could you remind me why we need to call flush_dcache_folio() in
generic_perform_write() while we don't in iomap_write_iter()?
> if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + flush_dcache_folio(folio);
(i'm not talking about this one)
> - copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> - flush_dcache_page(page);
> + copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> + flush_dcache_folio(folio);
(this one has no equivalent in iomap)
> status = a_ops->write_end(file, mapping, pos, bytes, copied,
> page, fsdata);
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-27 18:17 ` Matthew Wilcox
@ 2024-05-28 8:12 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-05-28 8:12 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-fsdevel, linux-mm
On Mon, May 27, 2024 at 07:17:18PM +0100, Matthew Wilcox wrote:
> Could you remind me why we need to call flush_dcache_folio() in
> generic_perform_write() while we don't in iomap_write_iter()?
> > - copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > - flush_dcache_page(page);
> > + copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> > + flush_dcache_folio(folio);
>
> (this one has no equivalent in iomap)
The iomap equivalent is in __iomap_write_end and iomap_write_end_inline
and block_write_end.
^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CGME20240528152340eucas1p17ba2ad78d8ea869ef44cdeedb2601f80@eucas1p1.samsung.com>]
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
[not found] ` <CGME20240528152340eucas1p17ba2ad78d8ea869ef44cdeedb2601f80@eucas1p1.samsung.com>
@ 2024-05-28 15:23 ` Daniel Gomez
2024-05-28 16:50 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Daniel Gomez @ 2024-05-28 15:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Trond Myklebust, Anna Schumaker, Matthew Wilcox, linux-nfs,
linux-fsdevel, linux-mm
Hi Christoph, Matthew,
On Mon, May 27, 2024 at 06:36:08PM +0200, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Modelled after the loop in iomap_write_iter(), copy larger chunks from
> userspace if the filesystem has created large folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: use mapping_max_folio_size to keep supporting file systems that do
> not support large folios]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/filemap.c | 40 +++++++++++++++++++++++++---------------
> 1 file changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 382c3d06bfb10c..860728e26ccf32 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3981,21 +3981,24 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> loff_t pos = iocb->ki_pos;
> struct address_space *mapping = file->f_mapping;
> const struct address_space_operations *a_ops = mapping->a_ops;
> + size_t chunk = mapping_max_folio_size(mapping);
> long status = 0;
> ssize_t written = 0;
>
> do {
> struct page *page;
> - unsigned long offset; /* Offset into pagecache page */
> - unsigned long bytes; /* Bytes to write to page */
> + struct folio *folio;
> + size_t offset; /* Offset into folio */
> + size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> void *fsdata = NULL;
>
> - offset = (pos & (PAGE_SIZE - 1));
> - bytes = min_t(unsigned long, PAGE_SIZE - offset,
> - iov_iter_count(i));
> + bytes = iov_iter_count(i);
> +retry:
> + offset = pos & (chunk - 1);
> + bytes = min(chunk - offset, bytes);
> + balance_dirty_pages_ratelimited(mapping);
>
> -again:
> /*
> * Bring in the user page that we will copy from _first_.
> * Otherwise there's a nasty deadlock on copying from the
> @@ -4017,11 +4020,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> if (unlikely(status < 0))
> break;
>
> + folio = page_folio(page);
> + offset = offset_in_folio(folio, pos);
> + if (bytes > folio_size(folio) - offset)
> + bytes = folio_size(folio) - offset;
> +
> if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + flush_dcache_folio(folio);
>
> - copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> - flush_dcache_page(page);
> + copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> + flush_dcache_folio(folio);
>
> status = a_ops->write_end(file, mapping, pos, bytes, copied,
> page, fsdata);
I have the same patch in my shmem large folios tree. That was the last piece
needed for getting better performance results. However, it is also needed to
support folios in the write_begin() and write_end() callbacks. In order to avoid
making them local to shmem, how should we do the transition to folios in these
two callbacks? I was looking into the aops->read_folio approach, but what do you think?
> @@ -4039,14 +4047,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> * halfway through, might be a race with munmap,
> * might be severe memory pressure.
> */
> - if (copied)
> + if (chunk > PAGE_SIZE)
> + chunk /= 2;
> + if (copied) {
> bytes = copied;
> - goto again;
> + goto retry;
> + }
> + } else {
> + pos += status;
> + written += status;
> }
> - pos += status;
> - written += status;
> -
> - balance_dirty_pages_ratelimited(mapping);
> } while (iov_iter_count(i));
>
> if (!written)
> --
> 2.43.0
>
Daniel
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-28 15:23 ` Daniel Gomez
@ 2024-05-28 16:50 ` Matthew Wilcox
2024-05-28 19:01 ` Daniel Gomez
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2024-05-28 16:50 UTC (permalink / raw)
To: Daniel Gomez
Cc: Christoph Hellwig, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-fsdevel, linux-mm
On Tue, May 28, 2024 at 03:23:39PM +0000, Daniel Gomez wrote:
> I have the same patch in my shmem large folios tree. That was the last piece
> needed for getting better performance results. However, it is also needed to
> support folios in the write_begin() and write_end() callbacks.
I don't think it's *needed*. It's nice! But clearly not necessary
since Christoph made nfs work without doing that.
> In order to avoid
> making them local to shmem, how should we do the transition to folios in these
> two callbacks? I was looking into the aops->read_folio approach, but what do you think?
See the v2 of buffered_write_operations that I just posted. I was waiting
for feedback from Christoph on the revised method for passing fsdata
around, but I may as well just post a v2 and see what happens.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-28 16:50 ` Matthew Wilcox
@ 2024-05-28 19:01 ` Daniel Gomez
0 siblings, 0 replies; 21+ messages in thread
From: Daniel Gomez @ 2024-05-28 19:01 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-fsdevel, linux-mm
On Tue, May 28, 2024 at 05:50:44PM +0100, Matthew Wilcox wrote:
> On Tue, May 28, 2024 at 03:23:39PM +0000, Daniel Gomez wrote:
> > I have the same patch in my shmem large folios tree. That was the last piece
> > needed for getting better performance results. However, it is also needed to
> > support folios in the write_begin() and write_end() callbacks.
>
> I don't think it's *needed*. It's nice! But clearly not necessary
> since Christoph made nfs work without doing that.
I see. We pass the length via bytes anyway, and the folio allocated inside
write_begin() is retrieved with folio_page().
I did test this patch (+mapping_max_folio_size() patch) for shmem and it works fine for me.
>
> > In order to avoid
> > making them local to shmem, how should we do the transition to folios in these
> > two callbacks? I was looking into the aops->read_folio approach, but what do you think?
>
> See the v2 of buffered_write_operations that I just posted. I was waiting
> for feedback from Christoph on the revised method for passing fsdata
> around, but I may as well just post a v2 and see what happens.
Interesting. I think it makes sense to convert tmpfs to
buffered_write_operations as well. Can you add me to the v2 so I can add/review
it for tmpfs?
Thanks
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-05-27 16:36 ` [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios Christoph Hellwig
2024-05-27 18:17 ` Matthew Wilcox
[not found] ` <CGME20240528152340eucas1p17ba2ad78d8ea869ef44cdeedb2601f80@eucas1p1.samsung.com>
@ 2024-06-11 10:47 ` Shaun Tancheff
2024-06-11 16:13 ` Christoph Hellwig
2 siblings, 1 reply; 21+ messages in thread
From: Shaun Tancheff @ 2024-06-11 10:47 UTC (permalink / raw)
To: Christoph Hellwig, Trond Myklebust, Anna Schumaker, Matthew Wilcox
Cc: linux-nfs, linux-fsdevel, linux-mm
On 5/27/24 23:36, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> Modelled after the loop in iomap_write_iter(), copy larger chunks from
> userspace if the filesystem has created large folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> [hch: use mapping_max_folio_size to keep supporting file systems that do
> not support large folios]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> mm/filemap.c | 40 +++++++++++++++++++++++++---------------
> 1 file changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 382c3d06bfb10c..860728e26ccf32 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3981,21 +3981,24 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> loff_t pos = iocb->ki_pos;
> struct address_space *mapping = file->f_mapping;
> const struct address_space_operations *a_ops = mapping->a_ops;
> + size_t chunk = mapping_max_folio_size(mapping);
Better to default chunk to PAGE_SIZE for backward compat
+ size_t chunk = PAGE_SIZE;
> long status = 0;
> ssize_t written = 0;
>
Have fs opt in to large folio support:
+ if (mapping_large_folio_support(mapping))
+ chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> do {
> struct page *page;
> - unsigned long offset; /* Offset into pagecache page */
> - unsigned long bytes; /* Bytes to write to page */
> + struct folio *folio;
> + size_t offset; /* Offset into folio */
> + size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> void *fsdata = NULL;
>
> - offset = (pos & (PAGE_SIZE - 1));
> - bytes = min_t(unsigned long, PAGE_SIZE - offset,
> - iov_iter_count(i));
> + bytes = iov_iter_count(i);
> +retry:
> + offset = pos & (chunk - 1);
> + bytes = min(chunk - offset, bytes);
> + balance_dirty_pages_ratelimited(mapping);
>
> -again:
> /*
> * Bring in the user page that we will copy from _first_.
> * Otherwise there's a nasty deadlock on copying from the
> @@ -4017,11 +4020,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> if (unlikely(status < 0))
> break;
>
> + folio = page_folio(page);
> + offset = offset_in_folio(folio, pos);
> + if (bytes > folio_size(folio) - offset)
> + bytes = folio_size(folio) - offset;
> +
> if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + flush_dcache_folio(folio);
>
> - copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> - flush_dcache_page(page);
> + copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> + flush_dcache_folio(folio);
>
> status = a_ops->write_end(file, mapping, pos, bytes, copied,
> page, fsdata);
> @@ -4039,14 +4047,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> * halfway through, might be a race with munmap,
> * might be severe memory pressure.
> */
> - if (copied)
> + if (chunk > PAGE_SIZE)
> + chunk /= 2;
> + if (copied) {
> bytes = copied;
> - goto again;
> + goto retry;
> + }
> + } else {
> + pos += status;
> + written += status;
> }
> - pos += status;
> - written += status;
> -
> - balance_dirty_pages_ratelimited(mapping);
> } while (iov_iter_count(i));
>
> if (!written)
Tested with Lustre with large folios and kernel 6.6 with this patch (and suggested changes).
Tested-by: Shaun Tancheff <shaun.tancheff@hpe.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-06-11 10:47 ` Shaun Tancheff
@ 2024-06-11 16:13 ` Christoph Hellwig
2024-06-12 1:41 ` Shaun Tancheff
0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-06-11 16:13 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Christoph Hellwig, Trond Myklebust, Anna Schumaker,
Matthew Wilcox, linux-nfs, linux-fsdevel, linux-mm
On Tue, Jun 11, 2024 at 05:47:12PM +0700, Shaun Tancheff wrote:
>> const struct address_space_operations *a_ops = mapping->a_ops;
>> + size_t chunk = mapping_max_folio_size(mapping);
>
> Better to default chunk to PAGE_SIZE for backward compat
> + size_t chunk = PAGE_SIZE;
>
>> long status = 0;
>> ssize_t written = 0;
>>
>
> Have fs opt in to large folio support:
>
> + if (mapping_large_folio_support(mapping))
> + chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
I don't think you've actually read the code, have you?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-06-11 16:13 ` Christoph Hellwig
@ 2024-06-12 1:41 ` Shaun Tancheff
2024-06-12 4:02 ` Christoph Hellwig
0 siblings, 1 reply; 21+ messages in thread
From: Shaun Tancheff @ 2024-06-12 1:41 UTC (permalink / raw)
To: Christoph Hellwig, Shaun Tancheff
Cc: Trond Myklebust, Anna Schumaker, Matthew Wilcox, linux-nfs,
linux-fsdevel, linux-mm
On 6/11/24 23:13, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 05:47:12PM +0700, Shaun Tancheff wrote:
>>> const struct address_space_operations *a_ops = mapping->a_ops;
>>> + size_t chunk = mapping_max_folio_size(mapping);
>> Better to default chunk to PAGE_SIZE for backward compat
>> + size_t chunk = PAGE_SIZE;
>>
>>> long status = 0;
>>> ssize_t written = 0;
>>>
>> Have fs opt in to large folio support:
>>
>> + if (mapping_large_folio_support(mapping))
>> + chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> I don't think you've actually read the code, have you?
I checked from 6.6 to linux-next with this patch and my ext4 VM does not boot without the opt-in.
Almost certainly there is something I am missing, probably not looking at the correct tree.
Thanks!
--Shaun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios
2024-06-12 1:41 ` Shaun Tancheff
@ 2024-06-12 4:02 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:02 UTC (permalink / raw)
To: Shaun Tancheff
Cc: Christoph Hellwig, Shaun Tancheff, Trond Myklebust,
Anna Schumaker, Matthew Wilcox, linux-nfs, linux-fsdevel,
linux-mm
On Wed, Jun 12, 2024 at 08:41:01AM +0700, Shaun Tancheff wrote:
> On 6/11/24 23:13, Christoph Hellwig wrote:
>
>> On Tue, Jun 11, 2024 at 05:47:12PM +0700, Shaun Tancheff wrote:
>>>> const struct address_space_operations *a_ops = mapping->a_ops;
>>>> + size_t chunk = mapping_max_folio_size(mapping);
>>> Better to default chunk to PAGE_SIZE for backward compat
>>> + size_t chunk = PAGE_SIZE;
>>>
>>>> long status = 0;
>>>> ssize_t written = 0;
>>>>
>>> Have fs opt in to large folio support:
>>>
>>> + if (mapping_large_folio_support(mapping))
>>> + chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>> I don't think you've actually read the code, have you?
>
> I checked from 6.6 to linux-next with this patch and my ext4 VM does not boot without the opt-in.
Please take a look at the definition of mapping_max_folio_size,
which is called above in the quoted patch.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/2] nfs: add support for large folios
2024-05-27 16:36 support large folios for NFS Christoph Hellwig
2024-05-27 16:36 ` [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios Christoph Hellwig
@ 2024-05-27 16:36 ` Christoph Hellwig
2024-05-27 19:43 ` support large folios for NFS Sagi Grimberg
` (2 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-05-27 16:36 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Matthew Wilcox
Cc: linux-nfs, linux-fsdevel, linux-mm
NFS is already free of folio size assumptions, so just pass the chunk size
to __filemap_get_folio and set the large folio address_space flag for all
regular files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/nfs/file.c | 4 +++-
fs/nfs/inode.c | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 6bd127e6683dce..7f1295475a90fd 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -339,6 +339,7 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep,
void **fsdata)
{
+ fgf_t fgp = FGP_WRITEBEGIN;
struct folio *folio;
int once_thru = 0;
int ret;
@@ -346,8 +347,9 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
dfprintk(PAGECACHE, "NFS: write_begin(%pD2(%lu), %u@%lld)\n",
file, mapping->host->i_ino, len, (long long) pos);
+ fgp |= fgf_set_order(len);
start:
- folio = __filemap_get_folio(mapping, pos >> PAGE_SHIFT, FGP_WRITEBEGIN,
+ folio = __filemap_get_folio(mapping, pos >> PAGE_SHIFT, fgp,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index acef52ecb1bb7e..6d185af4cb29d4 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -491,6 +491,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
inode->i_fop = NFS_SB(sb)->nfs_client->rpc_ops->file_ops;
inode->i_data.a_ops = &nfs_file_aops;
nfs_inode_init_regular(nfsi);
+ mapping_set_large_folios(inode->i_mapping);
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = NFS_SB(sb)->nfs_client->rpc_ops->dir_inode_ops;
inode->i_fop = &nfs_dir_operations;
--
2.43.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-27 16:36 support large folios for NFS Christoph Hellwig
2024-05-27 16:36 ` [PATCH 1/2] filemap: Convert generic_perform_write() to support large folios Christoph Hellwig
2024-05-27 16:36 ` [PATCH 2/2] nfs: add support for " Christoph Hellwig
@ 2024-05-27 19:43 ` Sagi Grimberg
2024-05-28 21:05 ` Matthew Wilcox
2024-05-29 21:59 ` Trond Myklebust
4 siblings, 0 replies; 21+ messages in thread
From: Sagi Grimberg @ 2024-05-27 19:43 UTC (permalink / raw)
To: Christoph Hellwig, Trond Myklebust, Anna Schumaker, Matthew Wilcox
Cc: linux-nfs, linux-fsdevel, linux-mm
On 27/05/2024 19:36, Christoph Hellwig wrote:
> Hi all,
>
> this series adds large folio support to NFS, and almost doubles the
> buffered write throughput from the previous bottleneck of ~2.5GB/s
> (just like for other file systems).
>
> The first patch is an old one from willy that I've updated very slightly.
> Note that this update now requires the mapping_max_folio_size helper
> merged into Linus' tree only a few minutes ago.
I'll confirm that NFS buffered writes saw a dramatic >2x improvement
against my test system.
For the series:
Tested-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-27 16:36 support large folios for NFS Christoph Hellwig
` (2 preceding siblings ...)
2024-05-27 19:43 ` support large folios for NFS Sagi Grimberg
@ 2024-05-28 21:05 ` Matthew Wilcox
2024-05-29 5:14 ` Christoph Hellwig
2024-05-29 13:35 ` Trond Myklebust
2024-05-29 21:59 ` Trond Myklebust
4 siblings, 2 replies; 21+ messages in thread
From: Matthew Wilcox @ 2024-05-28 21:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-fsdevel, linux-mm
On Mon, May 27, 2024 at 06:36:07PM +0200, Christoph Hellwig wrote:
> Hi all,
>
> this series adds large folio support to NFS, and almost doubles the
> buffered write throughput from the previous bottleneck of ~2.5GB/s
> (just like for other file systems).
>
> The first patch is an old one from willy that I've updated very slightly.
> Note that this update now requires the mapping_max_folio_size helper
> merged into Linus' tree only a few minutes ago.
Kind of surprised this didn't fall over given the bugs I just sent a
patch for ... misinterpreting the folio indices seems like it should
have caused a failure in _some_ fstest.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-28 21:05 ` Matthew Wilcox
@ 2024-05-29 5:14 ` Christoph Hellwig
2024-05-29 13:35 ` Trond Myklebust
1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-05-29 5:14 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Trond Myklebust, Anna Schumaker, linux-nfs,
linux-fsdevel, linux-mm
On Tue, May 28, 2024 at 10:05:58PM +0100, Matthew Wilcox wrote:
> On Mon, May 27, 2024 at 06:36:07PM +0200, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds large folio support to NFS, and almost doubles the
> > buffered write throughput from the previous bottleneck of ~2.5GB/s
> > (just like for other file systems).
> >
> > The first patch is an old one from willy that I've updated very slightly.
> > Note that this update now requires the mapping_max_folio_size helper
> > merged into Linus' tree only a few minutes ago.
>
> Kind of surprised this didn't fall over given the bugs I just sent a
> patch for ... misinterpreting the folio indices seems like it should
> have caused a failure in _some_ fstest.
I've run quite a few tests with different NFS protocol versions, and there
were no new failures, and the existing one is a MM one also reproducible
with local XFS. That's indeed a bit odd.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-28 21:05 ` Matthew Wilcox
2024-05-29 5:14 ` Christoph Hellwig
@ 2024-05-29 13:35 ` Trond Myklebust
1 sibling, 0 replies; 21+ messages in thread
From: Trond Myklebust @ 2024-05-29 13:35 UTC (permalink / raw)
To: hch, willy; +Cc: anna, linux-mm, linux-nfs, linux-fsdevel
On Tue, 2024-05-28 at 22:05 +0100, Matthew Wilcox wrote:
> On Mon, May 27, 2024 at 06:36:07PM +0200, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds large folio support to NFS, and almost doubles the
> > buffered write throughput from the previous bottleneck of ~2.5GB/s
> > (just like for other file systems).
> >
> > The first patch is an old one from willy that I've updated very
> > slightly.
> > Note that this update now requires the mapping_max_folio_size
> > helper
> > merged into Linus' tree only a few minutes ago.
>
> Kind of surprised this didn't fall over given the bugs I just sent a
> patch for ... misinterpreting the folio indices seems like it should
> have caused a failure in _some_ fstest.
Why wouldn't it work? The code you're replacing isn't assuming that
page cache indices are in units of the folio size. It is just assuming
that folio boundaries are multiples of the folio size.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-27 16:36 support large folios for NFS Christoph Hellwig
` (3 preceding siblings ...)
2024-05-28 21:05 ` Matthew Wilcox
@ 2024-05-29 21:59 ` Trond Myklebust
2024-05-31 6:14 ` hch
4 siblings, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2024-05-29 21:59 UTC (permalink / raw)
To: hch, anna, willy; +Cc: linux-mm, linux-nfs, linux-fsdevel
On Mon, 2024-05-27 at 18:36 +0200, Christoph Hellwig wrote:
> Hi all,
>
> this series adds large folio support to NFS, and almost doubles the
> buffered write throughput from the previous bottleneck of ~2.5GB/s
> (just like for other file systems).
>
> The first patch is an old one from willy that I've updated very
> slightly.
> Note that this update now requires the mapping_max_folio_size helper
> merged into Linus' tree only a few minutes ago.
>
> Diffstat:
> fs/nfs/file.c | 4 +++-
> fs/nfs/inode.c | 1 +
> mm/filemap.c | 40 +++++++++++++++++++++++++---------------
> 3 files changed, 29 insertions(+), 16 deletions(-)
>
Which tree did you intend to merge this through? Willy's or Anna and
mine? I'm OK either way. I just want to make sure we're on the same
page.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-29 21:59 ` Trond Myklebust
@ 2024-05-31 6:14 ` hch
2024-06-07 5:29 ` hch
0 siblings, 1 reply; 21+ messages in thread
From: hch @ 2024-05-31 6:14 UTC (permalink / raw)
To: Trond Myklebust; +Cc: hch, anna, willy, linux-mm, linux-nfs, linux-fsdevel
On Wed, May 29, 2024 at 09:59:44PM +0000, Trond Myklebust wrote:
> Which tree did you intend to merge this through? Willy's or Anna and
> mine? I'm OK either way. I just want to make sure we're on the same
> page.
I'm perfectly fine either way too. If willy wants to get any other
work for generic_perform_write in as per his RFC patches the pagecache
tree might be a better place, if not maybe the nfs tree.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-05-31 6:14 ` hch
@ 2024-06-07 5:29 ` hch
2024-06-07 7:57 ` Cedric Blancher
2024-06-07 15:32 ` Trond Myklebust
0 siblings, 2 replies; 21+ messages in thread
From: hch @ 2024-06-07 5:29 UTC (permalink / raw)
To: Trond Myklebust; +Cc: hch, anna, willy, linux-mm, linux-nfs, linux-fsdevel
On Fri, May 31, 2024 at 08:14:43AM +0200, hch@lst.de wrote:
> On Wed, May 29, 2024 at 09:59:44PM +0000, Trond Myklebust wrote:
> > Which tree did you intend to merge this through? Willy's or Anna and
> > mine? I'm OK either way. I just want to make sure we're on the same
> > page.
>
> I'm perfectly fine either way too. If willy wants to get any other
> work for generic_perform_write in as per his RFC patches the pagecache
> tree might be a better place, if not maybe the nfs tree.
That maintainer celebrity death match was a bit boring :) Any takers?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-06-07 5:29 ` hch
@ 2024-06-07 7:57 ` Cedric Blancher
2024-06-07 15:32 ` Trond Myklebust
1 sibling, 0 replies; 21+ messages in thread
From: Cedric Blancher @ 2024-06-07 7:57 UTC (permalink / raw)
To: hch; +Cc: Trond Myklebust, anna, willy, linux-mm, linux-nfs, linux-fsdevel
On Fri, 7 Jun 2024 at 07:29, hch@lst.de <hch@lst.de> wrote:
>
> On Fri, May 31, 2024 at 08:14:43AM +0200, hch@lst.de wrote:
> > On Wed, May 29, 2024 at 09:59:44PM +0000, Trond Myklebust wrote:
> > > Which tree did you intend to merge this through? Willy's or Anna and
> > > mine? I'm OK either way. I just want to make sure we're on the same
> > > page.
> >
> > I'm perfectly fine either way too. If willy wants to get any other
> > work for generic_perform_write in as per his RFC patches the pagecache
> > tree might be a better place, if not maybe the nfs tree.
>
> That maintainer celebrity death match was a bit boring :) Any takers?
>
As much as we like to see blood, gore, WH4K chainsawswords, ripped off
brains and ethernet cables, it would be easier (and less expensive) to
just apply the patch :)
Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: support large folios for NFS
2024-06-07 5:29 ` hch
2024-06-07 7:57 ` Cedric Blancher
@ 2024-06-07 15:32 ` Trond Myklebust
1 sibling, 0 replies; 21+ messages in thread
From: Trond Myklebust @ 2024-06-07 15:32 UTC (permalink / raw)
To: hch; +Cc: anna, linux-mm, linux-nfs, willy, linux-fsdevel
On Fri, 2024-06-07 at 07:29 +0200, hch@lst.de wrote:
> On Fri, May 31, 2024 at 08:14:43AM +0200, hch@lst.de wrote:
> > On Wed, May 29, 2024 at 09:59:44PM +0000, Trond Myklebust wrote:
> > > Which tree did you intend to merge this through? Willy's or Anna
> > > and
> > > mine? I'm OK either way. I just want to make sure we're on the
> > > same
> > > page.
> >
> > I'm perfectly fine either way too. If willy wants to get any other
> > work for generic_perform_write in as per his RFC patches the
> > pagecache
> > tree might be a better place, if not maybe the nfs tree.
>
> That maintainer celebrity death match was a bit boring :) Any
> takers?
>
🙂 We'll push them through the NFS tree.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 21+ messages in thread