linux-mm.kvack.org archive mirror
* [PATCH v5 1/2] filemap: add helper mapping_max_folio_size()
@ 2024-05-21 11:49 Xu Yang
  2024-05-21 11:49 ` [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings Xu Yang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Xu Yang @ 2024-05-21 11:49 UTC (permalink / raw)
  To: brauner, djwong, willy, akpm; +Cc: linux-xfs, linux-fsdevel, linux-mm, jun.li

Add mapping_max_folio_size() to get the maximum folio size for this
pagecache mapping.

Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>

---
Changes in v4:
 - split the mapping_max_folio_size() from v3
 - adjust mapping_max_folio_size() comment
 - add Rb tag
Changes in v5:
 - add fix tag and stable list
---
 include/linux/pagemap.h | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c5e33e2ca48a..d43cf36bb68b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -346,6 +346,19 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size.  I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
+#else
+#define MAX_PAGECACHE_ORDER	8
+#endif
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
  * @mapping: The file.
@@ -372,6 +385,14 @@ static inline bool mapping_large_folio_support(struct address_space *mapping)
 		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
 }
 
+/* Return the maximum folio size for this pagecache mapping, in bytes. */
+static inline size_t mapping_max_folio_size(struct address_space *mapping)
+{
+	if (mapping_large_folio_support(mapping))
+		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
+	return PAGE_SIZE;
+}
+
 static inline int filemap_nr_thps(struct address_space *mapping)
 {
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
@@ -530,19 +551,6 @@ static inline void *detach_page_private(struct page *page)
 	return folio_detach_private(page_folio(page));
 }
 
-/*
- * There are some parts of the kernel which assume that PMD entries
- * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
- * limit the maximum allocation order to PMD size.  I'm not aware of any
- * assumptions about maximum order if THP are disabled, but 8 seems like
- * a good order (that's 1MB if you're using 4kB pages)
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
-#else
-#define MAX_PAGECACHE_ORDER	8
-#endif
-
 #ifdef CONFIG_NUMA
 struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
 #else
-- 
2.34.1




* [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings
  2024-05-21 11:49 [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Xu Yang
@ 2024-05-21 11:49 ` Xu Yang
  2024-05-24  8:03   ` Christoph Hellwig
  2024-05-24 12:18   ` Matthew Wilcox
  2024-05-21 14:22 ` [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Christian Brauner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: Xu Yang @ 2024-05-21 11:49 UTC (permalink / raw)
  To: brauner, djwong, willy, akpm; +Cc: linux-xfs, linux-fsdevel, linux-mm, jun.li

Since commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace"),
iomap tries to copy in chunks larger than PAGE_SIZE. However, if the
mapping doesn't support large folios, only a single page of at most 4KB
is created and 4KB of data is written to the pagecache each time; the
next 4KB is then handled in the next iteration. This causes a potential
write performance problem.

If the chunk is 2MB, a total of 512 pages end up being handled. During
this period, fault_in_iov_iter_readable() is called to check that the
iov_iter is readable. Since only 4KB is handled each time, the address
ranges below are checked over and over again:

start      	end
---------------------
buf,       	buf+2MB
buf+4KB,   	buf+2MB
buf+8KB,   	buf+2MB
...
buf+2044KB,	buf+2MB

Obviously the checked size is wrong, since only 4KB is handled each
time. This patch computes a correct chunk so that iomap works well in
the non-large-folio case.

With this change, the write speed becomes stable. Tested on an ARM64
device.

Before:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)

After:

 - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
 - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)

Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>

---
Changes in v2:
 - fix address space description in message
Changes in v3:
 - adjust 'chunk' and add mapping_max_folio_size() in header file
   as suggested by Matthew
 - add write performance results in commit message
Changes in v4:
 - split mapping_max_folio_size() into a single patch 1/2
 - adjust subject
 - add Rb tag
Changes in v5:
 - no change
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 41c8f0c68ef5..c5802a459334 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -898,11 +898,11 @@ static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
-	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t total_written = 0;
 	long status = 0;
 	struct address_space *mapping = iter->inode->i_mapping;
+	size_t chunk = mapping_max_folio_size(mapping);
 	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
 
 	do {
-- 
2.34.1




* Re: [PATCH v5 1/2] filemap: add helper mapping_max_folio_size()
  2024-05-21 11:49 [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Xu Yang
  2024-05-21 11:49 ` [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings Xu Yang
@ 2024-05-21 14:22 ` Christian Brauner
  2024-05-24  6:21   ` Ritesh Harjani
  2024-05-24  8:03 ` Christoph Hellwig
  2024-05-24 12:17 ` Matthew Wilcox
  3 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2024-05-21 14:22 UTC (permalink / raw)
  To: Xu Yang
  Cc: Christian Brauner, linux-xfs, linux-fsdevel, linux-mm, jun.li,
	djwong, willy, akpm

On Tue, 21 May 2024 19:49:38 +0800, Xu Yang wrote:
> Add mapping_max_folio_size() to get the maximum folio size for this
> pagecache mapping.
> 
> 

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/2] filemap: add helper mapping_max_folio_size()
      https://git.kernel.org/vfs/vfs/c/0c31d63eebdd
[2/2] iomap: fault in smaller chunks for non-large folio mappings
      https://git.kernel.org/vfs/vfs/c/63ba6f07d115



* Re: [PATCH v5 1/2] filemap: add helper mapping_max_folio_size()
  2024-05-21 14:22 ` [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Christian Brauner
@ 2024-05-24  6:21   ` Ritesh Harjani
  0 siblings, 0 replies; 8+ messages in thread
From: Ritesh Harjani @ 2024-05-24  6:21 UTC (permalink / raw)
  To: Christian Brauner, Xu Yang
  Cc: Christian Brauner, linux-xfs, linux-fsdevel, linux-mm, jun.li,
	djwong, willy, akpm

Christian Brauner <brauner@kernel.org> writes:

> On Tue, 21 May 2024 19:49:38 +0800, Xu Yang wrote:
>> Add mapping_max_folio_size() to get the maximum folio size for this
>> pagecache mapping.
>> 
>> 
>
> Applied to the vfs.fixes branch of the vfs/vfs.git tree.
> Patches in the vfs.fixes branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.

iomap_write_iter() prefaults the userspace buffer in chunk-sized pieces
(order MAX_PAGECACHE_ORDER) at a time. However, iomap_write_begin()
only handles PAGE_SIZE bytes at a time for mappings which do not
support large folios. Hence this causes unnecessary loops of prefaults
in iomap_write_iter() -> fault_in_iov_iter_readable(), hurting write
performance for block device mappings.

This patch fixes iomap_write_iter() to prefault in PAGE_SIZE-sized
chunks instead.

I guess this change will then need to go back to v6.6, where large
folio support was added to iomap.

Looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.fixes
>
> [1/2] filemap: add helper mapping_max_folio_size()
>       https://git.kernel.org/vfs/vfs/c/0c31d63eebdd
> [2/2] iomap: fault in smaller chunks for non-large folio mappings
>       https://git.kernel.org/vfs/vfs/c/63ba6f07d115



* Re: [PATCH v5 1/2] filemap: add helper mapping_max_folio_size()
  2024-05-21 11:49 [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Xu Yang
  2024-05-21 11:49 ` [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings Xu Yang
  2024-05-21 14:22 ` [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Christian Brauner
@ 2024-05-24  8:03 ` Christoph Hellwig
  2024-05-24 12:17 ` Matthew Wilcox
  3 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-05-24  8:03 UTC (permalink / raw)
  To: Xu Yang
  Cc: brauner, djwong, willy, akpm, linux-xfs, linux-fsdevel, linux-mm, jun.li

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings
  2024-05-21 11:49 ` [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings Xu Yang
@ 2024-05-24  8:03   ` Christoph Hellwig
  2024-05-24 12:18   ` Matthew Wilcox
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-05-24  8:03 UTC (permalink / raw)
  To: Xu Yang
  Cc: brauner, djwong, willy, akpm, linux-xfs, linux-fsdevel, linux-mm, jun.li

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH v5 1/2] filemap: add helper mapping_max_folio_size()
  2024-05-21 11:49 [PATCH v5 1/2] filemap: add helper mapping_max_folio_size() Xu Yang
                   ` (2 preceding siblings ...)
  2024-05-24  8:03 ` Christoph Hellwig
@ 2024-05-24 12:17 ` Matthew Wilcox
  3 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2024-05-24 12:17 UTC (permalink / raw)
  To: Xu Yang; +Cc: brauner, djwong, akpm, linux-xfs, linux-fsdevel, linux-mm, jun.li

On Tue, May 21, 2024 at 07:49:38PM +0800, Xu Yang wrote:
> Add mapping_max_folio_size() to get the maximum folio size for this
> pagecache mapping.
> 
> Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> Cc: stable@vger.kernel.org
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Xu Yang <xu.yang_2@nxp.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>



* Re: [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings
  2024-05-21 11:49 ` [PATCH v5 2/2] iomap: fault in smaller chunks for non-large folio mappings Xu Yang
  2024-05-24  8:03   ` Christoph Hellwig
@ 2024-05-24 12:18   ` Matthew Wilcox
  1 sibling, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2024-05-24 12:18 UTC (permalink / raw)
  To: Xu Yang; +Cc: brauner, djwong, akpm, linux-xfs, linux-fsdevel, linux-mm, jun.li

On Tue, May 21, 2024 at 07:49:39PM +0800, Xu Yang wrote:
> Since commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace"),
> iomap tries to copy in chunks larger than PAGE_SIZE. However, if the
> mapping doesn't support large folios, only a single page of at most 4KB
> is created and 4KB of data is written to the pagecache each time; the
> next 4KB is then handled in the next iteration. This causes a potential
> write performance problem.
> 
> If the chunk is 2MB, a total of 512 pages end up being handled. During
> this period, fault_in_iov_iter_readable() is called to check that the
> iov_iter is readable. Since only 4KB is handled each time, the address
> ranges below are checked over and over again:
> 
> start      	end
> ---------------------
> buf,       	buf+2MB
> buf+4KB,   	buf+2MB
> buf+8KB,   	buf+2MB
> ...
> buf+2044KB,	buf+2MB
> 
> Obviously the checked size is wrong, since only 4KB is handled each
> time. This patch computes a correct chunk so that iomap works well in
> the non-large-folio case.
> 
> With this change, the write speed becomes stable. Tested on an ARM64
> device.
> 
> Before:
> 
>  - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (334 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (278 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (204 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (170 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (150 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (139 MB/s)
> 
> After:
> 
>  - dd if=/dev/zero of=/dev/sda bs=400K  count=10485  (339 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=800K  count=5242   (330 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=1600K count=2621   (332 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=2200K count=1906   (333 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=3000K count=1398   (333 MB/s)
>  - dd if=/dev/zero of=/dev/sda bs=4500K count=932    (333 MB/s)
> 
> Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> Cc: stable@vger.kernel.org
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Xu Yang <xu.yang_2@nxp.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>


