* [PATCH v2 0/2] Improve the tmpfs large folio read performance
@ 2024-10-18 3:00 Baolin Wang
2024-10-18 3:00 ` [PATCH v2 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang
2024-10-18 3:00 ` [PATCH v2 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang
0 siblings, 2 replies; 5+ messages in thread
From: Baolin Wang @ 2024-10-18 3:00 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, shy828301, baolin.wang, linux-mm,
linux-kernel
tmpfs already supports PMD-sized large folios, but the tmpfs read operation
still copies data at PAGE_SIZE granularity, which is suboptimal. This patch
set changes the copy to folio granularity, which improves the read
performance.
Using 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, I see
about a 20% performance improvement, and no regression with bs=4k. I also
ran functional tests with the xfstests suite and found no regressions with
the following xfstests config:
FSTYP=tmpfs
export TEST_DIR=/mnt/tempfs_mnt
export TEST_DEV=/mnt/tempfs_mnt
export SCRATCH_MNT=/mnt/scratchdir
export SCRATCH_DEV=/mnt/scratchdir
Changes from v1:
- Move index calculation to the appropriate place, per Kefeng.
- Fall back to page copy if the large folio has hwpoisoned subpages,
  suggested by Matthew and Yang.
Baolin Wang (2):
mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic
mm: shmem: improve the tmpfs large folio read performance
mm/shmem.c | 65 +++++++++++++++++++++++++++---------------------------
1 file changed, 33 insertions(+), 32 deletions(-)
--
2.39.3
* [PATCH v2 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic
2024-10-18 3:00 [PATCH v2 0/2] Improve the tmpfs large folio read performance Baolin Wang
@ 2024-10-18 3:00 ` Baolin Wang
2024-10-18 18:01 ` Yang Shi
2024-10-18 3:00 ` [PATCH v2 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang
1 sibling, 1 reply; 5+ messages in thread
From: Baolin Wang @ 2024-10-18 3:00 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, shy828301, baolin.wang, linux-mm,
linux-kernel
Using iocb->ki_pos to check whether the read position exceeds the file size
and to calculate the number of bytes to be read helps simplify the code
logic. Meanwhile, this is also a preparation for improving tmpfs large folio
read performace in the following patch.
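
As a rough illustration (a standalone userspace sketch with made-up values
and a min_t() stand-in, all of which are assumptions rather than the kernel
code itself), the per-iteration byte count now falls out of two min()
operations on iocb->ki_pos:

/* Illustrative only: the macro stand-in and the example values are assumptions. */
#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

int main(void)
{
	unsigned long long i_size = 10000;	/* file size */
	unsigned long long ki_pos = 8192;	/* current read position */
	unsigned long long count = 64 * 1024;	/* bytes requested by the iov_iter */
	unsigned long long offset = ki_pos & (PAGE_SIZE - 1);

	/* Clamp the read to EOF, then to the remainder of the current page. */
	unsigned long long end_offset = min_t(unsigned long long, i_size,
					      ki_pos + count);
	unsigned long long nr = min_t(unsigned long long, end_offset - ki_pos,
				      PAGE_SIZE - offset);

	printf("copy %llu bytes at page offset %llu\n", nr, offset);
	assert(nr == 1808);	/* the final partial page, no end_index bookkeeping */
	return 0;
}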
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 35 +++++++++++------------------------
1 file changed, 11 insertions(+), 24 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 66eae800ffab..93642aa8d1aa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3106,27 +3106,19 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
unsigned long offset;
int error = 0;
ssize_t retval = 0;
- loff_t *ppos = &iocb->ki_pos;
- index = *ppos >> PAGE_SHIFT;
- offset = *ppos & ~PAGE_MASK;
+ offset = iocb->ki_pos & ~PAGE_MASK;
for (;;) {
struct folio *folio = NULL;
struct page *page = NULL;
- pgoff_t end_index;
unsigned long nr, ret;
- loff_t i_size = i_size_read(inode);
+ loff_t end_offset, i_size = i_size_read(inode);
- end_index = i_size >> PAGE_SHIFT;
- if (index > end_index)
+ if (unlikely(iocb->ki_pos >= i_size))
break;
- if (index == end_index) {
- nr = i_size & ~PAGE_MASK;
- if (nr <= offset)
- break;
- }
+ index = iocb->ki_pos >> PAGE_SHIFT;
error = shmem_get_folio(inode, index, 0, &folio, SGP_READ);
if (error) {
if (error == -EINVAL)
@@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
* We must evaluate after, since reads (unlike writes)
* are called without i_rwsem protection against truncate
*/
- nr = PAGE_SIZE;
i_size = i_size_read(inode);
- end_index = i_size >> PAGE_SHIFT;
- if (index == end_index) {
- nr = i_size & ~PAGE_MASK;
- if (nr <= offset) {
- if (folio)
- folio_put(folio);
- break;
- }
+ if (unlikely(iocb->ki_pos >= i_size)) {
+ if (folio)
+ folio_put(folio);
+ break;
}
- nr -= offset;
+ end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count);
+ nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
if (folio) {
/*
@@ -3199,8 +3187,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
retval += ret;
offset += ret;
- index += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
+ iocb->ki_pos += ret;
if (!iov_iter_count(to))
break;
@@ -3211,7 +3199,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
cond_resched();
}
- *ppos = ((loff_t) index << PAGE_SHIFT) + offset;
file_accessed(file);
return retval ? retval : error;
}
--
2.39.3
* [PATCH v2 2/2] mm: shmem: improve the tmpfs large folio read performance
2024-10-18 3:00 [PATCH v2 0/2] Improve the tmpfs large folio read performance Baolin Wang
2024-10-18 3:00 ` [PATCH v2 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang
@ 2024-10-18 3:00 ` Baolin Wang
2024-10-18 18:38 ` Yang Shi
1 sibling, 1 reply; 5+ messages in thread
From: Baolin Wang @ 2024-10-18 3:00 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, shy828301, baolin.wang, linux-mm,
linux-kernel
tmpfs already supports PMD-sized large folios, but the tmpfs read operation
still copies data at PAGE_SIZE granularity, which is suboptimal. This patch
changes the copy to folio granularity, which improves the read performance,
and also switches to the folio-based helper functions.

Moreoever, if a large folio has a hwpoisoned subpage, the copy still falls
back to page granularity.
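
Roughly, the copy-size selection works like the following standalone
userspace sketch (illustrative only; the copy_chunk() helper, the min_t()
stand-in and the example values are assumptions, not the kernel code):

#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096ULL
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Pick the copy granularity and compute the per-iteration byte count. */
static unsigned long long copy_chunk(unsigned long long ki_pos,
				     unsigned long long i_size,
				     unsigned long long count,
				     unsigned long long folio_size,
				     bool fallback_page_copy)
{
	unsigned long long fsize = fallback_page_copy ? PAGE_SIZE : folio_size;
	unsigned long long offset = ki_pos & (fsize - 1);
	unsigned long long end_offset = min_t(unsigned long long, i_size,
					      ki_pos + count);

	return min_t(unsigned long long, end_offset - ki_pos, fsize - offset);
}

int main(void)
{
	unsigned long long ki_pos = 3 * 1024 * 1024 + 8192;	/* position inside a 2M folio */
	unsigned long long i_size = 1ULL << 30;			/* 1G file */
	unsigned long long count = 64 * 1024;			/* bs=64k request */

	/* Healthy large folio: the whole 64k request is served by one copy. */
	assert(copy_chunk(ki_pos, i_size, count, 2 * 1024 * 1024, false) == 64 * 1024);

	/* Folio with a hwpoisoned subpage: fall back to one page per copy. */
	assert(copy_chunk(ki_pos, i_size, count, 2 * 1024 * 1024, true) == PAGE_SIZE);

	return 0;
}

With a healthy 2M folio, a bs=64k read is satisfied by one copy per
iteration rather than sixteen page-sized ones.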
Using 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, I see
about a 20% performance improvement, and no regression with bs=4k.
Before the patch:
READ: bw=10.0GiB/s
After the patch:
READ: bw=12.0GiB/s
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 93642aa8d1aa..cbefd9801f6b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3107,13 +3107,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
int error = 0;
ssize_t retval = 0;
- offset = iocb->ki_pos & ~PAGE_MASK;
-
for (;;) {
struct folio *folio = NULL;
struct page *page = NULL;
unsigned long nr, ret;
loff_t end_offset, i_size = i_size_read(inode);
+ bool fallback_page_copy = false;
+ size_t fsize;
if (unlikely(iocb->ki_pos >= i_size))
break;
@@ -3134,6 +3134,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
error = -EIO;
break;
}
+
+ if (folio_test_large(folio) &&
+ folio_test_has_hwpoisoned(folio))
+ fallback_page_copy = true;
}
/*
@@ -3147,7 +3151,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
break;
}
end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count);
- nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
+ if (folio && likely(!fallback_page_copy))
+ fsize = folio_size(folio);
+ else
+ fsize = PAGE_SIZE;
+ offset = iocb->ki_pos & (fsize - 1);
+ nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset);
if (folio) {
/*
@@ -3155,10 +3164,15 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
* virtual addresses, take care about potential aliasing
* before reading the page on the kernel side.
*/
- if (mapping_writably_mapped(mapping))
- flush_dcache_page(page);
+ if (mapping_writably_mapped(mapping)) {
+ if (likely(!fallback_page_copy))
+ flush_dcache_folio(folio);
+ else
+ flush_dcache_page(page);
+ }
+
/*
- * Mark the page accessed if we read the beginning.
+ * Mark the folio accessed if we read the beginning.
*/
if (!offset)
folio_mark_accessed(folio);
@@ -3166,9 +3180,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
* Ok, we have the page, and it's up-to-date, so
* now we can copy it to user space...
*/
- ret = copy_page_to_iter(page, offset, nr, to);
+ if (likely(!fallback_page_copy))
+ ret = copy_folio_to_iter(folio, offset, nr, to);
+ else
+ ret = copy_page_to_iter(page, offset, nr, to);
folio_put(folio);
-
} else if (user_backed_iter(to)) {
/*
* Copy to user tends to be so well optimized, but
@@ -3186,8 +3202,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
}
retval += ret;
- offset += ret;
- offset &= ~PAGE_MASK;
iocb->ki_pos += ret;
if (!iov_iter_count(to))
--
2.39.3
* Re: [PATCH v2 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic
2024-10-18 3:00 ` [PATCH v2 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang
@ 2024-10-18 18:01 ` Yang Shi
0 siblings, 0 replies; 5+ messages in thread
From: Yang Shi @ 2024-10-18 18:01 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, willy, david, wangkefeng.wang, linux-mm, linux-kernel
On Thu, Oct 17, 2024 at 8:00 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> Using iocb->ki_pos to check whether the read position exceeds the file size
> and to calculate the number of bytes to be read helps simplify the code
> logic. Meanwhile, this is also a preparation for improving tmpfs large folio
> read performace
s/performace/performance
> in the following patch.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
The patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>
> ---
> mm/shmem.c | 35 +++++++++++------------------------
> 1 file changed, 11 insertions(+), 24 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 66eae800ffab..93642aa8d1aa 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3106,27 +3106,19 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> unsigned long offset;
> int error = 0;
> ssize_t retval = 0;
> - loff_t *ppos = &iocb->ki_pos;
>
> - index = *ppos >> PAGE_SHIFT;
> - offset = *ppos & ~PAGE_MASK;
> + offset = iocb->ki_pos & ~PAGE_MASK;
>
> for (;;) {
> struct folio *folio = NULL;
> struct page *page = NULL;
> - pgoff_t end_index;
> unsigned long nr, ret;
> - loff_t i_size = i_size_read(inode);
> + loff_t end_offset, i_size = i_size_read(inode);
>
> - end_index = i_size >> PAGE_SHIFT;
> - if (index > end_index)
> + if (unlikely(iocb->ki_pos >= i_size))
> break;
> - if (index == end_index) {
> - nr = i_size & ~PAGE_MASK;
> - if (nr <= offset)
> - break;
> - }
>
> + index = iocb->ki_pos >> PAGE_SHIFT;
> error = shmem_get_folio(inode, index, 0, &folio, SGP_READ);
> if (error) {
> if (error == -EINVAL)
> @@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> * We must evaluate after, since reads (unlike writes)
> * are called without i_rwsem protection against truncate
> */
> - nr = PAGE_SIZE;
> i_size = i_size_read(inode);
> - end_index = i_size >> PAGE_SHIFT;
> - if (index == end_index) {
> - nr = i_size & ~PAGE_MASK;
> - if (nr <= offset) {
> - if (folio)
> - folio_put(folio);
> - break;
> - }
> + if (unlikely(iocb->ki_pos >= i_size)) {
> + if (folio)
> + folio_put(folio);
> + break;
> }
> - nr -= offset;
> + end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count);
> + nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
>
> if (folio) {
> /*
> @@ -3199,8 +3187,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>
> retval += ret;
> offset += ret;
> - index += offset >> PAGE_SHIFT;
> offset &= ~PAGE_MASK;
> + iocb->ki_pos += ret;
>
> if (!iov_iter_count(to))
> break;
> @@ -3211,7 +3199,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> cond_resched();
> }
>
> - *ppos = ((loff_t) index << PAGE_SHIFT) + offset;
> file_accessed(file);
> return retval ? retval : error;
> }
> --
> 2.39.3
>
* Re: [PATCH v2 2/2] mm: shmem: improve the tmpfs large folio read performance
2024-10-18 3:00 ` [PATCH v2 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang
@ 2024-10-18 18:38 ` Yang Shi
0 siblings, 0 replies; 5+ messages in thread
From: Yang Shi @ 2024-10-18 18:38 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, willy, david, wangkefeng.wang, linux-mm, linux-kernel
On Thu, Oct 17, 2024 at 8:00 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> tmpfs already supports PMD-sized large folios, but the tmpfs read operation
> still copies data at PAGE_SIZE granularity, which is suboptimal. This patch
> changes the copy to folio granularity, which improves the read performance,
> and also switches to the folio-based helper functions.
>
> Moreoever, if a large folio has a hwpoisoned subpage, the copy still falls
> back to page granularity.
s/Moreoever/Moreover
>
> Using 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, I see
> about a 20% performance improvement, and no regression with bs=4k.
> Before the patch:
> READ: bw=10.0GiB/s
>
> After the patch:
> READ: bw=12.0GiB/s
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
The patch looks fine to me. Reviewed-by: Yang Shi <shy828301@gmail.com>
> ---
> mm/shmem.c | 34 ++++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 93642aa8d1aa..cbefd9801f6b 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3107,13 +3107,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> int error = 0;
> ssize_t retval = 0;
>
> - offset = iocb->ki_pos & ~PAGE_MASK;
> -
> for (;;) {
> struct folio *folio = NULL;
> struct page *page = NULL;
> unsigned long nr, ret;
> loff_t end_offset, i_size = i_size_read(inode);
> + bool fallback_page_copy = false;
> + size_t fsize;
>
> if (unlikely(iocb->ki_pos >= i_size))
> break;
> @@ -3134,6 +3134,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> error = -EIO;
> break;
> }
> +
> + if (folio_test_large(folio) &&
> + folio_test_has_hwpoisoned(folio))
> + fallback_page_copy = true;
> }
>
> /*
> @@ -3147,7 +3151,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> break;
> }
> end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count);
> - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
> + if (folio && likely(!fallback_page_copy))
> + fsize = folio_size(folio);
> + else
> + fsize = PAGE_SIZE;
> + offset = iocb->ki_pos & (fsize - 1);
> + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset);
>
> if (folio) {
> /*
> @@ -3155,10 +3164,15 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> * virtual addresses, take care about potential aliasing
> * before reading the page on the kernel side.
> */
> - if (mapping_writably_mapped(mapping))
> - flush_dcache_page(page);
> + if (mapping_writably_mapped(mapping)) {
> + if (likely(!fallback_page_copy))
> + flush_dcache_folio(folio);
> + else
> + flush_dcache_page(page);
> + }
> +
> /*
> - * Mark the page accessed if we read the beginning.
> + * Mark the folio accessed if we read the beginning.
> */
> if (!offset)
> folio_mark_accessed(folio);
> @@ -3166,9 +3180,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> * Ok, we have the page, and it's up-to-date, so
> * now we can copy it to user space...
> */
> - ret = copy_page_to_iter(page, offset, nr, to);
> + if (likely(!fallback_page_copy))
> + ret = copy_folio_to_iter(folio, offset, nr, to);
> + else
> + ret = copy_page_to_iter(page, offset, nr, to);
> folio_put(folio);
> -
> } else if (user_backed_iter(to)) {
> /*
> * Copy to user tends to be so well optimized, but
> @@ -3186,8 +3202,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> }
>
> retval += ret;
> - offset += ret;
> - offset &= ~PAGE_MASK;
> iocb->ki_pos += ret;
>
> if (!iov_iter_count(to))
> --
> 2.39.3
>