* [PATCH 0/2] Improve the tmpfs large folio read performance
@ 2024-10-16 10:09 Baolin Wang
From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw)
To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel
tmpfs already supports PMD-sized large folios, but the tmpfs read
operation still copies data at PAGE_SIZE granularity, which is suboptimal.
This series changes the read path to copy data at folio granularity, which
improves read performance.
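
To make the granularity difference concrete, below is a minimal userspace
sketch (not the kernel code; the chunk() helper and the 2M FOLIO_SIZE are
assumptions for illustration). Reading a 1G file needs 262144 copy calls at
PAGE_SIZE granularity but only 512 at 2M-folio granularity, which is where
the improvement comes from.

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define FOLIO_SIZE	(2UL << 20)	/* assume PMD-sized (2M) folios */

/* bytes copied by one loop iteration when copying at 'gran' granularity */
static unsigned long chunk(unsigned long pos, unsigned long end, unsigned long gran)
{
	unsigned long offset = pos & (gran - 1);	/* gran is a power of two */
	unsigned long left = end - pos;

	return left < gran - offset ? left : gran - offset;
}

int main(void)
{
	unsigned long i_size = 1UL << 30;	/* 1G tmpfs file */
	unsigned long pos, iters;

	for (pos = 0, iters = 0; pos < i_size; iters++)
		pos += chunk(pos, i_size, PAGE_SIZE);
	printf("PAGE_SIZE granularity: %lu copies\n", iters);	/* 262144 */

	for (pos = 0, iters = 0; pos < i_size; iters++)
		pos += chunk(pos, i_size, FOLIO_SIZE);
	printf("folio granularity:     %lu copies\n", iters);	/* 512 */

	return 0;
}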
Using 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, I see
about a 20% performance improvement, and no regression with bs=4k. I also
ran functional tests with the xfstests suite and found no regressions with
the following xfstests config.
FSTYP=tmpfs
export TEST_DIR=/mnt/tempfs_mnt
export TEST_DEV=/mnt/tempfs_mnt
export SCRATCH_MNT=/mnt/scratchdir
export SCRATCH_DEV=/mnt/scratchdir
Baolin Wang (2):
mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic
mm: shmem: improve the tmpfs large folio read performance
mm/shmem.c | 54 ++++++++++++++++++++++--------------------------------
1 file changed, 22 insertions(+), 32 deletions(-)
--
2.39.3
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 10:09 ` Baolin Wang 2024-10-16 12:34 ` Kefeng Wang 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang 2024-10-16 11:47 ` [PATCH 0/2] Improve " Kefeng Wang 2 siblings, 1 reply; 13+ messages in thread From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw) To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel Use iocb->ki_pos to check if the read bytes exceeds the file size and to calculate the bytes to be read can help simplify the code logic. Meanwhile, this is also a preparation for improving tmpfs large folios read performace in the following patch. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/shmem.c | 36 ++++++++++++------------------------ 1 file changed, 12 insertions(+), 24 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 66eae800ffab..edab02a26aac 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) unsigned long offset; int error = 0; ssize_t retval = 0; - loff_t *ppos = &iocb->ki_pos; - index = *ppos >> PAGE_SHIFT; - offset = *ppos & ~PAGE_MASK; + index = iocb->ki_pos >> PAGE_SHIFT; + offset = iocb->ki_pos & ~PAGE_MASK; for (;;) { struct folio *folio = NULL; struct page *page = NULL; - pgoff_t end_index; unsigned long nr, ret; - loff_t i_size = i_size_read(inode); + loff_t end_offset, i_size = i_size_read(inode); - end_index = i_size >> PAGE_SHIFT; - if (index > end_index) + if (unlikely(iocb->ki_pos >= i_size)) break; - if (index == end_index) { - nr = i_size & ~PAGE_MASK; - if (nr <= offset) - break; - } error = shmem_get_folio(inode, index, 0, &folio, SGP_READ); if (error) { @@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * We must evaluate after, since reads (unlike writes) * are called without i_rwsem protection against truncate */ - nr = PAGE_SIZE; i_size = i_size_read(inode); - end_index = i_size >> PAGE_SHIFT; - if (index == end_index) { - nr = i_size & ~PAGE_MASK; - if (nr <= offset) { - if (folio) - folio_put(folio); - break; - } + if (unlikely(iocb->ki_pos >= i_size)) { + if (folio) + folio_put(folio); + break; } - nr -= offset; + end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); + nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); if (folio) { /* @@ -3199,8 +3187,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) retval += ret; offset += ret; - index += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK; + iocb->ki_pos += ret; + index = iocb->ki_pos >> PAGE_SHIFT; if (!iov_iter_count(to)) break; @@ -3211,7 +3200,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) cond_resched(); } - *ppos = ((loff_t) index << PAGE_SHIFT) + offset; file_accessed(file); return retval ? retval : error; } -- 2.39.3 ^ permalink raw reply [flat|nested] 13+ messages in thread
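
A note on the simplification in the patch above: the old EOF test compared
page indexes and then the in-page remainder, while the new one is a single
comparison of iocb->ki_pos against i_size. The two conditions are
equivalent, which the small userspace sketch below checks (old_eof() and
new_eof() are invented names for illustration, not kernel functions):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* the removed check: compare page indexes, then the in-page remainder */
static int old_eof(unsigned long pos, unsigned long i_size)
{
	unsigned long index = pos >> PAGE_SHIFT;
	unsigned long offset = pos & ~PAGE_MASK;
	unsigned long end_index = i_size >> PAGE_SHIFT;

	if (index > end_index)
		return 1;
	if (index == end_index && (i_size & ~PAGE_MASK) <= offset)
		return 1;
	return 0;
}

/* the new check: one comparison on the file position */
static int new_eof(unsigned long pos, unsigned long i_size)
{
	return pos >= i_size;
}

int main(void)
{
	unsigned long pos, i_size;

	for (i_size = 0; i_size < 3 * PAGE_SIZE; i_size += 511)
		for (pos = 0; pos < 3 * PAGE_SIZE; pos += 509)
			assert(old_eof(pos, i_size) == new_eof(pos, i_size));
	printf("old and new EOF checks agree\n");
	return 0;
}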
* Re: [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang @ 2024-10-16 12:34 ` Kefeng Wang 2024-10-17 2:45 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 12:34 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > Use iocb->ki_pos to check if the read bytes exceeds the file size and to > calculate the bytes to be read can help simplify the code logic. Meanwhile, > this is also a preparation for improving tmpfs large folios read performace > in the following patch. > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/shmem.c | 36 ++++++++++++------------------------ > 1 file changed, 12 insertions(+), 24 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 66eae800ffab..edab02a26aac 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > unsigned long offset; > int error = 0; > ssize_t retval = 0; > - loff_t *ppos = &iocb->ki_pos; > > - index = *ppos >> PAGE_SHIFT; > - offset = *ppos & ~PAGE_MASK; > + index = iocb->ki_pos >> PAGE_SHIFT; index calculate could be moved before shmem_get_folio(), then... > + offset = iocb->ki_pos & ~PAGE_MASK; > > for (;;) { > struct folio *folio = NULL; > struct page *page = NULL; > - pgoff_t end_index; > unsigned long nr, ret; > - loff_t i_size = i_size_read(inode); > + loff_t end_offset, i_size = i_size_read(inode); > > - end_index = i_size >> PAGE_SHIFT; > - if (index > end_index) > + if (unlikely(iocb->ki_pos >= i_size)) > break; > - if (index == end_index) { > - nr = i_size & ~PAGE_MASK; > - if (nr <= offset) > - break; > - } > > error = shmem_get_folio(inode, index, 0, &folio, SGP_READ); > if (error) { > @@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * We must evaluate after, since reads (unlike writes) > * are called without i_rwsem protection against truncate > */ > - nr = PAGE_SIZE; > i_size = i_size_read(inode); > - end_index = i_size >> PAGE_SHIFT; > - if (index == end_index) { > - nr = i_size & ~PAGE_MASK; > - if (nr <= offset) { > - if (folio) > - folio_put(folio); > - break; > - } > + if (unlikely(iocb->ki_pos >= i_size)) { > + if (folio) > + folio_put(folio); > + break; > } > - nr -= offset; > + end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); > + nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); > > if (folio) { > /* > @@ -3199,8 +3187,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > > retval += ret; > offset += ret; > - index += offset >> PAGE_SHIFT; > offset &= ~PAGE_MASK; > + iocb->ki_pos += ret; > + index = iocb->ki_pos >> PAGE_SHIFT; remove this line. > > if (!iov_iter_count(to)) > break; > @@ -3211,7 +3200,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > cond_resched(); > } > > - *ppos = ((loff_t) index << PAGE_SHIFT) + offset; > file_accessed(file); > return retval ? retval : error; > } ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 12:34 ` Kefeng Wang @ 2024-10-17 2:45 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-17 2:45 UTC (permalink / raw) To: Kefeng Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 20:34, Kefeng Wang wrote: > > > On 2024/10/16 18:09, Baolin Wang wrote: >> Use iocb->ki_pos to check if the read bytes exceeds the file size and to >> calculate the bytes to be read can help simplify the code logic. >> Meanwhile, >> this is also a preparation for improving tmpfs large folios read >> performace >> in the following patch. >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/shmem.c | 36 ++++++++++++------------------------ >> 1 file changed, 12 insertions(+), 24 deletions(-) >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 66eae800ffab..edab02a26aac 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> unsigned long offset; >> int error = 0; >> ssize_t retval = 0; >> - loff_t *ppos = &iocb->ki_pos; >> - index = *ppos >> PAGE_SHIFT; >> - offset = *ppos & ~PAGE_MASK; >> + index = iocb->ki_pos >> PAGE_SHIFT; > > index calculate could be moved before shmem_get_folio(), then... Yes. Will do. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang @ 2024-10-16 10:09 ` Baolin Wang 2024-10-16 12:36 ` Kefeng Wang 2024-10-16 15:37 ` Matthew Wilcox 2024-10-16 11:47 ` [PATCH 0/2] Improve " Kefeng Wang 2 siblings, 2 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw) To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel The tmpfs has already supported the PMD-sized large folios, but the tmpfs read operation still performs copying at the PAGE SIZE granularity, which is unreasonable. This patch changes to copy data at the folio granularity, which can improve the read performance, as well as changing to use folio related functions. Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can see about 20% performance improvement, and no regression with bs=4k. Before the patch: READ: bw=10.0GiB/s After the patch: READ: bw=12.0GiB/s Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/shmem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index edab02a26aac..7e79b6a96da0 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t retval = 0; index = iocb->ki_pos >> PAGE_SHIFT; - offset = iocb->ki_pos & ~PAGE_MASK; for (;;) { struct folio *folio = NULL; - struct page *page = NULL; unsigned long nr, ret; loff_t end_offset, i_size = i_size_read(inode); + size_t fsize; if (unlikely(iocb->ki_pos >= i_size)) break; @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (folio) { folio_unlock(folio); - page = folio_file_page(folio, index); - if (PageHWPoison(page)) { + if (folio_test_hwpoison(folio) || + (folio_test_large(folio) && + folio_test_has_hwpoisoned(folio))) { folio_put(folio); error = -EIO; break; @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) break; } end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); + if (folio) + fsize = folio_size(folio); + else + fsize = PAGE_SIZE; + offset = iocb->ki_pos & (fsize - 1); + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); if (folio) { /* @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * before reading the page on the kernel side. */ if (mapping_writably_mapped(mapping)) - flush_dcache_page(page); + flush_dcache_folio(folio); /* * Mark the page accessed if we read the beginning. */ @@ -3166,9 +3171,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... */ - ret = copy_page_to_iter(page, offset, nr, to); + ret = copy_folio_to_iter(folio, offset, nr, to); folio_put(folio); - } else if (user_backed_iter(to)) { /* * Copy to user tends to be so well optimized, but @@ -3186,8 +3190,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) } retval += ret; - offset += ret; - offset &= ~PAGE_MASK; iocb->ki_pos += ret; index = iocb->ki_pos >> PAGE_SHIFT; -- 2.39.3 ^ permalink raw reply [flat|nested] 13+ messages in thread
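
To see why the bs=64k case in the patch above improves, consider one
iteration of the new loop when a 64k read lands inside a 2M folio. The
sketch below (userspace, illustrative values, mirroring the patch's
end_offset/offset/nr computation) shows the whole 64k request satisfied by
a single copy, where the old per-page code needed sixteen 4k copies:

#include <stdio.h>

int main(void)
{
	unsigned long i_size = 1UL << 30;	/* 1G file */
	unsigned long fsize  = 2UL << 20;	/* current folio is a 2M THP */
	unsigned long ki_pos = 192UL << 10;	/* read position: 192k into the folio */
	unsigned long count  = 64UL << 10;	/* fio bs=64k: 64k left in the iov_iter */

	unsigned long end_offset = ki_pos + count < i_size ? ki_pos + count : i_size;
	unsigned long offset = ki_pos & (fsize - 1);	/* offset inside the folio */
	unsigned long nr = end_offset - ki_pos < fsize - offset ?
			   end_offset - ki_pos : fsize - offset;

	/* one folio-sized copy instead of sixteen 4k ones */
	printf("copy %lu bytes at folio offset %lu in one iteration\n", nr, offset);
	return 0;
}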
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 12:36 ` Kefeng Wang 2024-10-17 2:46 ` Baolin Wang 2024-10-16 15:37 ` Matthew Wilcox 1 sibling, 1 reply; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 12:36 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > The tmpfs has already supported the PMD-sized large folios, but the tmpfs > read operation still performs copying at the PAGE SIZE granularity, which > is unreasonable. This patch changes to copy data at the folio granularity, > which can improve the read performance, as well as changing to use folio > related functions. > > Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can > see about 20% performance improvement, and no regression with bs=4k. > Before the patch: > READ: bw=10.0GiB/s > > After the patch: > READ: bw=12.0GiB/s > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/shmem.c | 22 ++++++++++++---------- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index edab02a26aac..7e79b6a96da0 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > ssize_t retval = 0; > > index = iocb->ki_pos >> PAGE_SHIFT; > - offset = iocb->ki_pos & ~PAGE_MASK; > > for (;;) { > struct folio *folio = NULL; > - struct page *page = NULL; > unsigned long nr, ret; > loff_t end_offset, i_size = i_size_read(inode); > + size_t fsize; > > if (unlikely(iocb->ki_pos >= i_size)) > break; > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - page = folio_file_page(folio, index); > - if (PageHWPoison(page)) { > + if (folio_test_hwpoison(folio) || > + (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio))) { > folio_put(folio); > error = -EIO; > break; > @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > break; > } > end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); > - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); > + if (folio) > + fsize = folio_size(folio); > + else > + fsize = PAGE_SIZE; > + offset = iocb->ki_pos & (fsize - 1); > + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); > > if (folio) { > /* > @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * before reading the page on the kernel side. > */ We'd better to update all the comment from page to folio. > if (mapping_writably_mapped(mapping)) > - flush_dcache_page(page); > + flush_dcache_folio(folio); > /* > * Mark the page accessed if we read the beginning. > */ > @@ -3166,9 +3171,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * Ok, we have the page, and it's up-to-date, so > * now we can copy it to user space... 
> */ > - ret = copy_page_to_iter(page, offset, nr, to); > + ret = copy_folio_to_iter(folio, offset, nr, to); > folio_put(folio); > - > } else if (user_backed_iter(to)) { > /* > * Copy to user tends to be so well optimized, but > @@ -3186,8 +3190,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > } > > retval += ret; > - offset += ret; > - offset &= ~PAGE_MASK; > iocb->ki_pos += ret; > index = iocb->ki_pos >> PAGE_SHIFT; > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 12:36 ` Kefeng Wang @ 2024-10-17 2:46 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-17 2:46 UTC (permalink / raw) To: Kefeng Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 20:36, Kefeng Wang wrote: > > > On 2024/10/16 18:09, Baolin Wang wrote: >> The tmpfs has already supported the PMD-sized large folios, but the tmpfs >> read operation still performs copying at the PAGE SIZE granularity, which >> is unreasonable. This patch changes to copy data at the folio >> granularity, >> which can improve the read performance, as well as changing to use folio >> related functions. >> >> Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I >> can >> see about 20% performance improvement, and no regression with bs=4k. >> Before the patch: >> READ: bw=10.0GiB/s >> >> After the patch: >> READ: bw=12.0GiB/s >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/shmem.c | 22 ++++++++++++---------- >> 1 file changed, 12 insertions(+), 10 deletions(-) >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index edab02a26aac..7e79b6a96da0 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> ssize_t retval = 0; >> index = iocb->ki_pos >> PAGE_SHIFT; >> - offset = iocb->ki_pos & ~PAGE_MASK; >> for (;;) { >> struct folio *folio = NULL; >> - struct page *page = NULL; >> unsigned long nr, ret; >> loff_t end_offset, i_size = i_size_read(inode); >> + size_t fsize; >> if (unlikely(iocb->ki_pos >= i_size)) >> break; >> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> if (folio) { >> folio_unlock(folio); >> - page = folio_file_page(folio, index); >> - if (PageHWPoison(page)) { >> + if (folio_test_hwpoison(folio) || >> + (folio_test_large(folio) && >> + folio_test_has_hwpoisoned(folio))) { >> folio_put(folio); >> error = -EIO; >> break; >> @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> break; >> } >> end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); >> - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - >> offset); >> + if (folio) >> + fsize = folio_size(folio); >> + else >> + fsize = PAGE_SIZE; >> + offset = iocb->ki_pos & (fsize - 1); >> + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); >> if (folio) { >> /* >> @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * before reading the page on the kernel side. >> */ > > We'd better to update all the comment from page to folio. Ack. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang 2024-10-16 12:36 ` Kefeng Wang @ 2024-10-16 15:37 ` Matthew Wilcox 2024-10-16 17:33 ` Yang Shi 1 sibling, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2024-10-16 15:37 UTC (permalink / raw) To: Baolin Wang; +Cc: akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - page = folio_file_page(folio, index); > - if (PageHWPoison(page)) { > + if (folio_test_hwpoison(folio) || > + (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio))) { Hm, so if we have hwpoison set on one page in a folio, we now can't read bytes from any page in the folio? That seems like we've made a bad situation worse. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 15:37 ` Matthew Wilcox @ 2024-10-16 17:33 ` Yang Shi 2024-10-17 3:25 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Yang Shi @ 2024-10-16 17:33 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Baolin Wang, akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > > if (folio) { > > folio_unlock(folio); > > > > - page = folio_file_page(folio, index); > > - if (PageHWPoison(page)) { > > + if (folio_test_hwpoison(folio) || > > + (folio_test_large(folio) && > > + folio_test_has_hwpoisoned(folio))) { > > Hm, so if we have hwpoison set on one page in a folio, we now can't read > bytes from any page in the folio? That seems like we've made a bad > situation worse. Yeah, I agree. I think we can fallback to page copy if folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. The folio_test_has_hwpoisoned is kept set if the folio split is failed in memory failure handler. > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 17:33 ` Yang Shi @ 2024-10-17 3:25 ` Baolin Wang 2024-10-17 16:48 ` Yang Shi 0 siblings, 1 reply; 13+ messages in thread From: Baolin Wang @ 2024-10-17 3:25 UTC (permalink / raw) To: Yang Shi, Matthew Wilcox; +Cc: akpm, hughd, david, linux-mm, linux-kernel On 2024/10/17 01:33, Yang Shi wrote: > On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: >> >> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: >>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) >>> if (folio) { >>> folio_unlock(folio); >>> >>> - page = folio_file_page(folio, index); >>> - if (PageHWPoison(page)) { >>> + if (folio_test_hwpoison(folio) || >>> + (folio_test_large(folio) && >>> + folio_test_has_hwpoisoned(folio))) { >> >> Hm, so if we have hwpoison set on one page in a folio, we now can't read >> bytes from any page in the folio? That seems like we've made a bad >> situation worse. > > Yeah, I agree. I think we can fallback to page copy if > folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. > > The folio_test_has_hwpoisoned is kept set if the folio split is failed > in memory failure handler. Right. I can still keep the page size copy if folio_test_has_hwpoisoned() is true. Some sample changes are as follow. Moreover, I noticed shmem splice_read() and write() also simply return an error if the folio_test_has_hwpoisoned() is true, without any fallback to page granularity. I wonder if it is worth adding page granularity support as well? diff --git a/mm/shmem.c b/mm/shmem.c index 7e79b6a96da0..f30e24e529b9 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) for (;;) { struct folio *folio = NULL; + struct page *page = NULL; unsigned long nr, ret; loff_t end_offset, i_size = i_size_read(inode); size_t fsize; + bool fallback_page_copy = false; if (unlikely(iocb->ki_pos >= i_size)) break; @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (folio) { folio_unlock(folio); - if (folio_test_hwpoison(folio) || - (folio_test_large(folio) && - folio_test_has_hwpoisoned(folio))) { + page = folio_file_page(folio, index); + if (PageHWPoison(page)) { folio_put(folio); error = -EIO; break; } + + if (folio_test_large(folio) && + folio_test_has_hwpoisoned(folio)) + fallback_page_copy = true; } /* @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) break; } end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); - if (folio) + if (folio && likely(!fallback_page_copy)) fsize = folio_size(folio); else fsize = PAGE_SIZE; @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. */ - if (mapping_writably_mapped(mapping)) - flush_dcache_folio(folio); + if (mapping_writably_mapped(mapping)) { + if (unlikely(fallback_page_copy)) + flush_dcache_page(page); + else + flush_dcache_folio(folio); + } + /* * Mark the page accessed if we read the beginning. */ @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... 
*/ - ret = copy_folio_to_iter(folio, offset, nr, to); + if (unlikely(fallback_page_copy)) + ret = copy_page_to_iter(page, offset, nr, to); + else + ret = copy_folio_to_iter(folio, offset, nr, to); folio_put(folio); } else if (user_backed_iter(to)) { /* ^ permalink raw reply [flat|nested] 13+ messages in thread
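
The fallback sketched in the mail above amounts to this policy: a large
folio that carries a hwpoisoned subpage is still readable page by page, and
only a read that actually touches the poisoned page fails with -EIO. Below
is a small userspace model of that decision (struct folio_model and
read_granularity() are invented for illustration, not kernel APIs):

#include <stdbool.h>
#include <stdio.h>

#define EIO		5
#define PAGES_PER_FOLIO	512		/* assume a 2M folio of 4k pages */

struct folio_model {
	bool has_hwpoisoned;			/* folio-level "has a bad subpage" flag */
	bool page_poisoned[PAGES_PER_FOLIO];	/* per-subpage PG_hwpoison */
};

/* copy granularity (in pages) for 'page_in_folio', or -EIO if it is bad */
static int read_granularity(const struct folio_model *f, unsigned int page_in_folio)
{
	if (f->page_poisoned[page_in_folio])
		return -EIO;			/* the poisoned page itself is still unreadable */
	if (f->has_hwpoisoned)
		return 1;			/* fall back to page-sized copies */
	return PAGES_PER_FOLIO;			/* healthy folio: copy at folio granularity */
}

int main(void)
{
	struct folio_model f = { .has_hwpoisoned = true };

	f.page_poisoned[3] = true;
	printf("page 0 -> %d\n", read_granularity(&f, 0));	/* 1: per-page fallback */
	printf("page 3 -> %d\n", read_granularity(&f, 3));	/* -5: -EIO */
	return 0;
}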
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-17 3:25 ` Baolin Wang @ 2024-10-17 16:48 ` Yang Shi 2024-10-18 1:45 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Yang Shi @ 2024-10-17 16:48 UTC (permalink / raw) To: Baolin Wang; +Cc: Matthew Wilcox, akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 8:25 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote: > > > > On 2024/10/17 01:33, Yang Shi wrote: > > On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: > >> > >> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > >>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > >>> if (folio) { > >>> folio_unlock(folio); > >>> > >>> - page = folio_file_page(folio, index); > >>> - if (PageHWPoison(page)) { > >>> + if (folio_test_hwpoison(folio) || > >>> + (folio_test_large(folio) && > >>> + folio_test_has_hwpoisoned(folio))) { > >> > >> Hm, so if we have hwpoison set on one page in a folio, we now can't read > >> bytes from any page in the folio? That seems like we've made a bad > >> situation worse. > > > > Yeah, I agree. I think we can fallback to page copy if > > folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. > > > > The folio_test_has_hwpoisoned is kept set if the folio split is failed > > in memory failure handler. > > Right. I can still keep the page size copy if > folio_test_has_hwpoisoned() is true. Some sample changes are as follow. > > Moreover, I noticed shmem splice_read() and write() also simply return > an error if the folio_test_has_hwpoisoned() is true, without any > fallback to page granularity. I wonder if it is worth adding page > granularity support as well? I think you should do the same. > > diff --git a/mm/shmem.c b/mm/shmem.c > index 7e79b6a96da0..f30e24e529b9 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > > for (;;) { > struct folio *folio = NULL; > + struct page *page = NULL; > unsigned long nr, ret; > loff_t end_offset, i_size = i_size_read(inode); > size_t fsize; > + bool fallback_page_copy = false; > > if (unlikely(iocb->ki_pos >= i_size)) > break; > @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - if (folio_test_hwpoison(folio) || > - (folio_test_large(folio) && > - folio_test_has_hwpoisoned(folio))) { > + page = folio_file_page(folio, index); > + if (PageHWPoison(page)) { > folio_put(folio); > error = -EIO; > break; > } > + > + if (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio)) > + fallback_page_copy = true; > } > > /* > @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > break; > } > end_offset = min_t(loff_t, i_size, iocb->ki_pos + > to->count); > - if (folio) > + if (folio && likely(!fallback_page_copy)) > fsize = folio_size(folio); > else > fsize = PAGE_SIZE; > @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > * virtual addresses, take care about potential > aliasing > * before reading the page on the kernel side. 
> */ > - if (mapping_writably_mapped(mapping)) > - flush_dcache_folio(folio); > + if (mapping_writably_mapped(mapping)) { > + if (unlikely(fallback_page_copy)) > + flush_dcache_page(page); > + else > + flush_dcache_folio(folio); > + } > + > /* > * Mark the page accessed if we read the beginning. > */ > @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > * Ok, we have the page, and it's up-to-date, so > * now we can copy it to user space... > */ > - ret = copy_folio_to_iter(folio, offset, nr, to); > + if (unlikely(fallback_page_copy)) > + ret = copy_page_to_iter(page, offset, > nr, to); > + else > + ret = copy_folio_to_iter(folio, offset, > nr, to); > folio_put(folio); > } else if (user_backed_iter(to)) { > /* The change seems fine to me. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-17 16:48 ` Yang Shi @ 2024-10-18 1:45 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-18 1:45 UTC (permalink / raw) To: Yang Shi; +Cc: Matthew Wilcox, akpm, hughd, david, linux-mm, linux-kernel On 2024/10/18 00:48, Yang Shi wrote: > On Wed, Oct 16, 2024 at 8:25 PM Baolin Wang > <baolin.wang@linux.alibaba.com> wrote: >> >> >> >> On 2024/10/17 01:33, Yang Shi wrote: >>> On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: >>>> >>>> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: >>>>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) >>>>> if (folio) { >>>>> folio_unlock(folio); >>>>> >>>>> - page = folio_file_page(folio, index); >>>>> - if (PageHWPoison(page)) { >>>>> + if (folio_test_hwpoison(folio) || >>>>> + (folio_test_large(folio) && >>>>> + folio_test_has_hwpoisoned(folio))) { >>>> >>>> Hm, so if we have hwpoison set on one page in a folio, we now can't read >>>> bytes from any page in the folio? That seems like we've made a bad >>>> situation worse. >>> >>> Yeah, I agree. I think we can fallback to page copy if >>> folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. >>> >>> The folio_test_has_hwpoisoned is kept set if the folio split is failed >>> in memory failure handler. >> >> Right. I can still keep the page size copy if >> folio_test_has_hwpoisoned() is true. Some sample changes are as follow. >> >> Moreover, I noticed shmem splice_read() and write() also simply return >> an error if the folio_test_has_hwpoisoned() is true, without any >> fallback to page granularity. I wonder if it is worth adding page >> granularity support as well? > > I think you should do the same. OK. Let me have a detailed look. >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 7e79b6a96da0..f30e24e529b9 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> >> for (;;) { >> struct folio *folio = NULL; >> + struct page *page = NULL; >> unsigned long nr, ret; >> loff_t end_offset, i_size = i_size_read(inode); >> size_t fsize; >> + bool fallback_page_copy = false; >> >> if (unlikely(iocb->ki_pos >= i_size)) >> break; >> @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> if (folio) { >> folio_unlock(folio); >> >> - if (folio_test_hwpoison(folio) || >> - (folio_test_large(folio) && >> - folio_test_has_hwpoisoned(folio))) { >> + page = folio_file_page(folio, index); >> + if (PageHWPoison(page)) { >> folio_put(folio); >> error = -EIO; >> break; >> } >> + >> + if (folio_test_large(folio) && >> + folio_test_has_hwpoisoned(folio)) >> + fallback_page_copy = true; >> } >> >> /* >> @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> break; >> } >> end_offset = min_t(loff_t, i_size, iocb->ki_pos + >> to->count); >> - if (folio) >> + if (folio && likely(!fallback_page_copy)) >> fsize = folio_size(folio); >> else >> fsize = PAGE_SIZE; >> @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * virtual addresses, take care about potential >> aliasing >> * before reading the page on the kernel side. 
>> */ >> - if (mapping_writably_mapped(mapping)) >> - flush_dcache_folio(folio); >> + if (mapping_writably_mapped(mapping)) { >> + if (unlikely(fallback_page_copy)) >> + flush_dcache_page(page); >> + else >> + flush_dcache_folio(folio); >> + } >> + >> /* >> * Mark the page accessed if we read the beginning. >> */ >> @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * Ok, we have the page, and it's up-to-date, so >> * now we can copy it to user space... >> */ >> - ret = copy_folio_to_iter(folio, offset, nr, to); >> + if (unlikely(fallback_page_copy)) >> + ret = copy_page_to_iter(page, offset, >> nr, to); >> + else >> + ret = copy_folio_to_iter(folio, offset, >> nr, to); >> folio_put(folio); >> } else if (user_backed_iter(to)) { >> /* > > The change seems fine to me. Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Improve the tmpfs large folio read performance 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 11:47 ` Kefeng Wang 2 siblings, 0 replies; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 11:47 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > The tmpfs has already supported the PMD-sized large folios, but the tmpfs > read operation still performs copying at the PAGE SIZE granularity, which > is not perfect. This patch changes to copy data at the folio granularity, > which can improve the read performance. > > Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can > see about 20% performance improvement, and no regression with bs=4k. I > also did some functional test with the xfstests suite, and I did not find > any regressions with the following xfstests config. > FSTYP=tmpfs > export TEST_DIR=/mnt/tempfs_mnt > export TEST_DEV=/mnt/tempfs_mnt > export SCRATCH_MNT=/mnt/scratchdir > export SCRATCH_DEV=/mnt/scratchdir > Oh,we make same changes, my bonnie test(./bonnie -d /tmp -s 1024) see similar improvement(19.2% with huge=always) with out inner changes :) > Baolin Wang (2): > mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic > mm: shmem: improve the tmpfs large folio read performance > > mm/shmem.c | 54 ++++++++++++++++++++++-------------------------------- > 1 file changed, 22 insertions(+), 32 deletions(-) > ^ permalink raw reply [flat|nested] 13+ messages in thread