* [PATCH 0/2] Improve the tmpfs large folio read performance
@ 2024-10-16 10:09 Baolin Wang
From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw)
To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel
tmpfs already supports PMD-sized large folios, but the tmpfs read
operation still copies data at PAGE_SIZE granularity, which is suboptimal.
This series changes the read path to copy data at folio granularity, which
improves read performance.
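
To make the granularity difference concrete, below is a minimal userspace
sketch (not the kernel code; the chunk() helper and the 2M FOLIO_SIZE are
assumptions for illustration). Reading a 1G file needs 262144 copy calls at
PAGE_SIZE granularity but only 512 at 2M-folio granularity, which is where
the improvement comes from.

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define FOLIO_SIZE	(2UL << 20)	/* assume PMD-sized (2M) folios */

/* bytes copied by one loop iteration when copying at 'gran' granularity */
static unsigned long chunk(unsigned long pos, unsigned long end, unsigned long gran)
{
	unsigned long offset = pos & (gran - 1);	/* gran is a power of two */
	unsigned long left = end - pos;

	return left < gran - offset ? left : gran - offset;
}

int main(void)
{
	unsigned long i_size = 1UL << 30;	/* 1G tmpfs file */
	unsigned long pos, iters;

	for (pos = 0, iters = 0; pos < i_size; iters++)
		pos += chunk(pos, i_size, PAGE_SIZE);
	printf("PAGE_SIZE granularity: %lu copies\n", iters);	/* 262144 */

	for (pos = 0, iters = 0; pos < i_size; iters++)
		pos += chunk(pos, i_size, FOLIO_SIZE);
	printf("folio granularity:     %lu copies\n", iters);	/* 512 */

	return 0;
}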
Using 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, I see
about a 20% performance improvement, and no regression with bs=4k. I also
ran functional tests with the xfstests suite and found no regressions with
the following xfstests config.
FSTYP=tmpfs
export TEST_DIR=/mnt/tempfs_mnt
export TEST_DEV=/mnt/tempfs_mnt
export SCRATCH_MNT=/mnt/scratchdir
export SCRATCH_DEV=/mnt/scratchdir
Baolin Wang (2):
mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic
mm: shmem: improve the tmpfs large folio read performance
mm/shmem.c | 54 ++++++++++++++++++++++--------------------------------
1 file changed, 22 insertions(+), 32 deletions(-)
--
2.39.3
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 10:09 ` Baolin Wang 2024-10-16 12:34 ` Kefeng Wang 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang 2024-10-16 11:47 ` [PATCH 0/2] Improve " Kefeng Wang 2 siblings, 1 reply; 13+ messages in thread From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw) To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel Use iocb->ki_pos to check if the read bytes exceeds the file size and to calculate the bytes to be read can help simplify the code logic. Meanwhile, this is also a preparation for improving tmpfs large folios read performace in the following patch. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/shmem.c | 36 ++++++++++++------------------------ 1 file changed, 12 insertions(+), 24 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 66eae800ffab..edab02a26aac 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) unsigned long offset; int error = 0; ssize_t retval = 0; - loff_t *ppos = &iocb->ki_pos; - index = *ppos >> PAGE_SHIFT; - offset = *ppos & ~PAGE_MASK; + index = iocb->ki_pos >> PAGE_SHIFT; + offset = iocb->ki_pos & ~PAGE_MASK; for (;;) { struct folio *folio = NULL; struct page *page = NULL; - pgoff_t end_index; unsigned long nr, ret; - loff_t i_size = i_size_read(inode); + loff_t end_offset, i_size = i_size_read(inode); - end_index = i_size >> PAGE_SHIFT; - if (index > end_index) + if (unlikely(iocb->ki_pos >= i_size)) break; - if (index == end_index) { - nr = i_size & ~PAGE_MASK; - if (nr <= offset) - break; - } error = shmem_get_folio(inode, index, 0, &folio, SGP_READ); if (error) { @@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * We must evaluate after, since reads (unlike writes) * are called without i_rwsem protection against truncate */ - nr = PAGE_SIZE; i_size = i_size_read(inode); - end_index = i_size >> PAGE_SHIFT; - if (index == end_index) { - nr = i_size & ~PAGE_MASK; - if (nr <= offset) { - if (folio) - folio_put(folio); - break; - } + if (unlikely(iocb->ki_pos >= i_size)) { + if (folio) + folio_put(folio); + break; } - nr -= offset; + end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); + nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); if (folio) { /* @@ -3199,8 +3187,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) retval += ret; offset += ret; - index += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK; + iocb->ki_pos += ret; + index = iocb->ki_pos >> PAGE_SHIFT; if (!iov_iter_count(to)) break; @@ -3211,7 +3200,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) cond_resched(); } - *ppos = ((loff_t) index << PAGE_SHIFT) + offset; file_accessed(file); return retval ? retval : error; } -- 2.39.3 ^ permalink raw reply [flat|nested] 13+ messages in thread
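
A note on the simplification in the patch above: the old EOF test compared
page indexes and then the in-page remainder, while the new one is a single
comparison of iocb->ki_pos against i_size. The two conditions are
equivalent, which the small userspace sketch below checks (old_eof() and
new_eof() are invented names for illustration, not kernel functions):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* the removed check: compare page indexes, then the in-page remainder */
static int old_eof(unsigned long pos, unsigned long i_size)
{
	unsigned long index = pos >> PAGE_SHIFT;
	unsigned long offset = pos & ~PAGE_MASK;
	unsigned long end_index = i_size >> PAGE_SHIFT;

	if (index > end_index)
		return 1;
	if (index == end_index && (i_size & ~PAGE_MASK) <= offset)
		return 1;
	return 0;
}

/* the new check: one comparison on the file position */
static int new_eof(unsigned long pos, unsigned long i_size)
{
	return pos >= i_size;
}

int main(void)
{
	unsigned long pos, i_size;

	for (i_size = 0; i_size < 3 * PAGE_SIZE; i_size += 511)
		for (pos = 0; pos < 3 * PAGE_SIZE; pos += 509)
			assert(old_eof(pos, i_size) == new_eof(pos, i_size));
	printf("old and new EOF checks agree\n");
	return 0;
}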
* Re: [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang @ 2024-10-16 12:34 ` Kefeng Wang 2024-10-17 2:45 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 12:34 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > Use iocb->ki_pos to check if the read bytes exceeds the file size and to > calculate the bytes to be read can help simplify the code logic. Meanwhile, > this is also a preparation for improving tmpfs large folios read performace > in the following patch. > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/shmem.c | 36 ++++++++++++------------------------ > 1 file changed, 12 insertions(+), 24 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 66eae800ffab..edab02a26aac 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > unsigned long offset; > int error = 0; > ssize_t retval = 0; > - loff_t *ppos = &iocb->ki_pos; > > - index = *ppos >> PAGE_SHIFT; > - offset = *ppos & ~PAGE_MASK; > + index = iocb->ki_pos >> PAGE_SHIFT; index calculate could be moved before shmem_get_folio(), then... > + offset = iocb->ki_pos & ~PAGE_MASK; > > for (;;) { > struct folio *folio = NULL; > struct page *page = NULL; > - pgoff_t end_index; > unsigned long nr, ret; > - loff_t i_size = i_size_read(inode); > + loff_t end_offset, i_size = i_size_read(inode); > > - end_index = i_size >> PAGE_SHIFT; > - if (index > end_index) > + if (unlikely(iocb->ki_pos >= i_size)) > break; > - if (index == end_index) { > - nr = i_size & ~PAGE_MASK; > - if (nr <= offset) > - break; > - } > > error = shmem_get_folio(inode, index, 0, &folio, SGP_READ); > if (error) { > @@ -3148,18 +3140,14 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * We must evaluate after, since reads (unlike writes) > * are called without i_rwsem protection against truncate > */ > - nr = PAGE_SIZE; > i_size = i_size_read(inode); > - end_index = i_size >> PAGE_SHIFT; > - if (index == end_index) { > - nr = i_size & ~PAGE_MASK; > - if (nr <= offset) { > - if (folio) > - folio_put(folio); > - break; > - } > + if (unlikely(iocb->ki_pos >= i_size)) { > + if (folio) > + folio_put(folio); > + break; > } > - nr -= offset; > + end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); > + nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); > > if (folio) { > /* > @@ -3199,8 +3187,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > > retval += ret; > offset += ret; > - index += offset >> PAGE_SHIFT; > offset &= ~PAGE_MASK; > + iocb->ki_pos += ret; > + index = iocb->ki_pos >> PAGE_SHIFT; remove this line. > > if (!iov_iter_count(to)) > break; > @@ -3211,7 +3200,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > cond_resched(); > } > > - *ppos = ((loff_t) index << PAGE_SHIFT) + offset; > file_accessed(file); > return retval ? retval : error; > } ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic 2024-10-16 12:34 ` Kefeng Wang @ 2024-10-17 2:45 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-17 2:45 UTC (permalink / raw) To: Kefeng Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 20:34, Kefeng Wang wrote: > > > On 2024/10/16 18:09, Baolin Wang wrote: >> Use iocb->ki_pos to check if the read bytes exceeds the file size and to >> calculate the bytes to be read can help simplify the code logic. >> Meanwhile, >> this is also a preparation for improving tmpfs large folios read >> performace >> in the following patch. >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/shmem.c | 36 ++++++++++++------------------------ >> 1 file changed, 12 insertions(+), 24 deletions(-) >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 66eae800ffab..edab02a26aac 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3106,26 +3106,18 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> unsigned long offset; >> int error = 0; >> ssize_t retval = 0; >> - loff_t *ppos = &iocb->ki_pos; >> - index = *ppos >> PAGE_SHIFT; >> - offset = *ppos & ~PAGE_MASK; >> + index = iocb->ki_pos >> PAGE_SHIFT; > > index calculate could be moved before shmem_get_folio(), then... Yes. Will do. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang @ 2024-10-16 10:09 ` Baolin Wang 2024-10-16 12:36 ` Kefeng Wang 2024-10-16 15:37 ` Matthew Wilcox 2024-10-16 11:47 ` [PATCH 0/2] Improve " Kefeng Wang 2 siblings, 2 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-16 10:09 UTC (permalink / raw) To: akpm, hughd; +Cc: willy, david, baolin.wang, linux-mm, linux-kernel The tmpfs has already supported the PMD-sized large folios, but the tmpfs read operation still performs copying at the PAGE SIZE granularity, which is unreasonable. This patch changes to copy data at the folio granularity, which can improve the read performance, as well as changing to use folio related functions. Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can see about 20% performance improvement, and no regression with bs=4k. Before the patch: READ: bw=10.0GiB/s After the patch: READ: bw=12.0GiB/s Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> --- mm/shmem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index edab02a26aac..7e79b6a96da0 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t retval = 0; index = iocb->ki_pos >> PAGE_SHIFT; - offset = iocb->ki_pos & ~PAGE_MASK; for (;;) { struct folio *folio = NULL; - struct page *page = NULL; unsigned long nr, ret; loff_t end_offset, i_size = i_size_read(inode); + size_t fsize; if (unlikely(iocb->ki_pos >= i_size)) break; @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (folio) { folio_unlock(folio); - page = folio_file_page(folio, index); - if (PageHWPoison(page)) { + if (folio_test_hwpoison(folio) || + (folio_test_large(folio) && + folio_test_has_hwpoisoned(folio))) { folio_put(folio); error = -EIO; break; @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) break; } end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); + if (folio) + fsize = folio_size(folio); + else + fsize = PAGE_SIZE; + offset = iocb->ki_pos & (fsize - 1); + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); if (folio) { /* @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * before reading the page on the kernel side. */ if (mapping_writably_mapped(mapping)) - flush_dcache_page(page); + flush_dcache_folio(folio); /* * Mark the page accessed if we read the beginning. */ @@ -3166,9 +3171,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... */ - ret = copy_page_to_iter(page, offset, nr, to); + ret = copy_folio_to_iter(folio, offset, nr, to); folio_put(folio); - } else if (user_backed_iter(to)) { /* * Copy to user tends to be so well optimized, but @@ -3186,8 +3190,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) } retval += ret; - offset += ret; - offset &= ~PAGE_MASK; iocb->ki_pos += ret; index = iocb->ki_pos >> PAGE_SHIFT; -- 2.39.3 ^ permalink raw reply [flat|nested] 13+ messages in thread
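
To see why the bs=64k case in the patch above improves, consider one
iteration of the new loop when a 64k read lands inside a 2M folio. The
sketch below (userspace, illustrative values, mirroring the patch's
end_offset/offset/nr computation) shows the whole 64k request satisfied by
a single copy, where the old per-page code needed sixteen 4k copies:

#include <stdio.h>

int main(void)
{
	unsigned long i_size = 1UL << 30;	/* 1G file */
	unsigned long fsize  = 2UL << 20;	/* current folio is a 2M THP */
	unsigned long ki_pos = 192UL << 10;	/* read position: 192k into the folio */
	unsigned long count  = 64UL << 10;	/* fio bs=64k: 64k left in the iov_iter */

	unsigned long end_offset = ki_pos + count < i_size ? ki_pos + count : i_size;
	unsigned long offset = ki_pos & (fsize - 1);	/* offset inside the folio */
	unsigned long nr = end_offset - ki_pos < fsize - offset ?
			   end_offset - ki_pos : fsize - offset;

	/* one folio-sized copy instead of sixteen 4k ones */
	printf("copy %lu bytes at folio offset %lu in one iteration\n", nr, offset);
	return 0;
}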
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 12:36 ` Kefeng Wang 2024-10-17 2:46 ` Baolin Wang 2024-10-16 15:37 ` Matthew Wilcox 1 sibling, 1 reply; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 12:36 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > The tmpfs has already supported the PMD-sized large folios, but the tmpfs > read operation still performs copying at the PAGE SIZE granularity, which > is unreasonable. This patch changes to copy data at the folio granularity, > which can improve the read performance, as well as changing to use folio > related functions. > > Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can > see about 20% performance improvement, and no regression with bs=4k. > Before the patch: > READ: bw=10.0GiB/s > > After the patch: > READ: bw=12.0GiB/s > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> > --- > mm/shmem.c | 22 ++++++++++++---------- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/mm/shmem.c b/mm/shmem.c > index edab02a26aac..7e79b6a96da0 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > ssize_t retval = 0; > > index = iocb->ki_pos >> PAGE_SHIFT; > - offset = iocb->ki_pos & ~PAGE_MASK; > > for (;;) { > struct folio *folio = NULL; > - struct page *page = NULL; > unsigned long nr, ret; > loff_t end_offset, i_size = i_size_read(inode); > + size_t fsize; > > if (unlikely(iocb->ki_pos >= i_size)) > break; > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - page = folio_file_page(folio, index); > - if (PageHWPoison(page)) { > + if (folio_test_hwpoison(folio) || > + (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio))) { > folio_put(folio); > error = -EIO; > break; > @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > break; > } > end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); > - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset); > + if (folio) > + fsize = folio_size(folio); > + else > + fsize = PAGE_SIZE; > + offset = iocb->ki_pos & (fsize - 1); > + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); > > if (folio) { > /* > @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * before reading the page on the kernel side. > */ We'd better to update all the comment from page to folio. > if (mapping_writably_mapped(mapping)) > - flush_dcache_page(page); > + flush_dcache_folio(folio); > /* > * Mark the page accessed if we read the beginning. > */ > @@ -3166,9 +3171,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > * Ok, we have the page, and it's up-to-date, so > * now we can copy it to user space... 
> */ > - ret = copy_page_to_iter(page, offset, nr, to); > + ret = copy_folio_to_iter(folio, offset, nr, to); > folio_put(folio); > - > } else if (user_backed_iter(to)) { > /* > * Copy to user tends to be so well optimized, but > @@ -3186,8 +3190,6 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > } > > retval += ret; > - offset += ret; > - offset &= ~PAGE_MASK; > iocb->ki_pos += ret; > index = iocb->ki_pos >> PAGE_SHIFT; > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 12:36 ` Kefeng Wang @ 2024-10-17 2:46 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-17 2:46 UTC (permalink / raw) To: Kefeng Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 20:36, Kefeng Wang wrote: > > > On 2024/10/16 18:09, Baolin Wang wrote: >> The tmpfs has already supported the PMD-sized large folios, but the tmpfs >> read operation still performs copying at the PAGE SIZE granularity, which >> is unreasonable. This patch changes to copy data at the folio >> granularity, >> which can improve the read performance, as well as changing to use folio >> related functions. >> >> Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I >> can >> see about 20% performance improvement, and no regression with bs=4k. >> Before the patch: >> READ: bw=10.0GiB/s >> >> After the patch: >> READ: bw=12.0GiB/s >> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> >> --- >> mm/shmem.c | 22 ++++++++++++---------- >> 1 file changed, 12 insertions(+), 10 deletions(-) >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index edab02a26aac..7e79b6a96da0 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3108,13 +3108,12 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> ssize_t retval = 0; >> index = iocb->ki_pos >> PAGE_SHIFT; >> - offset = iocb->ki_pos & ~PAGE_MASK; >> for (;;) { >> struct folio *folio = NULL; >> - struct page *page = NULL; >> unsigned long nr, ret; >> loff_t end_offset, i_size = i_size_read(inode); >> + size_t fsize; >> if (unlikely(iocb->ki_pos >= i_size)) >> break; >> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> if (folio) { >> folio_unlock(folio); >> - page = folio_file_page(folio, index); >> - if (PageHWPoison(page)) { >> + if (folio_test_hwpoison(folio) || >> + (folio_test_large(folio) && >> + folio_test_has_hwpoisoned(folio))) { >> folio_put(folio); >> error = -EIO; >> break; >> @@ -3147,7 +3147,12 @@ static ssize_t shmem_file_read_iter(struct >> kiocb *iocb, struct iov_iter *to) >> break; >> } >> end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); >> - nr = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - >> offset); >> + if (folio) >> + fsize = folio_size(folio); >> + else >> + fsize = PAGE_SIZE; >> + offset = iocb->ki_pos & (fsize - 1); >> + nr = min_t(loff_t, end_offset - iocb->ki_pos, fsize - offset); >> if (folio) { >> /* >> @@ -3156,7 +3161,7 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * before reading the page on the kernel side. >> */ > > We'd better to update all the comment from page to folio. Ack. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang 2024-10-16 12:36 ` Kefeng Wang @ 2024-10-16 15:37 ` Matthew Wilcox 2024-10-16 17:33 ` Yang Shi 1 sibling, 1 reply; 13+ messages in thread From: Matthew Wilcox @ 2024-10-16 15:37 UTC (permalink / raw) To: Baolin Wang; +Cc: akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - page = folio_file_page(folio, index); > - if (PageHWPoison(page)) { > + if (folio_test_hwpoison(folio) || > + (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio))) { Hm, so if we have hwpoison set on one page in a folio, we now can't read bytes from any page in the folio? That seems like we've made a bad situation worse. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 15:37 ` Matthew Wilcox @ 2024-10-16 17:33 ` Yang Shi 2024-10-17 3:25 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Yang Shi @ 2024-10-16 17:33 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Baolin Wang, akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > > @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > > if (folio) { > > folio_unlock(folio); > > > > - page = folio_file_page(folio, index); > > - if (PageHWPoison(page)) { > > + if (folio_test_hwpoison(folio) || > > + (folio_test_large(folio) && > > + folio_test_has_hwpoisoned(folio))) { > > Hm, so if we have hwpoison set on one page in a folio, we now can't read > bytes from any page in the folio? That seems like we've made a bad > situation worse. Yeah, I agree. I think we can fallback to page copy if folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. The folio_test_has_hwpoisoned is kept set if the folio split is failed in memory failure handler. > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-16 17:33 ` Yang Shi @ 2024-10-17 3:25 ` Baolin Wang 2024-10-17 16:48 ` Yang Shi 0 siblings, 1 reply; 13+ messages in thread From: Baolin Wang @ 2024-10-17 3:25 UTC (permalink / raw) To: Yang Shi, Matthew Wilcox; +Cc: akpm, hughd, david, linux-mm, linux-kernel On 2024/10/17 01:33, Yang Shi wrote: > On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: >> >> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: >>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) >>> if (folio) { >>> folio_unlock(folio); >>> >>> - page = folio_file_page(folio, index); >>> - if (PageHWPoison(page)) { >>> + if (folio_test_hwpoison(folio) || >>> + (folio_test_large(folio) && >>> + folio_test_has_hwpoisoned(folio))) { >> >> Hm, so if we have hwpoison set on one page in a folio, we now can't read >> bytes from any page in the folio? That seems like we've made a bad >> situation worse. > > Yeah, I agree. I think we can fallback to page copy if > folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. > > The folio_test_has_hwpoisoned is kept set if the folio split is failed > in memory failure handler. Right. I can still keep the page size copy if folio_test_has_hwpoisoned() is true. Some sample changes are as follow. Moreover, I noticed shmem splice_read() and write() also simply return an error if the folio_test_has_hwpoisoned() is true, without any fallback to page granularity. I wonder if it is worth adding page granularity support as well? diff --git a/mm/shmem.c b/mm/shmem.c index 7e79b6a96da0..f30e24e529b9 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) for (;;) { struct folio *folio = NULL; + struct page *page = NULL; unsigned long nr, ret; loff_t end_offset, i_size = i_size_read(inode); size_t fsize; + bool fallback_page_copy = false; if (unlikely(iocb->ki_pos >= i_size)) break; @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (folio) { folio_unlock(folio); - if (folio_test_hwpoison(folio) || - (folio_test_large(folio) && - folio_test_has_hwpoisoned(folio))) { + page = folio_file_page(folio, index); + if (PageHWPoison(page)) { folio_put(folio); error = -EIO; break; } + + if (folio_test_large(folio) && + folio_test_has_hwpoisoned(folio)) + fallback_page_copy = true; } /* @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) break; } end_offset = min_t(loff_t, i_size, iocb->ki_pos + to->count); - if (folio) + if (folio && likely(!fallback_page_copy)) fsize = folio_size(folio); else fsize = PAGE_SIZE; @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. */ - if (mapping_writably_mapped(mapping)) - flush_dcache_folio(folio); + if (mapping_writably_mapped(mapping)) { + if (unlikely(fallback_page_copy)) + flush_dcache_page(page); + else + flush_dcache_folio(folio); + } + /* * Mark the page accessed if we read the beginning. */ @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... 
*/ - ret = copy_folio_to_iter(folio, offset, nr, to); + if (unlikely(fallback_page_copy)) + ret = copy_page_to_iter(page, offset, nr, to); + else + ret = copy_folio_to_iter(folio, offset, nr, to); folio_put(folio); } else if (user_backed_iter(to)) { /* ^ permalink raw reply [flat|nested] 13+ messages in thread
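
The fallback sketched in the mail above amounts to this policy: a large
folio that carries a hwpoisoned subpage is still readable page by page, and
only a read that actually touches the poisoned page fails with -EIO. Below
is a small userspace model of that decision (struct folio_model and
read_granularity() are invented for illustration, not kernel APIs):

#include <stdbool.h>
#include <stdio.h>

#define EIO		5
#define PAGES_PER_FOLIO	512		/* assume a 2M folio of 4k pages */

struct folio_model {
	bool has_hwpoisoned;			/* folio-level "has a bad subpage" flag */
	bool page_poisoned[PAGES_PER_FOLIO];	/* per-subpage PG_hwpoison */
};

/* copy granularity (in pages) for 'page_in_folio', or -EIO if it is bad */
static int read_granularity(const struct folio_model *f, unsigned int page_in_folio)
{
	if (f->page_poisoned[page_in_folio])
		return -EIO;			/* the poisoned page itself is still unreadable */
	if (f->has_hwpoisoned)
		return 1;			/* fall back to page-sized copies */
	return PAGES_PER_FOLIO;			/* healthy folio: copy at folio granularity */
}

int main(void)
{
	struct folio_model f = { .has_hwpoisoned = true };

	f.page_poisoned[3] = true;
	printf("page 0 -> %d\n", read_granularity(&f, 0));	/* 1: per-page fallback */
	printf("page 3 -> %d\n", read_granularity(&f, 3));	/* -5: -EIO */
	return 0;
}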
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-17 3:25 ` Baolin Wang @ 2024-10-17 16:48 ` Yang Shi 2024-10-18 1:45 ` Baolin Wang 0 siblings, 1 reply; 13+ messages in thread From: Yang Shi @ 2024-10-17 16:48 UTC (permalink / raw) To: Baolin Wang; +Cc: Matthew Wilcox, akpm, hughd, david, linux-mm, linux-kernel On Wed, Oct 16, 2024 at 8:25 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote: > > > > On 2024/10/17 01:33, Yang Shi wrote: > > On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: > >> > >> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: > >>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > >>> if (folio) { > >>> folio_unlock(folio); > >>> > >>> - page = folio_file_page(folio, index); > >>> - if (PageHWPoison(page)) { > >>> + if (folio_test_hwpoison(folio) || > >>> + (folio_test_large(folio) && > >>> + folio_test_has_hwpoisoned(folio))) { > >> > >> Hm, so if we have hwpoison set on one page in a folio, we now can't read > >> bytes from any page in the folio? That seems like we've made a bad > >> situation worse. > > > > Yeah, I agree. I think we can fallback to page copy if > > folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. > > > > The folio_test_has_hwpoisoned is kept set if the folio split is failed > > in memory failure handler. > > Right. I can still keep the page size copy if > folio_test_has_hwpoisoned() is true. Some sample changes are as follow. > > Moreover, I noticed shmem splice_read() and write() also simply return > an error if the folio_test_has_hwpoisoned() is true, without any > fallback to page granularity. I wonder if it is worth adding page > granularity support as well? I think you should do the same. > > diff --git a/mm/shmem.c b/mm/shmem.c > index 7e79b6a96da0..f30e24e529b9 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > > for (;;) { > struct folio *folio = NULL; > + struct page *page = NULL; > unsigned long nr, ret; > loff_t end_offset, i_size = i_size_read(inode); > size_t fsize; > + bool fallback_page_copy = false; > > if (unlikely(iocb->ki_pos >= i_size)) > break; > @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > if (folio) { > folio_unlock(folio); > > - if (folio_test_hwpoison(folio) || > - (folio_test_large(folio) && > - folio_test_has_hwpoisoned(folio))) { > + page = folio_file_page(folio, index); > + if (PageHWPoison(page)) { > folio_put(folio); > error = -EIO; > break; > } > + > + if (folio_test_large(folio) && > + folio_test_has_hwpoisoned(folio)) > + fallback_page_copy = true; > } > > /* > @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > break; > } > end_offset = min_t(loff_t, i_size, iocb->ki_pos + > to->count); > - if (folio) > + if (folio && likely(!fallback_page_copy)) > fsize = folio_size(folio); > else > fsize = PAGE_SIZE; > @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > * virtual addresses, take care about potential > aliasing > * before reading the page on the kernel side. 
> */ > - if (mapping_writably_mapped(mapping)) > - flush_dcache_folio(folio); > + if (mapping_writably_mapped(mapping)) { > + if (unlikely(fallback_page_copy)) > + flush_dcache_page(page); > + else > + flush_dcache_folio(folio); > + } > + > /* > * Mark the page accessed if we read the beginning. > */ > @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb > *iocb, struct iov_iter *to) > * Ok, we have the page, and it's up-to-date, so > * now we can copy it to user space... > */ > - ret = copy_folio_to_iter(folio, offset, nr, to); > + if (unlikely(fallback_page_copy)) > + ret = copy_page_to_iter(page, offset, > nr, to); > + else > + ret = copy_folio_to_iter(folio, offset, > nr, to); > folio_put(folio); > } else if (user_backed_iter(to)) { > /* The change seems fine to me. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance 2024-10-17 16:48 ` Yang Shi @ 2024-10-18 1:45 ` Baolin Wang 0 siblings, 0 replies; 13+ messages in thread From: Baolin Wang @ 2024-10-18 1:45 UTC (permalink / raw) To: Yang Shi; +Cc: Matthew Wilcox, akpm, hughd, david, linux-mm, linux-kernel On 2024/10/18 00:48, Yang Shi wrote: > On Wed, Oct 16, 2024 at 8:25 PM Baolin Wang > <baolin.wang@linux.alibaba.com> wrote: >> >> >> >> On 2024/10/17 01:33, Yang Shi wrote: >>> On Wed, Oct 16, 2024 at 8:38 AM Matthew Wilcox <willy@infradead.org> wrote: >>>> >>>> On Wed, Oct 16, 2024 at 06:09:30PM +0800, Baolin Wang wrote: >>>>> @@ -3128,8 +3127,9 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) >>>>> if (folio) { >>>>> folio_unlock(folio); >>>>> >>>>> - page = folio_file_page(folio, index); >>>>> - if (PageHWPoison(page)) { >>>>> + if (folio_test_hwpoison(folio) || >>>>> + (folio_test_large(folio) && >>>>> + folio_test_has_hwpoisoned(folio))) { >>>> >>>> Hm, so if we have hwpoison set on one page in a folio, we now can't read >>>> bytes from any page in the folio? That seems like we've made a bad >>>> situation worse. >>> >>> Yeah, I agree. I think we can fallback to page copy if >>> folio_test_has_hwpoisoned is true. The PG_hwpoison flag is per page. >>> >>> The folio_test_has_hwpoisoned is kept set if the folio split is failed >>> in memory failure handler. >> >> Right. I can still keep the page size copy if >> folio_test_has_hwpoisoned() is true. Some sample changes are as follow. >> >> Moreover, I noticed shmem splice_read() and write() also simply return >> an error if the folio_test_has_hwpoisoned() is true, without any >> fallback to page granularity. I wonder if it is worth adding page >> granularity support as well? > > I think you should do the same. OK. Let me have a detailed look. >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 7e79b6a96da0..f30e24e529b9 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3111,9 +3111,11 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> >> for (;;) { >> struct folio *folio = NULL; >> + struct page *page = NULL; >> unsigned long nr, ret; >> loff_t end_offset, i_size = i_size_read(inode); >> size_t fsize; >> + bool fallback_page_copy = false; >> >> if (unlikely(iocb->ki_pos >= i_size)) >> break; >> @@ -3127,13 +3129,16 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> if (folio) { >> folio_unlock(folio); >> >> - if (folio_test_hwpoison(folio) || >> - (folio_test_large(folio) && >> - folio_test_has_hwpoisoned(folio))) { >> + page = folio_file_page(folio, index); >> + if (PageHWPoison(page)) { >> folio_put(folio); >> error = -EIO; >> break; >> } >> + >> + if (folio_test_large(folio) && >> + folio_test_has_hwpoisoned(folio)) >> + fallback_page_copy = true; >> } >> >> /* >> @@ -3147,7 +3152,7 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> break; >> } >> end_offset = min_t(loff_t, i_size, iocb->ki_pos + >> to->count); >> - if (folio) >> + if (folio && likely(!fallback_page_copy)) >> fsize = folio_size(folio); >> else >> fsize = PAGE_SIZE; >> @@ -3160,8 +3165,13 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * virtual addresses, take care about potential >> aliasing >> * before reading the page on the kernel side. 
>> */ >> - if (mapping_writably_mapped(mapping)) >> - flush_dcache_folio(folio); >> + if (mapping_writably_mapped(mapping)) { >> + if (unlikely(fallback_page_copy)) >> + flush_dcache_page(page); >> + else >> + flush_dcache_folio(folio); >> + } >> + >> /* >> * Mark the page accessed if we read the beginning. >> */ >> @@ -3171,7 +3181,10 @@ static ssize_t shmem_file_read_iter(struct kiocb >> *iocb, struct iov_iter *to) >> * Ok, we have the page, and it's up-to-date, so >> * now we can copy it to user space... >> */ >> - ret = copy_folio_to_iter(folio, offset, nr, to); >> + if (unlikely(fallback_page_copy)) >> + ret = copy_page_to_iter(page, offset, >> nr, to); >> + else >> + ret = copy_folio_to_iter(folio, offset, >> nr, to); >> folio_put(folio); >> } else if (user_backed_iter(to)) { >> /* > > The change seems fine to me. Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Improve the tmpfs large folio read performance 2024-10-16 10:09 [PATCH 0/2] Improve the tmpfs large folio read performance Baolin Wang 2024-10-16 10:09 ` [PATCH 1/2] mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic Baolin Wang 2024-10-16 10:09 ` [PATCH 2/2] mm: shmem: improve the tmpfs large folio read performance Baolin Wang @ 2024-10-16 11:47 ` Kefeng Wang 2 siblings, 0 replies; 13+ messages in thread From: Kefeng Wang @ 2024-10-16 11:47 UTC (permalink / raw) To: Baolin Wang, akpm, hughd; +Cc: willy, david, linux-mm, linux-kernel On 2024/10/16 18:09, Baolin Wang wrote: > The tmpfs has already supported the PMD-sized large folios, but the tmpfs > read operation still performs copying at the PAGE SIZE granularity, which > is not perfect. This patch changes to copy data at the folio granularity, > which can improve the read performance. > > Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can > see about 20% performance improvement, and no regression with bs=4k. I > also did some functional test with the xfstests suite, and I did not find > any regressions with the following xfstests config. > FSTYP=tmpfs > export TEST_DIR=/mnt/tempfs_mnt > export TEST_DEV=/mnt/tempfs_mnt > export SCRATCH_MNT=/mnt/scratchdir > export SCRATCH_DEV=/mnt/scratchdir > Oh,we make same changes, my bonnie test(./bonnie -d /tmp -s 1024) see similar improvement(19.2% with huge=always) with out inner changes :) > Baolin Wang (2): > mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logic > mm: shmem: improve the tmpfs large folio read performance > > mm/shmem.c | 54 ++++++++++++++++++++++-------------------------------- > 1 file changed, 22 insertions(+), 32 deletions(-) > ^ permalink raw reply [flat|nested] 13+ messages in thread