linux-mm.kvack.org archive mirror
* [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny
@ 2023-04-05 16:18 Qi Zheng
  2023-04-05 16:18 ` [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch() Qi Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Qi Zheng @ 2023-04-05 16:18 UTC (permalink / raw)
  To: akpm, willy, lstoakes; +Cc: mgorman, vbabka, linux-mm, linux-kernel, Qi Zheng

The ->percpu_pvec_drained field was originally introduced by
commit d9ed0d08b6c6 ("mm: only drain per-cpu pagevecs once per
pagevec usage") to drain per-cpu pagevecs only once per pagevec
usage. But the folio conversion in commit c2bc16817aa0 ("mm/swap:
add folio_batch_move_lru()") broke this logic: the field is now
reset to false on every batch move, which means per-cpu pagevecs
can be drained multiple times per pagevec usage.
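
For reference, the two batch-init helpers differ only in whether
they clear ->percpu_pvec_drained; a sketch of their
include/linux/pagevec.h definitions around this series, shown
here purely for illustration:

  static inline void folio_batch_init(struct folio_batch *fbatch)
  {
          fbatch->nr = 0;
          fbatch->percpu_pvec_drained = false;
  }

  static inline void folio_batch_reinit(struct folio_batch *fbatch)
  {
          /* Reset the count but keep ->percpu_pvec_drained intact. */
          fbatch->nr = 0;
  }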

In theory, converting the code to be more folio-based should
introduce no functional change, so folio_batch_move_lru() should
call folio_batch_reinit() instead of folio_batch_init(). To
verify that we still need ->percpu_pvec_drained, I ran
mmtests/sparsetruncate-tiny and got the following data:

                             baseline                   with
                            baseline/                 patch/
Min       Time      326.00 (   0.00%)      328.00 (  -0.61%)
1st-qrtle Time      334.00 (   0.00%)      336.00 (  -0.60%)
2nd-qrtle Time      338.00 (   0.00%)      341.00 (  -0.89%)
3rd-qrtle Time      343.00 (   0.00%)      347.00 (  -1.17%)
Max-1     Time      326.00 (   0.00%)      328.00 (  -0.61%)
Max-5     Time      327.00 (   0.00%)      330.00 (  -0.92%)
Max-10    Time      328.00 (   0.00%)      331.00 (  -0.91%)
Max-90    Time      350.00 (   0.00%)      357.00 (  -2.00%)
Max-95    Time      395.00 (   0.00%)      390.00 (   1.27%)
Max-99    Time      508.00 (   0.00%)      434.00 (  14.57%)
Max       Time      547.00 (   0.00%)      476.00 (  12.98%)
Amean     Time      344.61 (   0.00%)      345.56 *  -0.28%*
Stddev    Time       30.34 (   0.00%)       19.51 (  35.69%)
CoeffVar  Time        8.81 (   0.00%)        5.65 (  35.87%)
BAmean-99 Time      342.38 (   0.00%)      344.27 (  -0.55%)
BAmean-95 Time      338.58 (   0.00%)      341.87 (  -0.97%)
BAmean-90 Time      336.89 (   0.00%)      340.26 (  -1.00%)
BAmean-75 Time      335.18 (   0.00%)      338.40 (  -0.96%)
BAmean-50 Time      332.54 (   0.00%)      335.42 (  -0.87%)
BAmean-25 Time      329.30 (   0.00%)      332.00 (  -0.82%)

From the above it can be seen that we get data similar to when
->percpu_pvec_drained was introduced, so we still need it. Let's
call folio_batch_reinit() in folio_batch_move_lru() to restore
the original logic.
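
For context, the drain-once optimisation consumes the flag in
__pagevec_release(); a simplified sketch based on the mm/swap.c
code of this period, shown for illustration and not part of this
patch:

  void __pagevec_release(struct pagevec *pvec)
  {
          /* Drain the per-cpu pagevecs at most once per pagevec usage. */
          if (!pvec->percpu_pvec_drained) {
                  lru_add_drain();
                  pvec->percpu_pvec_drained = true;
          }
          release_pages(pvec->pages, pagevec_count(pvec));
          pagevec_reinit(pvec);
  }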

Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
Changelog from v1 to v2:
 - revise commit message and add test data

 mm/swap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swap.c b/mm/swap.c
index 57cb01b042f6..423199ee8478 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -222,7 +222,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_init(fbatch);
+	folio_batch_reinit(fbatch);
 }
 
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
-- 
2.20.1




* [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch()
  2023-04-05 16:18 [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny Qi Zheng
@ 2023-04-05 16:18 ` Qi Zheng
  2023-04-06 10:26   ` Mel Gorman
  2023-04-05 16:45 ` [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny Matthew Wilcox
  2023-04-06 10:22 ` Mel Gorman
  2 siblings, 1 reply; 5+ messages in thread
From: Qi Zheng @ 2023-04-05 16:18 UTC (permalink / raw)
  To: akpm, willy, lstoakes; +Cc: mgorman, vbabka, linux-mm, linux-kernel, Qi Zheng

Since we have updated mlock to use folios, it's better
to call folios_put() instead of calling release_pages()
directly.
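
At this point folios_put() is simply a typed wrapper around
release_pages(); a sketch of its include/linux/mm.h definition,
shown for illustration:

  static inline void folios_put(struct folio **folios, unsigned int nr)
  {
          /* Release a batch of folios; drops one reference per folio. */
          release_pages(folios, nr);
  }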

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/mlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 617469fce96d..40b43f8740df 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -206,7 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 
 	if (lruvec)
 		unlock_page_lruvec_irq(lruvec);
-	release_pages(fbatch->folios, fbatch->nr);
+	folios_put(fbatch->folios, folio_batch_count(fbatch));
 	folio_batch_reinit(fbatch);
 }
 
-- 
2.20.1




* Re: [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny
  2023-04-05 16:18 [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny Qi Zheng
  2023-04-05 16:18 ` [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch() Qi Zheng
@ 2023-04-05 16:45 ` Matthew Wilcox
  2023-04-06 10:22 ` Mel Gorman
  2 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2023-04-05 16:45 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, lstoakes, mgorman, vbabka, linux-mm, linux-kernel

On Thu, Apr 06, 2023 at 12:18:53AM +0800, Qi Zheng wrote:
> The ->percpu_pvec_drained field was originally introduced by
> commit d9ed0d08b6c6 ("mm: only drain per-cpu pagevecs once per
> pagevec usage") to drain per-cpu pagevecs only once per pagevec
> usage. But the folio conversion in commit c2bc16817aa0 ("mm/swap:
> add folio_batch_move_lru()") broke this logic: the field is now
> reset to false on every batch move, which means per-cpu pagevecs
> can be drained multiple times per pagevec usage.

My mistake.  I didn't realise that we'd need a folio_batch_reinit(),
and indeed we didn't have one until 811561288397 (January 2023).
I thought this usage of percpu_pvec_drained was going to be fine
with being set to false each time.  Thanks for showing I was wrong.

> Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()")
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>



* Re: [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny
  2023-04-05 16:18 [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny Qi Zheng
  2023-04-05 16:18 ` [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch() Qi Zheng
  2023-04-05 16:45 ` [PATCH v2 1/2] mm: swap: fix performance regression on sparsetruncate-tiny Matthew Wilcox
@ 2023-04-06 10:22 ` Mel Gorman
  2 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2023-04-06 10:22 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, willy, lstoakes, vbabka, linux-mm, linux-kernel

On Thu, Apr 06, 2023 at 12:18:53AM +0800, Qi Zheng wrote:
> The ->percpu_pvec_drained field was originally introduced by
> commit d9ed0d08b6c6 ("mm: only drain per-cpu pagevecs once per
> pagevec usage") to drain per-cpu pagevecs only once per pagevec
> usage. But the folio conversion in commit c2bc16817aa0 ("mm/swap:
> add folio_batch_move_lru()") broke this logic: the field is now
> reset to false on every batch move, which means per-cpu pagevecs
> can be drained multiple times per pagevec usage.
> 
> In theory, converting the code to be more folio-based should
> introduce no functional change, so folio_batch_move_lru() should
> call folio_batch_reinit() instead of folio_batch_init(). To
> verify that we still need ->percpu_pvec_drained, I ran
> mmtests/sparsetruncate-tiny and got the following data:
> 
>                              baseline                   with
>                             baseline/                 patch/
> Min       Time      326.00 (   0.00%)      328.00 (  -0.61%)
> 1st-qrtle Time      334.00 (   0.00%)      336.00 (  -0.60%)
> 2nd-qrtle Time      338.00 (   0.00%)      341.00 (  -0.89%)
> 3rd-qrtle Time      343.00 (   0.00%)      347.00 (  -1.17%)
> Max-1     Time      326.00 (   0.00%)      328.00 (  -0.61%)
> Max-5     Time      327.00 (   0.00%)      330.00 (  -0.92%)
> Max-10    Time      328.00 (   0.00%)      331.00 (  -0.91%)
> Max-90    Time      350.00 (   0.00%)      357.00 (  -2.00%)
> Max-95    Time      395.00 (   0.00%)      390.00 (   1.27%)
> Max-99    Time      508.00 (   0.00%)      434.00 (  14.57%)
> Max       Time      547.00 (   0.00%)      476.00 (  12.98%)
> Amean     Time      344.61 (   0.00%)      345.56 *  -0.28%*
> Stddev    Time       30.34 (   0.00%)       19.51 (  35.69%)
> CoeffVar  Time        8.81 (   0.00%)        5.65 (  35.87%)
> BAmean-99 Time      342.38 (   0.00%)      344.27 (  -0.55%)
> BAmean-95 Time      338.58 (   0.00%)      341.87 (  -0.97%)
> BAmean-90 Time      336.89 (   0.00%)      340.26 (  -1.00%)
> BAmean-75 Time      335.18 (   0.00%)      338.40 (  -0.96%)
> BAmean-50 Time      332.54 (   0.00%)      335.42 (  -0.87%)
> BAmean-25 Time      329.30 (   0.00%)      332.00 (  -0.82%)
> 
> From the above it can be seen that we get data similar to when
> ->percpu_pvec_drained was introduced, so we still need it. Let's
> call folio_batch_reinit() in folio_batch_move_lru() to restore
> the original logic.
> 
> Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()")
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Well spotted,

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs



* Re: [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch()
  2023-04-05 16:18 ` [PATCH v2 2/2] mm: mlock: use folios_put() in mlock_folio_batch() Qi Zheng
@ 2023-04-06 10:26   ` Mel Gorman
  0 siblings, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2023-04-06 10:26 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, willy, lstoakes, vbabka, linux-mm, linux-kernel

On Thu, Apr 06, 2023 at 12:18:54AM +0800, Qi Zheng wrote:
> Since we have updated mlock to use folios, it's better
> to call folios_put() instead of calling release_pages()
> directly.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


