linux-mm.kvack.org archive mirror
* [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU
@ 2024-12-09  8:36 Chen Ridong
  2024-12-09  8:36 ` [PATCH v4 1/1] " Chen Ridong
  2024-12-10  2:13 ` [PATCH v4 0/1] " Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Chen Ridong @ 2024-12-09  8:36 UTC (permalink / raw)
  To: akpm, mhocko, hannes, yosryahmed, yuzhao, david, willy,
	ryan.roberts, baohua, 21cnbao, wangkefeng.wang
  Cc: linux-mm, linux-kernel, chenridong, wangweiyang2, xieym_ict

From: Chen Ridong <chenridong@huawei.com>

The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") only fixed the issue for mglru. However, this issue
also exists in the traditional active/inactive LRU. Fix this issue
in the same way for the active/inactive LRU.

What is fixed:
The page reclaim isolates a batch of folios from the tail of one of the
LRU lists and works on those folios one by one.  For a suitable
swap-backed folio, if the swap device is async, it queues that folio for
writeback.  After the page reclaim finishes an entire batch, it puts back
the folios it queued for writeback to the head of the original LRU list.

In the meantime, the page writeback flushes the queued folios also by
batches.  Its batching logic is independent from that of the page reclaim.
For each of the folios it writes back, the page writeback calls
folio_rotate_reclaimable() which tries to rotate a folio to the tail.

folio_rotate_reclaimable() only works for a folio after the page reclaim
has put it back.  If an async swap device is fast enough, the page
writeback can finish with that folio while the page reclaim is still
working on the rest of the batch containing it.  In this case, that folio
will remain at the head and the page reclaim will not retry it before
reaching there.
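
For reference, the rotation is missed because folio_rotate_reclaimable()
only rotates a folio that is still on an LRU list. A simplified sketch of
that check (paraphrased from mm/swap.c; the exact code differs across
kernel versions) looks like this:

	void folio_rotate_reclaimable(struct folio *folio)
	{
		/* skip folios that must not be rotated */
		if (folio_test_locked(folio) || folio_test_dirty(folio) ||
		    folio_test_unevictable(folio))
			return;

		/*
		 * PG_lru was cleared when the page reclaim isolated the
		 * folio, so the rotation is silently skipped while the
		 * folio is still sitting in the reclaim batch.
		 */
		if (!folio_test_lru(folio))
			return;

		/* otherwise batch the folio and move it to the LRU tail */
	}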

---
v4:
 - combine patch 1 and patch 2 of v3 into a single patch.
 - refine the commit message.
 - fix build errors reported by kernel test robot <lkp@intel.com>.
v3:
 - fix this issue in the same way as the multi-gen LRU.

v2:
 - detect folios whose writeback has completed and move them to the tail
    of the LRU, as suggested by Barry Song
[2] https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/

v1:
[1] https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/

Chen Ridong (1):
  mm: vmascan: retry folios written back while isolated for traditional
    LRU

 include/linux/mmzone.h |   3 +-
 mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
 2 files changed, 77 insertions(+), 34 deletions(-)

-- 
2.34.1




* [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-09  8:36 [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU Chen Ridong
@ 2024-12-09  8:36 ` Chen Ridong
  2024-12-10  4:54   ` Barry Song
  2024-12-10  2:13 ` [PATCH v4 0/1] " Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Chen Ridong @ 2024-12-09  8:36 UTC (permalink / raw)
  To: akpm, mhocko, hannes, yosryahmed, yuzhao, david, willy,
	ryan.roberts, baohua, 21cnbao, wangkefeng.wang
  Cc: linux-mm, linux-kernel, chenridong, wangweiyang2, xieym_ict

From: Chen Ridong <chenridong@huawei.com>

The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") only fixed the issue for mglru. However, this issue
also exists in the traditional active/inactive LRU. The issue becomes
worse when a THP is split, which makes the list longer so that a batch
of folios takes longer to reclaim.

Fix this issue in the same way for the traditional LRU: first extract
the common logic into the 'find_folios_written_back' function, then
reuse it in 'shrink_inactive_list', and finally retry reclaiming those
folios that may have missed the rotation.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/mmzone.h |   3 +-
 mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
 2 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b36124145a16..47c6e8c43dcd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
 
 #define LRU_GEN_MASK		((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
 #define LRU_REFS_MASK		((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
+#define LRU_REFS_FLAGS		(BIT(PG_referenced) | BIT(PG_workingset))
 
 #ifdef CONFIG_LRU_GEN
 
@@ -406,8 +407,6 @@ enum {
 	NR_LRU_GEN_CAPS
 };
 
-#define LRU_REFS_FLAGS		(BIT(PG_referenced) | BIT(PG_workingset))
-
 #define MIN_LRU_BATCH		BITS_PER_LONG
 #define MAX_LRU_BATCH		(MIN_LRU_BATCH * 64)
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76378bc257e3..1f0d194f8b2f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
+/**
+ * find_folios_written_back - Find and move the written back folios to a new list.
+ * @list: folios list
+ * @clean: the list that written back folios are moved to
+ * @skip: whether to skip moving the written back folios to the clean list
+ */
+static inline void find_folios_written_back(struct list_head *list,
+		struct list_head *clean, bool skip)
+{
+	struct folio *folio;
+	struct folio *next;
+
+	list_for_each_entry_safe_reverse(folio, next, list, lru) {
+		if (!folio_evictable(folio)) {
+			list_del(&folio->lru);
+			folio_putback_lru(folio);
+			continue;
+		}
+
+		if (folio_test_reclaim(folio) &&
+		    (folio_test_dirty(folio) || folio_test_writeback(folio))) {
+			/* restore LRU_REFS_FLAGS cleared by isolate_folio() */
+			if (lru_gen_enabled() && folio_test_workingset(folio))
+				folio_set_referenced(folio);
+			continue;
+		}
+
+		if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
+		    folio_mapped(folio) || folio_test_locked(folio) ||
+		    folio_test_dirty(folio) || folio_test_writeback(folio)) {
+			/* don't add rejected folios to the oldest generation */
+			if (lru_gen_enabled())
+				set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
+					      BIT(PG_active));
+			continue;
+		}
+
+		/* retry folios that may have missed folio_rotate_reclaimable() */
+		list_move(&folio->lru, clean);
+	}
+}
+
 /*
  * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
  * scan_control->nr_reclaimed.
@@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
 	return !(current->flags & PF_LOCAL_THROTTLE);
 }
 
+static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
+		struct reclaim_stat *curr)
+{
+	int i;
+
+	stat->nr_dirty += curr->nr_dirty;
+	stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
+	stat->nr_congested += curr->nr_congested;
+	stat->nr_writeback += curr->nr_writeback;
+	stat->nr_immediate += curr->nr_immediate;
+	stat->nr_pageout += curr->nr_pageout;
+	stat->nr_ref_keep += curr->nr_ref_keep;
+	stat->nr_unmap_fail += curr->nr_unmap_fail;
+	stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
+	stat->nr_demoted += curr->nr_demoted;
+	for (i = 0; i < ANON_AND_FILE; i++)
+		stat->nr_activate[i] = curr->nr_activate[i];
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_node().  It returns the number
  * of reclaimed pages
@@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 		enum lru_list lru)
 {
 	LIST_HEAD(folio_list);
+	LIST_HEAD(clean_list);
 	unsigned long nr_scanned;
 	unsigned int nr_reclaimed = 0;
 	unsigned long nr_taken;
-	struct reclaim_stat stat;
+	struct reclaim_stat stat, curr;
 	bool file = is_file_lru(lru);
 	enum vm_event_item item;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	bool skip_retry = false;
 
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
@@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (nr_taken == 0)
 		return 0;
 
-	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
+	memset(&stat, 0, sizeof(stat));
+retry:
+	nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
+	find_folios_written_back(&folio_list, &clean_list, skip_retry);
+	acc_reclaimed_stat(&stat, &curr);
 
 	spin_lock_irq(&lruvec->lru_lock);
 	move_folios_to_lru(lruvec, &folio_list);
+	if (!list_empty(&clean_list)) {
+		list_splice_init(&clean_list, &folio_list);
+		skip_retry = true;
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
 					stat.nr_demoted);
@@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int reclaimed;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
-	struct folio *folio;
-	struct folio *next;
 	enum vm_event_item item;
 	struct reclaim_stat stat;
 	struct lru_gen_mm_walk *walk;
@@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
-	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
-		if (!folio_evictable(folio)) {
-			list_del(&folio->lru);
-			folio_putback_lru(folio);
-			continue;
-		}
-
-		if (folio_test_reclaim(folio) &&
-		    (folio_test_dirty(folio) || folio_test_writeback(folio))) {
-			/* restore LRU_REFS_FLAGS cleared by isolate_folio() */
-			if (folio_test_workingset(folio))
-				folio_set_referenced(folio);
-			continue;
-		}
-
-		if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
-		    folio_mapped(folio) || folio_test_locked(folio) ||
-		    folio_test_dirty(folio) || folio_test_writeback(folio)) {
-			/* don't add rejected folios to the oldest generation */
-			set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
-				      BIT(PG_active));
-			continue;
-		}
-
-		/* retry folios that may have missed folio_rotate_reclaimable() */
-		list_move(&folio->lru, &clean);
-	}
-
+	find_folios_written_back(&list, &clean, skip_retry);
 	spin_lock_irq(&lruvec->lru_lock);
 
 	move_folios_to_lru(lruvec, &list);
-- 
2.34.1




* Re: [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-09  8:36 [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU Chen Ridong
  2024-12-09  8:36 ` [PATCH v4 1/1] " Chen Ridong
@ 2024-12-10  2:13 ` Andrew Morton
  2024-12-10  6:42   ` chenridong
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2024-12-10  2:13 UTC (permalink / raw)
  To: Chen Ridong
  Cc: mhocko, hannes, yosryahmed, yuzhao, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	chenridong, wangweiyang2, xieym_ict

On Mon,  9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:

> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> while isolated") only fixed the issue for mglru. However, this issue
> also exists in the traditional active/inactive LRU. Fix this issue
> in the same way for active/inactive lru.
> 
> What is fixed:
> The page reclaim isolates a batch of folios from the tail of one of the
> LRU lists and works on those folios one by one.  For a suitable
> swap-backed folio, if the swap device is async, it queues that folio for
> writeback.  After the page reclaim finishes an entire batch, it puts back
> the folios it queued for writeback to the head of the original LRU list.
> 
> In the meantime, the page writeback flushes the queued folios also by
> batches.  Its batching logic is independent from that of the page reclaim.
> For each of the folios it writes back, the page writeback calls
> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> 
> folio_rotate_reclaimable() only works for a folio after the page reclaim
> has put it back.  If an async swap device is fast enough, the page
> writeback can finish with that folio while the page reclaim is still
> working on the rest of the batch containing it.  In this case, that folio
> will remain at the head and the page reclaim will not retry it before
> reaching there.

For a single patch series I think it's best to just make it a single
patch!  No need for a [0/n]: just put all the info into the patch's
changelog.

The patch doesn't apply to current development kernels.  Please check
the mm-unstable branch of
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
linux-next.

Please replace vmascan with vmscan in the title.



* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-09  8:36 ` [PATCH v4 1/1] " Chen Ridong
@ 2024-12-10  4:54   ` Barry Song
  2024-12-10  6:41     ` chenridong
  0 siblings, 1 reply; 10+ messages in thread
From: Barry Song @ 2024-12-10  4:54 UTC (permalink / raw)
  To: Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, yuzhao, david, willy,
	ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	chenridong, wangweiyang2, xieym_ict

On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
> From: Chen Ridong <chenridong@huawei.com>
>
> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> while isolated") only fixed the issue for mglru. However, this issue
> also exists in the traditional active/inactive LRU. This issue will be
> worse if THP is split, which makes the list longer and needs longer time
> to finish a batch of folios reclaim.
>
> This issue should be fixed in the same way for the traditional LRU.
> Therefore, the common logic was extracted to the 'find_folios_written_back'
> function firstly, which is then reused in the 'shrink_inactive_list'
> function. Finally, retry reclaiming those folios that may have missed the
> rotation for traditional LRU.

let's drop the cover-letter and refine the changelog.

>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  include/linux/mmzone.h |   3 +-
>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
>  2 files changed, 77 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b36124145a16..47c6e8c43dcd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
>
>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>
>  #ifdef CONFIG_LRU_GEN
>
> @@ -406,8 +407,6 @@ enum {
>         NR_LRU_GEN_CAPS
>  };
>
> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
> -
>  #define MIN_LRU_BATCH          BITS_PER_LONG
>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 76378bc257e3..1f0d194f8b2f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
>         task->reclaim_state = rs;
>  }
>
> +/**
> + * find_folios_written_back - Find and move the written back folios to a new list.
> + * @list: filios list
> + * @clean: the written back folios list
> + * @skip: whether skip to move the written back folios to clean list.
> + */
> +static inline void find_folios_written_back(struct list_head *list,
> +               struct list_head *clean, bool skip)
> +{
> +       struct folio *folio;
> +       struct folio *next;
> +
> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
> +               if (!folio_evictable(folio)) {
> +                       list_del(&folio->lru);
> +                       folio_putback_lru(folio);
> +                       continue;
> +               }
> +
> +               if (folio_test_reclaim(folio) &&
> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
> +                               folio_set_referenced(folio);
> +                       continue;
> +               }
> +
> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
> +                   folio_mapped(folio) || folio_test_locked(folio) ||
> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> +                       /* don't add rejected folios to the oldest generation */
> +                       if (lru_gen_enabled())
> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> +                                             BIT(PG_active));
> +                       continue;
> +               }
> +
> +               /* retry folios that may have missed folio_rotate_reclaimable() */
> +               list_move(&folio->lru, clean);
> +       }
> +}
> +
>  /*
>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
>   * scan_control->nr_reclaimed.
> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
>         return !(current->flags & PF_LOCAL_THROTTLE);
>  }
>
> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
> +               struct reclaim_stat *curr)
> +{
> +       int i;
> +
> +       stat->nr_dirty += curr->nr_dirty;
> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
> +       stat->nr_congested += curr->nr_congested;
> +       stat->nr_writeback += curr->nr_writeback;
> +       stat->nr_immediate += curr->nr_immediate;
> +       stat->nr_pageout += curr->nr_pageout;
> +       stat->nr_ref_keep += curr->nr_ref_keep;
> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
> +       stat->nr_demoted += curr->nr_demoted;
> +       for (i = 0; i < ANON_AND_FILE; i++)
> +               stat->nr_activate[i] = curr->nr_activate[i];
> +}

you didn't have this before; what's the purpose of it?

> +
>  /*
>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>   * of reclaimed pages
> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>                 enum lru_list lru)
>  {
>         LIST_HEAD(folio_list);
> +       LIST_HEAD(clean_list);
>         unsigned long nr_scanned;
>         unsigned int nr_reclaimed = 0;
>         unsigned long nr_taken;
> -       struct reclaim_stat stat;
> +       struct reclaim_stat stat, curr;
>         bool file = is_file_lru(lru);
>         enum vm_event_item item;
>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>         bool stalled = false;
> +       bool skip_retry = false;
>
>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
>                 if (stalled)
> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>         if (nr_taken == 0)
>                 return 0;
>
> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
> +       memset(&stat, 0, sizeof(stat));
> +retry:
> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
> +       acc_reclaimed_stat(&stat, &curr);
>
>         spin_lock_irq(&lruvec->lru_lock);
>         move_folios_to_lru(lruvec, &folio_list);
> +       if (!list_empty(&clean_list)) {
> +               list_splice_init(&clean_list, &folio_list);
> +               skip_retry = true;
> +               spin_unlock_irq(&lruvec->lru_lock);
> +               goto retry;
> +       }
>
>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
>                                         stat.nr_demoted);
> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>         int reclaimed;
>         LIST_HEAD(list);
>         LIST_HEAD(clean);
> -       struct folio *folio;
> -       struct folio *next;
>         enum vm_event_item item;
>         struct reclaim_stat stat;
>         struct lru_gen_mm_walk *walk;
> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>                         scanned, reclaimed, &stat, sc->priority,
>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> -               if (!folio_evictable(folio)) {
> -                       list_del(&folio->lru);
> -                       folio_putback_lru(folio);
> -                       continue;
> -               }
> -
> -               if (folio_test_reclaim(folio) &&
> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> -                       if (folio_test_workingset(folio))
> -                               folio_set_referenced(folio);
> -                       continue;
> -               }
> -
> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
> -                   folio_mapped(folio) || folio_test_locked(folio) ||
> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> -                       /* don't add rejected folios to the oldest generation */
> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> -                                     BIT(PG_active));
> -                       continue;
> -               }
> -
> -               /* retry folios that may have missed folio_rotate_reclaimable() */
> -               list_move(&folio->lru, &clean);
> -       }
> -
> +       find_folios_written_back(&list, &clean, skip_retry);
>         spin_lock_irq(&lruvec->lru_lock);
>
>         move_folios_to_lru(lruvec, &list);
> --
> 2.34.1
>

Thanks
Barry

>



* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-10  4:54   ` Barry Song
@ 2024-12-10  6:41     ` chenridong
  2024-12-10  8:24       ` Barry Song
  0 siblings, 1 reply; 10+ messages in thread
From: chenridong @ 2024-12-10  6:41 UTC (permalink / raw)
  To: Barry Song, Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, yuzhao, david, willy,
	ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict



On 2024/12/10 12:54, Barry Song wrote:
> On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>> while isolated") only fixed the issue for mglru. However, this issue
>> also exists in the traditional active/inactive LRU. This issue will be
>> worse if THP is split, which makes the list longer and needs longer time
>> to finish a batch of folios reclaim.
>>
>> This issue should be fixed in the same way for the traditional LRU.
>> Therefore, the common logic was extracted to the 'find_folios_written_back'
>> function firstly, which is then reused in the 'shrink_inactive_list'
>> function. Finally, retry reclaiming those folios that may have missed the
>> rotation for traditional LRU.
> 
> let's drop the cover-letter and refine the changelog.
> 
Will update.

>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  include/linux/mmzone.h |   3 +-
>>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
>>  2 files changed, 77 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index b36124145a16..47c6e8c43dcd 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
>>
>>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
>>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
>> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>>
>>  #ifdef CONFIG_LRU_GEN
>>
>> @@ -406,8 +407,6 @@ enum {
>>         NR_LRU_GEN_CAPS
>>  };
>>
>> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>> -
>>  #define MIN_LRU_BATCH          BITS_PER_LONG
>>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 76378bc257e3..1f0d194f8b2f 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
>>         task->reclaim_state = rs;
>>  }
>>
>> +/**
>> + * find_folios_written_back - Find and move the written back folios to a new list.
>> + * @list: filios list
>> + * @clean: the written back folios list
>> + * @skip: whether skip to move the written back folios to clean list.
>> + */
>> +static inline void find_folios_written_back(struct list_head *list,
>> +               struct list_head *clean, bool skip)
>> +{
>> +       struct folio *folio;
>> +       struct folio *next;
>> +
>> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
>> +               if (!folio_evictable(folio)) {
>> +                       list_del(&folio->lru);
>> +                       folio_putback_lru(folio);
>> +                       continue;
>> +               }
>> +
>> +               if (folio_test_reclaim(folio) &&
>> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
>> +                               folio_set_referenced(folio);
>> +                       continue;
>> +               }
>> +
>> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
>> +                   folio_mapped(folio) || folio_test_locked(folio) ||
>> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>> +                       /* don't add rejected folios to the oldest generation */
>> +                       if (lru_gen_enabled())
>> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>> +                                             BIT(PG_active));
>> +                       continue;
>> +               }
>> +
>> +               /* retry folios that may have missed folio_rotate_reclaimable() */
>> +               list_move(&folio->lru, clean);
>> +       }
>> +}
>> +
>>  /*
>>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
>>   * scan_control->nr_reclaimed.
>> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
>>         return !(current->flags & PF_LOCAL_THROTTLE);
>>  }
>>
>> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
>> +               struct reclaim_stat *curr)
>> +{
>> +       int i;
>> +
>> +       stat->nr_dirty += curr->nr_dirty;
>> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
>> +       stat->nr_congested += curr->nr_congested;
>> +       stat->nr_writeback += curr->nr_writeback;
>> +       stat->nr_immediate += curr->nr_immediate;
>> +       stat->nr_pageout += curr->nr_pageout;
>> +       stat->nr_ref_keep += curr->nr_ref_keep;
>> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
>> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
>> +       stat->nr_demoted += curr->nr_demoted;
>> +       for (i = 0; i < ANON_AND_FILE; i++)
>> +               stat->nr_activate[i] = curr->nr_activate[i];
>> +}
> 
> you had no this before, what's the purpose of this?
> 

We may call shrink_folio_list twice, and 'curr' is reset inside the
shrink_folio_list function. We should accumulate the stats as a whole,
which are then used to calculate the cost and returned to the caller.
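
For example (a hypothetical illustration, roughly what
shrink_inactive_list() already does with the stats after the passes), the
accumulated totals are what later feed the cost accounting and the sc->nr
counters, so losing the first pass would undercount them:

	/* after the last pass, use the accumulated totals */
	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
	sc->nr.dirty += stat.nr_dirty;
	sc->nr.congested += stat.nr_congested;
	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
	sc->nr.writeback += stat.nr_writeback;
	sc->nr.immediate += stat.nr_immediate;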

Thanks,
Ridong

>> +
>>  /*
>>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>>   * of reclaimed pages
>> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>                 enum lru_list lru)
>>  {
>>         LIST_HEAD(folio_list);
>> +       LIST_HEAD(clean_list);
>>         unsigned long nr_scanned;
>>         unsigned int nr_reclaimed = 0;
>>         unsigned long nr_taken;
>> -       struct reclaim_stat stat;
>> +       struct reclaim_stat stat, curr;
>>         bool file = is_file_lru(lru);
>>         enum vm_event_item item;
>>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>>         bool stalled = false;
>> +       bool skip_retry = false;
>>
>>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
>>                 if (stalled)
>> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>         if (nr_taken == 0)
>>                 return 0;
>>
>> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
>> +       memset(&stat, 0, sizeof(stat));
>> +retry:
>> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
>> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
>> +       acc_reclaimed_stat(&stat, &curr);
>>
>>         spin_lock_irq(&lruvec->lru_lock);
>>         move_folios_to_lru(lruvec, &folio_list);
>> +       if (!list_empty(&clean_list)) {
>> +               list_splice_init(&clean_list, &folio_list);
>> +               skip_retry = true;
>> +               spin_unlock_irq(&lruvec->lru_lock);
>> +               goto retry;
>> +       }
>>
>>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
>>                                         stat.nr_demoted);
>> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>         int reclaimed;
>>         LIST_HEAD(list);
>>         LIST_HEAD(clean);
>> -       struct folio *folio;
>> -       struct folio *next;
>>         enum vm_event_item item;
>>         struct reclaim_stat stat;
>>         struct lru_gen_mm_walk *walk;
>> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>                         scanned, reclaimed, &stat, sc->priority,
>>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>>
>> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
>> -               if (!folio_evictable(folio)) {
>> -                       list_del(&folio->lru);
>> -                       folio_putback_lru(folio);
>> -                       continue;
>> -               }
>> -
>> -               if (folio_test_reclaim(folio) &&
>> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>> -                       if (folio_test_workingset(folio))
>> -                               folio_set_referenced(folio);
>> -                       continue;
>> -               }
>> -
>> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
>> -                   folio_mapped(folio) || folio_test_locked(folio) ||
>> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>> -                       /* don't add rejected folios to the oldest generation */
>> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>> -                                     BIT(PG_active));
>> -                       continue;
>> -               }
>> -
>> -               /* retry folios that may have missed folio_rotate_reclaimable() */
>> -               list_move(&folio->lru, &clean);
>> -       }
>> -
>> +       find_folios_written_back(&list, &clean, skip_retry);
>>         spin_lock_irq(&lruvec->lru_lock);
>>
>>         move_folios_to_lru(lruvec, &list);
>> --
>> 2.34.1
>>
> 
> Thanks
> Barry
> 
>>



* Re: [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-10  2:13 ` [PATCH v4 0/1] " Andrew Morton
@ 2024-12-10  6:42   ` chenridong
  0 siblings, 0 replies; 10+ messages in thread
From: chenridong @ 2024-12-10  6:42 UTC (permalink / raw)
  To: Andrew Morton, Chen Ridong
  Cc: mhocko, hannes, yosryahmed, yuzhao, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict



On 2024/12/10 10:13, Andrew Morton wrote:
> On Mon,  9 Dec 2024 08:36:17 +0000 Chen Ridong <chenridong@huaweicloud.com> wrote:
> 
>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>> while isolated") only fixed the issue for mglru. However, this issue
>> also exists in the traditional active/inactive LRU. Fix this issue
>> in the same way for active/inactive lru.
>>
>> What is fixed:
>> The page reclaim isolates a batch of folios from the tail of one of the
>> LRU lists and works on those folios one by one.  For a suitable
>> swap-backed folio, if the swap device is async, it queues that folio for
>> writeback.  After the page reclaim finishes an entire batch, it puts back
>> the folios it queued for writeback to the head of the original LRU list.
>>
>> In the meantime, the page writeback flushes the queued folios also by
>> batches.  Its batching logic is independent from that of the page reclaim.
>> For each of the folios it writes back, the page writeback calls
>> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>>
>> folio_rotate_reclaimable() only works for a folio after the page reclaim
>> has put it back.  If an async swap device is fast enough, the page
>> writeback can finish with that folio while the page reclaim is still
>> working on the rest of the batch containing it.  In this case, that folio
>> will remain at the head and the page reclaim will not retry it before
>> reaching there.
> 
> For a single patch series I think it's best to just make it a single
> patch!  No need for a [0/n]: just put all the info into the patch's
> changelog.
> 
> The patch doesn't apply to current development kernels.  Please check
> the mm-unstable branch of
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/, or
> linux-next.
> 
> Please replace vmascan with vmscan in the title.

Thanks, Will update.

Best regards,
Ridong



* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-10  6:41     ` chenridong
@ 2024-12-10  8:24       ` Barry Song
  2024-12-10 12:11         ` chenridong
  0 siblings, 1 reply; 10+ messages in thread
From: Barry Song @ 2024-12-10  8:24 UTC (permalink / raw)
  To: chenridong
  Cc: Chen Ridong, akpm, mhocko, hannes, yosryahmed, yuzhao, david,
	willy, ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict

On Tue, Dec 10, 2024 at 2:41 PM chenridong <chenridong@huawei.com> wrote:
>
>
>
> On 2024/12/10 12:54, Barry Song wrote:
> > On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> >> while isolated") only fixed the issue for mglru. However, this issue
> >> also exists in the traditional active/inactive LRU. This issue will be
> >> worse if THP is split, which makes the list longer and needs longer time
> >> to finish a batch of folios reclaim.
> >>
> >> This issue should be fixed in the same way for the traditional LRU.
> >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> >> function firstly, which is then reused in the 'shrink_inactive_list'
> >> function. Finally, retry reclaiming those folios that may have missed the
> >> rotation for traditional LRU.
> >
> > let's drop the cover-letter and refine the changelog.
> >
> Will update.
>
> >>
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >> ---
> >>  include/linux/mmzone.h |   3 +-
> >>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
> >>  2 files changed, 77 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index b36124145a16..47c6e8c43dcd 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
> >>
> >>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
> >>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
> >> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
> >>
> >>  #ifdef CONFIG_LRU_GEN
> >>
> >> @@ -406,8 +407,6 @@ enum {
> >>         NR_LRU_GEN_CAPS
> >>  };
> >>
> >> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
> >> -
> >>  #define MIN_LRU_BATCH          BITS_PER_LONG
> >>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index 76378bc257e3..1f0d194f8b2f 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
> >>         task->reclaim_state = rs;
> >>  }
> >>
> >> +/**
> >> + * find_folios_written_back - Find and move the written back folios to a new list.
> >> + * @list: filios list
> >> + * @clean: the written back folios list
> >> + * @skip: whether skip to move the written back folios to clean list.
> >> + */
> >> +static inline void find_folios_written_back(struct list_head *list,
> >> +               struct list_head *clean, bool skip)
> >> +{
> >> +       struct folio *folio;
> >> +       struct folio *next;
> >> +
> >> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
> >> +               if (!folio_evictable(folio)) {
> >> +                       list_del(&folio->lru);
> >> +                       folio_putback_lru(folio);
> >> +                       continue;
> >> +               }
> >> +
> >> +               if (folio_test_reclaim(folio) &&
> >> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> >> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> >> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
> >> +                               folio_set_referenced(folio);
> >> +                       continue;
> >> +               }
> >> +
> >> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
> >> +                   folio_mapped(folio) || folio_test_locked(folio) ||
> >> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> >> +                       /* don't add rejected folios to the oldest generation */
> >> +                       if (lru_gen_enabled())
> >> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> >> +                                             BIT(PG_active));
> >> +                       continue;
> >> +               }
> >> +
> >> +               /* retry folios that may have missed folio_rotate_reclaimable() */
> >> +               list_move(&folio->lru, clean);
> >> +       }
> >> +}
> >> +
> >>  /*
> >>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
> >>   * scan_control->nr_reclaimed.
> >> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
> >>         return !(current->flags & PF_LOCAL_THROTTLE);
> >>  }
> >>
> >> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
> >> +               struct reclaim_stat *curr)
> >> +{
> >> +       int i;
> >> +
> >> +       stat->nr_dirty += curr->nr_dirty;
> >> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
> >> +       stat->nr_congested += curr->nr_congested;
> >> +       stat->nr_writeback += curr->nr_writeback;
> >> +       stat->nr_immediate += curr->nr_immediate;
> >> +       stat->nr_pageout += curr->nr_pageout;
> >> +       stat->nr_ref_keep += curr->nr_ref_keep;
> >> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
> >> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
> >> +       stat->nr_demoted += curr->nr_demoted;
> >> +       for (i = 0; i < ANON_AND_FILE; i++)
> >> +               stat->nr_activate[i] = curr->nr_activate[i];
> >> +}
> >
> > you had no this before, what's the purpose of this?
> >
>
> We may call shrink_folio_list twice, and the 'stat curr' will reset in
> the shrink_folio_list function. We should accumulate the stats as a
> whole, which will then be used to calculate the cost and return it to
> the caller.

Does mglru have the same issue? If so, we may need to send a patch to
fix mglru's stat accounting as well. By the way, the code is rather
messy—could it be implemented as shown below instead?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1f0d194f8b2f..40d2ddde21f5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1094,7 +1094,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  struct swap_iocb *plug = NULL;

  folio_batch_init(&free_folios);
- memset(stat, 0, sizeof(*stat));
  cond_resched();
  do_demote_pass = can_demote(pgdat->node_id, sc);

@@ -1949,25 +1948,6 @@ static int current_may_throttle(void)
  return !(current->flags & PF_LOCAL_THROTTLE);
 }

-static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
- struct reclaim_stat *curr)
-{
- int i;
-
- stat->nr_dirty += curr->nr_dirty;
- stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
- stat->nr_congested += curr->nr_congested;
- stat->nr_writeback += curr->nr_writeback;
- stat->nr_immediate += curr->nr_immediate;
- stat->nr_pageout += curr->nr_pageout;
- stat->nr_ref_keep += curr->nr_ref_keep;
- stat->nr_unmap_fail += curr->nr_unmap_fail;
- stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
- stat->nr_demoted += curr->nr_demoted;
- for (i = 0; i < ANON_AND_FILE; i++)
- stat->nr_activate[i] = curr->nr_activate[i];
-}
-
 /*
  * shrink_inactive_list() is a helper for shrink_node().  It returns the number
  * of reclaimed pages
@@ -1981,7 +1961,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
  unsigned long nr_scanned;
  unsigned int nr_reclaimed = 0;
  unsigned long nr_taken;
- struct reclaim_stat stat, curr;
+ struct reclaim_stat stat;
  bool file = is_file_lru(lru);
  enum vm_event_item item;
  struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -2022,9 +2002,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,

  memset(&stat, 0, sizeof(stat));
 retry:
- nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
+ nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
  find_folios_written_back(&folio_list, &clean_list, skip_retry);
- acc_reclaimed_stat(&stat, &curr);

  spin_lock_irq(&lruvec->lru_lock);
  move_folios_to_lru(lruvec, &folio_list);

>
> Thanks,
> Ridong
>
> >> +
> >>  /*
> >>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
> >>   * of reclaimed pages
> >> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> >>                 enum lru_list lru)
> >>  {
> >>         LIST_HEAD(folio_list);
> >> +       LIST_HEAD(clean_list);
> >>         unsigned long nr_scanned;
> >>         unsigned int nr_reclaimed = 0;
> >>         unsigned long nr_taken;
> >> -       struct reclaim_stat stat;
> >> +       struct reclaim_stat stat, curr;
> >>         bool file = is_file_lru(lru);
> >>         enum vm_event_item item;
> >>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >>         bool stalled = false;
> >> +       bool skip_retry = false;
> >>
> >>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
> >>                 if (stalled)
> >> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> >>         if (nr_taken == 0)
> >>                 return 0;
> >>
> >> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
> >> +       memset(&stat, 0, sizeof(stat));
> >> +retry:
> >> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
> >> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
> >> +       acc_reclaimed_stat(&stat, &curr);
> >>
> >>         spin_lock_irq(&lruvec->lru_lock);
> >>         move_folios_to_lru(lruvec, &folio_list);
> >> +       if (!list_empty(&clean_list)) {
> >> +               list_splice_init(&clean_list, &folio_list);
> >> +               skip_retry = true;
> >> +               spin_unlock_irq(&lruvec->lru_lock);
> >> +               goto retry;

This is rather confusing. We're still jumping to retry even though
skip_retry=true is set. Can we find a clearer approach for this?

It was somewhat acceptable before we introduced the extracted
function find_folios_written_back(). However, it has become
harder to follow now that skip_retry is passed across functions.

I find renaming skip_retry to is_retry more intuitive. The logic
is that since we are already retrying, find_folios_written_back()
shouldn’t move folios to the clean list again. The intended semantics
are: we have retried, don’t retry again.
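
A hypothetical sketch of that rename in shrink_inactive_list() (not part
of the posted patch, only to illustrate the suggested semantics; it also
assumes shrink_folio_list() no longer memsets the stat, as in the diff
above) could look like:

	bool is_retry = false;

	memset(&stat, 0, sizeof(stat));
retry:
	nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
	/* on the retry pass, don't collect another clean list */
	find_folios_written_back(&folio_list, &clean_list, is_retry);

	spin_lock_irq(&lruvec->lru_lock);
	move_folios_to_lru(lruvec, &folio_list);
	if (!is_retry && !list_empty(&clean_list)) {
		list_splice_init(&clean_list, &folio_list);
		is_retry = true;
		spin_unlock_irq(&lruvec->lru_lock);
		goto retry;
	}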


> >> +       }
> >>
> >>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
> >>                                         stat.nr_demoted);
> >> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >>         int reclaimed;
> >>         LIST_HEAD(list);
> >>         LIST_HEAD(clean);
> >> -       struct folio *folio;
> >> -       struct folio *next;
> >>         enum vm_event_item item;
> >>         struct reclaim_stat stat;
> >>         struct lru_gen_mm_walk *walk;
> >> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >>                         scanned, reclaimed, &stat, sc->priority,
> >>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> >>
> >> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> >> -               if (!folio_evictable(folio)) {
> >> -                       list_del(&folio->lru);
> >> -                       folio_putback_lru(folio);
> >> -                       continue;
> >> -               }
> >> -
> >> -               if (folio_test_reclaim(folio) &&
> >> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> >> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> >> -                       if (folio_test_workingset(folio))
> >> -                               folio_set_referenced(folio);
> >> -                       continue;
> >> -               }
> >> -
> >> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
> >> -                   folio_mapped(folio) || folio_test_locked(folio) ||
> >> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> >> -                       /* don't add rejected folios to the oldest generation */
> >> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> >> -                                     BIT(PG_active));
> >> -                       continue;
> >> -               }
> >> -
> >> -               /* retry folios that may have missed folio_rotate_reclaimable() */
> >> -               list_move(&folio->lru, &clean);
> >> -       }
> >> -
> >> +       find_folios_written_back(&list, &clean, skip_retry);
> >>         spin_lock_irq(&lruvec->lru_lock);
> >>
> >>         move_folios_to_lru(lruvec, &list);
> >> --
> >> 2.34.1
> >>
> >

Thanks
 Barry



* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-10  8:24       ` Barry Song
@ 2024-12-10 12:11         ` chenridong
  2024-12-12 23:17           ` Barry Song
  0 siblings, 1 reply; 10+ messages in thread
From: chenridong @ 2024-12-10 12:11 UTC (permalink / raw)
  To: Barry Song
  Cc: Chen Ridong, akpm, mhocko, hannes, yosryahmed, yuzhao, david,
	willy, ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict



On 2024/12/10 16:24, Barry Song wrote:
> On Tue, Dec 10, 2024 at 2:41 PM chenridong <chenridong@huawei.com> wrote:
>>
>>
>>
>> On 2024/12/10 12:54, Barry Song wrote:
>>> On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>>>> while isolated") only fixed the issue for mglru. However, this issue
>>>> also exists in the traditional active/inactive LRU. This issue will be
>>>> worse if THP is split, which makes the list longer and needs longer time
>>>> to finish a batch of folios reclaim.
>>>>
>>>> This issue should be fixed in the same way for the traditional LRU.
>>>> Therefore, the common logic was extracted to the 'find_folios_written_back'
>>>> function firstly, which is then reused in the 'shrink_inactive_list'
>>>> function. Finally, retry reclaiming those folios that may have missed the
>>>> rotation for traditional LRU.
>>>
>>> let's drop the cover-letter and refine the changelog.
>>>
>> Will update.
>>
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>>  include/linux/mmzone.h |   3 +-
>>>>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
>>>>  2 files changed, 77 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index b36124145a16..47c6e8c43dcd 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
>>>>
>>>>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
>>>>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
>>>> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>>>>
>>>>  #ifdef CONFIG_LRU_GEN
>>>>
>>>> @@ -406,8 +407,6 @@ enum {
>>>>         NR_LRU_GEN_CAPS
>>>>  };
>>>>
>>>> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>>>> -
>>>>  #define MIN_LRU_BATCH          BITS_PER_LONG
>>>>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 76378bc257e3..1f0d194f8b2f 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
>>>>         task->reclaim_state = rs;
>>>>  }
>>>>
>>>> +/**
>>>> + * find_folios_written_back - Find and move the written back folios to a new list.
>>>> + * @list: filios list
>>>> + * @clean: the written back folios list
>>>> + * @skip: whether skip to move the written back folios to clean list.
>>>> + */
>>>> +static inline void find_folios_written_back(struct list_head *list,
>>>> +               struct list_head *clean, bool skip)
>>>> +{
>>>> +       struct folio *folio;
>>>> +       struct folio *next;
>>>> +
>>>> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
>>>> +               if (!folio_evictable(folio)) {
>>>> +                       list_del(&folio->lru);
>>>> +                       folio_putback_lru(folio);
>>>> +                       continue;
>>>> +               }
>>>> +
>>>> +               if (folio_test_reclaim(folio) &&
>>>> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>>>> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>>>> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
>>>> +                               folio_set_referenced(folio);
>>>> +                       continue;
>>>> +               }
>>>> +
>>>> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
>>>> +                   folio_mapped(folio) || folio_test_locked(folio) ||
>>>> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>>>> +                       /* don't add rejected folios to the oldest generation */
>>>> +                       if (lru_gen_enabled())
>>>> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>>>> +                                             BIT(PG_active));
>>>> +                       continue;
>>>> +               }
>>>> +
>>>> +               /* retry folios that may have missed folio_rotate_reclaimable() */
>>>> +               list_move(&folio->lru, clean);
>>>> +       }
>>>> +}
>>>> +
>>>>  /*
>>>>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
>>>>   * scan_control->nr_reclaimed.
>>>> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
>>>>         return !(current->flags & PF_LOCAL_THROTTLE);
>>>>  }
>>>>
>>>> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
>>>> +               struct reclaim_stat *curr)
>>>> +{
>>>> +       int i;
>>>> +
>>>> +       stat->nr_dirty += curr->nr_dirty;
>>>> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
>>>> +       stat->nr_congested += curr->nr_congested;
>>>> +       stat->nr_writeback += curr->nr_writeback;
>>>> +       stat->nr_immediate += curr->nr_immediate;
>>>> +       stat->nr_pageout += curr->nr_pageout;
>>>> +       stat->nr_ref_keep += curr->nr_ref_keep;
>>>> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
>>>> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
>>>> +       stat->nr_demoted += curr->nr_demoted;
>>>> +       for (i = 0; i < ANON_AND_FILE; i++)
>>>> +               stat->nr_activate[i] = curr->nr_activate[i];
>>>> +}
>>>
>>> you had no this before, what's the purpose of this?
>>>
>>
>> We may call shrink_folio_list twice, and the 'stat curr' will reset in
>> the shrink_folio_list function. We should accumulate the stats as a
>> whole, which will then be used to calculate the cost and return it to
>> the caller.
> 
> Does mglru have the same issue? If so, we may need to send a patch to
> fix mglru's stat accounting as well. By the way, the code is rather
> messy—could it be implemented as shown below instead?
> 

I have checked the code (in the evict_folios function) again, and it
appears that 'reclaimed' should correspond to sc->nr_reclaimed, which
accumulates the results twice. Should I address this issue with a
separate patch?

if (!cgroup_reclaim(sc))
	__count_vm_events(item, reclaimed);
__count_memcg_events(memcg, item, reclaimed);
__count_vm_events(PGSTEAL_ANON + type, reclaimed);


> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1f0d194f8b2f..40d2ddde21f5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1094,7 +1094,6 @@ static unsigned int shrink_folio_list(struct
> list_head *folio_list,
>   struct swap_iocb *plug = NULL;
> 
>   folio_batch_init(&free_folios);
> - memset(stat, 0, sizeof(*stat));
>   cond_resched();
>   do_demote_pass = can_demote(pgdat->node_id, sc);
> 
> @@ -1949,25 +1948,6 @@ static int current_may_throttle(void)
>   return !(current->flags & PF_LOCAL_THROTTLE);
>  }
> 
> -static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
> - struct reclaim_stat *curr)
> -{
> - int i;
> -
> - stat->nr_dirty += curr->nr_dirty;
> - stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
> - stat->nr_congested += curr->nr_congested;
> - stat->nr_writeback += curr->nr_writeback;
> - stat->nr_immediate += curr->nr_immediate;
> - stat->nr_pageout += curr->nr_pageout;
> - stat->nr_ref_keep += curr->nr_ref_keep;
> - stat->nr_unmap_fail += curr->nr_unmap_fail;
> - stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
> - stat->nr_demoted += curr->nr_demoted;
> - for (i = 0; i < ANON_AND_FILE; i++)
> - stat->nr_activate[i] = curr->nr_activate[i];
> -}
> -
>  /*
>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>   * of reclaimed pages
> @@ -1981,7 +1961,7 @@ static unsigned long
> shrink_inactive_list(unsigned long nr_to_scan,
>   unsigned long nr_scanned;
>   unsigned int nr_reclaimed = 0;
>   unsigned long nr_taken;
> - struct reclaim_stat stat, curr;
> + struct reclaim_stat stat;
>   bool file = is_file_lru(lru);
>   enum vm_event_item item;
>   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> @@ -2022,9 +2002,8 @@ static unsigned long
> shrink_inactive_list(unsigned long nr_to_scan,
> 
>   memset(&stat, 0, sizeof(stat));
>  retry:
> - nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
> + nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
>   find_folios_written_back(&folio_list, &clean_list, skip_retry);
> - acc_reclaimed_stat(&stat, &curr);
> 
>   spin_lock_irq(&lruvec->lru_lock);
>   move_folios_to_lru(lruvec, &folio_list);
> 

This seems much better, but we have some extra work to do:

1. In the shrink_folio_list function:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1089,12 +1089,12 @@ static unsigned int shrink_folio_list(struct
list_head *folio_list,
        LIST_HEAD(ret_folios);
        LIST_HEAD(demote_folios);
        unsigned int nr_reclaimed = 0;
-       unsigned int pgactivate = 0;
+       unsigned int pgactivate = stat->nr_activate[0] +
stat->nr_activate[1];
+       unsigned int nr_demote = 0;
        bool do_demote_pass;
        struct swap_iocb *plug = NULL;

        folio_batch_init(&free_folios);
-       memset(stat, 0, sizeof(*stat));
        cond_resched();
        do_demote_pass = can_demote(pgdat->node_id, sc);

@@ -1558,7 +1558,8 @@ static unsigned int shrink_folio_list(struct
list_head *folio_list,

        /* Migrate folios selected for demotion */
        stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);
-       nr_reclaimed += stat->nr_demoted;
+       stat->nr_demoted += nr_demote;
+       nr_reclaimed += nr_demote;
        /* Folios that could not be demoted are still in @demote_folios */
        if (!list_empty(&demote_folios)) {
                /* Folios which weren't demoted go back on @folio_list */
@@ -1586,7 +1587,7 @@ static unsigned int shrink_folio_list(struct
list_head *folio_list,
                }
        }

-       pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
+       pgactivate = stat->nr_activate[0] + stat->nr_activate[1] -
pgactivate;

        mem_cgroup_uncharge_folios(&free_folios);
        try_to_unmap_flush();


2. Outside of the shrink_folio_list function, the callers should memset
the stat.

If you think this is better, I will update it like this.

>>
>> Thanks,
>> Ridong
>>
>>>> +
>>>>  /*
>>>>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>>>>   * of reclaimed pages
>>>> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>>>                 enum lru_list lru)
>>>>  {
>>>>         LIST_HEAD(folio_list);
>>>> +       LIST_HEAD(clean_list);
>>>>         unsigned long nr_scanned;
>>>>         unsigned int nr_reclaimed = 0;
>>>>         unsigned long nr_taken;
>>>> -       struct reclaim_stat stat;
>>>> +       struct reclaim_stat stat, curr;
>>>>         bool file = is_file_lru(lru);
>>>>         enum vm_event_item item;
>>>>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>>>>         bool stalled = false;
>>>> +       bool skip_retry = false;
>>>>
>>>>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
>>>>                 if (stalled)
>>>> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>>>         if (nr_taken == 0)
>>>>                 return 0;
>>>>
>>>> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
>>>> +       memset(&stat, 0, sizeof(stat));
>>>> +retry:
>>>> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
>>>> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
>>>> +       acc_reclaimed_stat(&stat, &curr);
>>>>
>>>>         spin_lock_irq(&lruvec->lru_lock);
>>>>         move_folios_to_lru(lruvec, &folio_list);
>>>> +       if (!list_empty(&clean_list)) {
>>>> +               list_splice_init(&clean_list, &folio_list);
>>>> +               skip_retry = true;
>>>> +               spin_unlock_irq(&lruvec->lru_lock);
>>>> +               goto retry;
> 
> This is rather confusing. We're still jumping to retry even though
> skip_retry=true is set. Can we find a clearer approach for this?
> 
> It was somewhat acceptable before we introduced the extracted
> function find_folios_written_back(). However, it has become
> harder to follow now that skip_retry is passed across functions.
> 
> I find renaming skip_retry to is_retry more intuitive. The logic
> is that since we are already retrying, find_folios_written_back()
> shouldn’t move folios to the clean list again. The intended semantics
> are: we have retried, don’t retry again.
> 

Reasonable. Will update.
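
Something like the following, as a rough sketch of the rename at the
call site (only the flag naming is illustrated here; the stat handling
discussed above is left out and still to be settled):

	bool is_retry = false;
	...
retry:
	nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
	/* on a retry pass, don't pick candidates for yet another retry */
	find_folios_written_back(&folio_list, &clean_list, is_retry);

	spin_lock_irq(&lruvec->lru_lock);
	move_folios_to_lru(lruvec, &folio_list);
	if (!list_empty(&clean_list)) {
		list_splice_init(&clean_list, &folio_list);
		is_retry = true;
		spin_unlock_irq(&lruvec->lru_lock);
		goto retry;
	}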

Thanks,
Ridong

> 
>>>> +       }
>>>>
>>>>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
>>>>                                         stat.nr_demoted);
>>>> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>>>         int reclaimed;
>>>>         LIST_HEAD(list);
>>>>         LIST_HEAD(clean);
>>>> -       struct folio *folio;
>>>> -       struct folio *next;
>>>>         enum vm_event_item item;
>>>>         struct reclaim_stat stat;
>>>>         struct lru_gen_mm_walk *walk;
>>>> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>>>                         scanned, reclaimed, &stat, sc->priority,
>>>>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>>>>
>>>> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
>>>> -               if (!folio_evictable(folio)) {
>>>> -                       list_del(&folio->lru);
>>>> -                       folio_putback_lru(folio);
>>>> -                       continue;
>>>> -               }
>>>> -
>>>> -               if (folio_test_reclaim(folio) &&
>>>> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>>>> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>>>> -                       if (folio_test_workingset(folio))
>>>> -                               folio_set_referenced(folio);
>>>> -                       continue;
>>>> -               }
>>>> -
>>>> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
>>>> -                   folio_mapped(folio) || folio_test_locked(folio) ||
>>>> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>>>> -                       /* don't add rejected folios to the oldest generation */
>>>> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>>>> -                                     BIT(PG_active));
>>>> -                       continue;
>>>> -               }
>>>> -
>>>> -               /* retry folios that may have missed folio_rotate_reclaimable() */
>>>> -               list_move(&folio->lru, &clean);
>>>> -       }
>>>> -
>>>> +       find_folios_written_back(&list, &clean, skip_retry);
>>>>         spin_lock_irq(&lruvec->lru_lock);
>>>>
>>>>         move_folios_to_lru(lruvec, &list);
>>>> --
>>>> 2.34.1
>>>>
>>>
> 
> Thanks
>  Barry


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-10 12:11         ` chenridong
@ 2024-12-12 23:17           ` Barry Song
  2024-12-13  0:54             ` chenridong
  0 siblings, 1 reply; 10+ messages in thread
From: Barry Song @ 2024-12-12 23:17 UTC (permalink / raw)
  To: chenridong
  Cc: Chen Ridong, akpm, mhocko, hannes, yosryahmed, yuzhao, david,
	willy, ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict

On Wed, Dec 11, 2024 at 1:11 AM chenridong <chenridong@huawei.com> wrote:
>
>
>
> On 2024/12/10 16:24, Barry Song wrote:
> > On Tue, Dec 10, 2024 at 2:41 PM chenridong <chenridong@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2024/12/10 12:54, Barry Song wrote:
> >>> On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>>>
> >>>> From: Chen Ridong <chenridong@huawei.com>
> >>>>
> >>>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> >>>> while isolated") only fixed the issue for mglru. However, this issue
> >>>> also exists in the traditional active/inactive LRU. This issue will be
> >>>> worse if THP is split, which makes the list longer and needs longer time
> >>>> to finish a batch of folios reclaim.
> >>>>
> >>>> This issue should be fixed in the same way for the traditional LRU.
> >>>> Therefore, the common logic was extracted to the 'find_folios_written_back'
> >>>> function firstly, which is then reused in the 'shrink_inactive_list'
> >>>> function. Finally, retry reclaiming those folios that may have missed the
> >>>> rotation for traditional LRU.
> >>>
> >>> let's drop the cover-letter and refine the changelog.
> >>>
> >> Will update.
> >>
> >>>>
> >>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >>>> ---
> >>>>  include/linux/mmzone.h |   3 +-
> >>>>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
> >>>>  2 files changed, 77 insertions(+), 34 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >>>> index b36124145a16..47c6e8c43dcd 100644
> >>>> --- a/include/linux/mmzone.h
> >>>> +++ b/include/linux/mmzone.h
> >>>> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
> >>>>
> >>>>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
> >>>>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
> >>>> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
> >>>>
> >>>>  #ifdef CONFIG_LRU_GEN
> >>>>
> >>>> @@ -406,8 +407,6 @@ enum {
> >>>>         NR_LRU_GEN_CAPS
> >>>>  };
> >>>>
> >>>> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
> >>>> -
> >>>>  #define MIN_LRU_BATCH          BITS_PER_LONG
> >>>>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
> >>>>
> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>> index 76378bc257e3..1f0d194f8b2f 100644
> >>>> --- a/mm/vmscan.c
> >>>> +++ b/mm/vmscan.c
> >>>> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
> >>>>         task->reclaim_state = rs;
> >>>>  }
> >>>>
> >>>> +/**
> >>>> + * find_folios_written_back - Find and move the written back folios to a new list.
> >>>> + * @list: folios list
> >>>> + * @clean: the written back folios list
> >>>> + * @skip: whether to skip moving written-back folios to the clean list.
> >>>> + */
> >>>> +static inline void find_folios_written_back(struct list_head *list,
> >>>> +               struct list_head *clean, bool skip)
> >>>> +{
> >>>> +       struct folio *folio;
> >>>> +       struct folio *next;
> >>>> +
> >>>> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
> >>>> +               if (!folio_evictable(folio)) {
> >>>> +                       list_del(&folio->lru);
> >>>> +                       folio_putback_lru(folio);
> >>>> +                       continue;
> >>>> +               }
> >>>> +
> >>>> +               if (folio_test_reclaim(folio) &&
> >>>> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> >>>> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> >>>> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
> >>>> +                               folio_set_referenced(folio);
> >>>> +                       continue;
> >>>> +               }
> >>>> +
> >>>> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
> >>>> +                   folio_mapped(folio) || folio_test_locked(folio) ||
> >>>> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> >>>> +                       /* don't add rejected folios to the oldest generation */
> >>>> +                       if (lru_gen_enabled())
> >>>> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> >>>> +                                             BIT(PG_active));
> >>>> +                       continue;
> >>>> +               }
> >>>> +
> >>>> +               /* retry folios that may have missed folio_rotate_reclaimable() */
> >>>> +               list_move(&folio->lru, clean);
> >>>> +       }
> >>>> +}
> >>>> +
> >>>>  /*
> >>>>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
> >>>>   * scan_control->nr_reclaimed.
> >>>> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
> >>>>         return !(current->flags & PF_LOCAL_THROTTLE);
> >>>>  }
> >>>>
> >>>> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
> >>>> +               struct reclaim_stat *curr)
> >>>> +{
> >>>> +       int i;
> >>>> +
> >>>> +       stat->nr_dirty += curr->nr_dirty;
> >>>> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
> >>>> +       stat->nr_congested += curr->nr_congested;
> >>>> +       stat->nr_writeback += curr->nr_writeback;
> >>>> +       stat->nr_immediate += curr->nr_immediate;
> >>>> +       stat->nr_pageout += curr->nr_pageout;
> >>>> +       stat->nr_ref_keep += curr->nr_ref_keep;
> >>>> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
> >>>> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
> >>>> +       stat->nr_demoted += curr->nr_demoted;
> >>>> +       for (i = 0; i < ANON_AND_FILE; i++)
> >>>> +               stat->nr_activate[i] = curr->nr_activate[i];
> >>>> +}
> >>>
> >>> you didn't have this before; what's the purpose of it?
> >>>
> >>
> >> We may call shrink_folio_list twice, and 'curr' is reset inside
> >> shrink_folio_list. We should accumulate the stats as a whole, which are
> >> then used to calculate the cost and returned to the caller.
> >
> > Does mglru have the same issue? If so, we may need to send a patch to
> > fix mglru's stat accounting as well. By the way, the code is rather
> > messy—could it be implemented as shown below instead?
> >
>
> I have checked the code (in the evict_folios function) again, and it
> appears that 'reclaimed' corresponds to sc->nr_reclaimed, so the results
> seem to be accumulated twice. Should I address this issue with a
> separate patch?

I don't think there is any problem.

        reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
        sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
        sc->nr_reclaimed += reclaimed;

reclaimed is always the number of pages reclaimed by that call to
shrink_folio_list(), whether it is a retry or not.
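
In other words, with the retry loop each pass only adds what it freed
itself. Roughly (a simplified sketch of the evict_folios() shape after
commit 359a5e1416ca, not the verbatim code):

retry:
	/* 'reclaimed' only covers the folios freed by this call */
	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
	sc->nr_reclaimed += reclaimed;
	...
	if (!list_empty(&clean)) {
		/* the retry pass will again add only its own count */
		list_splice_init(&clean, &list);
		skip_retry = true;
		goto retry;
	}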

>
> if (!cgroup_reclaim(sc))
>         __count_vm_events(item, reclaimed);
> __count_memcg_events(memcg, item, reclaimed);
> __count_vm_events(PGSTEAL_ANON + type, reclaimed);
>
>
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1f0d194f8b2f..40d2ddde21f5 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1094,7 +1094,6 @@ static unsigned int shrink_folio_list(struct
> > list_head *folio_list,
> >   struct swap_iocb *plug = NULL;
> >
> >   folio_batch_init(&free_folios);
> > - memset(stat, 0, sizeof(*stat));
> >   cond_resched();
> >   do_demote_pass = can_demote(pgdat->node_id, sc);
> >
> > @@ -1949,25 +1948,6 @@ static int current_may_throttle(void)
> >   return !(current->flags & PF_LOCAL_THROTTLE);
> >  }
> >
> > -static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
> > - struct reclaim_stat *curr)
> > -{
> > - int i;
> > -
> > - stat->nr_dirty += curr->nr_dirty;
> > - stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
> > - stat->nr_congested += curr->nr_congested;
> > - stat->nr_writeback += curr->nr_writeback;
> > - stat->nr_immediate += curr->nr_immediate;
> > - stat->nr_pageout += curr->nr_pageout;
> > - stat->nr_ref_keep += curr->nr_ref_keep;
> > - stat->nr_unmap_fail += curr->nr_unmap_fail;
> > - stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
> > - stat->nr_demoted += curr->nr_demoted;
> > - for (i = 0; i < ANON_AND_FILE; i++)
> > - stat->nr_activate[i] = curr->nr_activate[i];
> > -}
> > -
> >  /*
> >   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
> >   * of reclaimed pages
> > @@ -1981,7 +1961,7 @@ static unsigned long
> > shrink_inactive_list(unsigned long nr_to_scan,
> >   unsigned long nr_scanned;
> >   unsigned int nr_reclaimed = 0;
> >   unsigned long nr_taken;
> > - struct reclaim_stat stat, curr;
> > + struct reclaim_stat stat;
> >   bool file = is_file_lru(lru);
> >   enum vm_event_item item;
> >   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> > @@ -2022,9 +2002,8 @@ static unsigned long
> > shrink_inactive_list(unsigned long nr_to_scan,
> >
> >   memset(&stat, 0, sizeof(stat));
> >  retry:
> > - nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
> > + nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
> >   find_folios_written_back(&folio_list, &clean_list, skip_retry);
> > - acc_reclaimed_stat(&stat, &curr);
> >
> >   spin_lock_irq(&lruvec->lru_lock);
> >   move_folios_to_lru(lruvec, &folio_list);
> >
>
> This seems much better, but we have some extra work to do:
>
> 1. In the shrink_folio_list function:
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1089,12 +1089,12 @@ static unsigned int shrink_folio_list(struct
> list_head *folio_list,
>         LIST_HEAD(ret_folios);
>         LIST_HEAD(demote_folios);
>         unsigned int nr_reclaimed = 0;
> -       unsigned int pgactivate = 0;
> +       unsigned int pgactivate = stat->nr_activate[0] +
> stat->nr_activate[1];
> +       unsigned int nr_demote = 0;
>         bool do_demote_pass;
>         struct swap_iocb *plug = NULL;
>
>         folio_batch_init(&free_folios);
> -       memset(stat, 0, sizeof(*stat));
>         cond_resched();
>         do_demote_pass = can_demote(pgdat->node_id, sc);
>
> @@ -1558,7 +1558,8 @@ static unsigned int shrink_folio_list(struct
> list_head *folio_list,
>
>         /* Migrate folios selected for demotion */
>         stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);
> -       nr_reclaimed += stat->nr_demoted;
> +       stat->nr_demoted += nr_demote;
> +       nr_reclaimed += nr_demote;
>         /* Folios that could not be demoted are still in @demote_folios */
>         if (!list_empty(&demote_folios)) {
>                 /* Folios which weren't demoted go back on @folio_list */
> @@ -1586,7 +1587,7 @@ static unsigned int shrink_folio_list(struct
> list_head *folio_list,
>                 }
>         }
>
> -       pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
> +       pgactivate = stat->nr_activate[0] + stat->nr_activate[1] -
> pgactivate;
>
>         mem_cgroup_uncharge_folios(&free_folios);
>         try_to_unmap_flush();
>
>
> 2. Outside of the shrink_folio_list function, the callers should memset
> the stat.
>
> If you think this is better, I will update it like this.

No. Please goto retry after you have collected all you need from
`stat`, just as mglru does; drop "curr" and acc_reclaimed_stat().

        sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
        sc->nr_reclaimed += reclaimed;

move_folios_to_lru() has already moved all the folios we are not interested in
back to the lruvec before you retry, so there is no duplicated counting.
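
So the caller can end up looking roughly like this (a rough sketch of
the shape I mean, untested):

	memset(&stat, 0, sizeof(stat));
retry:
	nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
	/*
	 * Collect everything still needed from this pass before the next
	 * call to shrink_folio_list() resets 'stat'.
	 */
	sc->nr.dirty += stat.nr_dirty;
	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
	find_folios_written_back(&folio_list, &clean_list, skip_retry);

	spin_lock_irq(&lruvec->lru_lock);
	move_folios_to_lru(lruvec, &folio_list);
	if (!list_empty(&clean_list)) {
		list_splice_init(&clean_list, &folio_list);
		skip_retry = true;
		spin_unlock_irq(&lruvec->lru_lock);
		goto retry;
	}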

>
> >>
> >> Thanks,
> >> Ridong
> >>
> >>>> +
> >>>>  /*
> >>>>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
> >>>>   * of reclaimed pages
> >>>> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> >>>>                 enum lru_list lru)
> >>>>  {
> >>>>         LIST_HEAD(folio_list);
> >>>> +       LIST_HEAD(clean_list);
> >>>>         unsigned long nr_scanned;
> >>>>         unsigned int nr_reclaimed = 0;
> >>>>         unsigned long nr_taken;
> >>>> -       struct reclaim_stat stat;
> >>>> +       struct reclaim_stat stat, curr;
> >>>>         bool file = is_file_lru(lru);
> >>>>         enum vm_event_item item;
> >>>>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >>>>         bool stalled = false;
> >>>> +       bool skip_retry = false;
> >>>>
> >>>>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
> >>>>                 if (stalled)
> >>>> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
> >>>>         if (nr_taken == 0)
> >>>>                 return 0;
> >>>>
> >>>> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
> >>>> +       memset(&stat, 0, sizeof(stat));
> >>>> +retry:
> >>>> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
> >>>> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
> >>>> +       acc_reclaimed_stat(&stat, &curr);
> >>>>
> >>>>         spin_lock_irq(&lruvec->lru_lock);
> >>>>         move_folios_to_lru(lruvec, &folio_list);
> >>>> +       if (!list_empty(&clean_list)) {
> >>>> +               list_splice_init(&clean_list, &folio_list);
> >>>> +               skip_retry = true;
> >>>> +               spin_unlock_irq(&lruvec->lru_lock);
> >>>> +               goto retry;
> >
> > This is rather confusing. We're still jumping to retry even though
> > skip_retry=true is set. Can we find a clearer approach for this?
> >
> > It was somewhat acceptable before we introduced the extracted
> > function find_folios_written_back(). However, it has become
> > harder to follow now that skip_retry is passed across functions.
> >
> > I find renaming skip_retry to is_retry more intuitive. The logic
> > is that since we are already retrying, find_folios_written_back()
> > shouldn’t move folios to the clean list again. The intended semantics
> > are: we have retried, don’t retry again.
> >
>
> Reasonable. Will update.
>
> Thanks,
> Ridong
>
> >
> >>>> +       }
> >>>>
> >>>>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
> >>>>                                         stat.nr_demoted);
> >>>> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >>>>         int reclaimed;
> >>>>         LIST_HEAD(list);
> >>>>         LIST_HEAD(clean);
> >>>> -       struct folio *folio;
> >>>> -       struct folio *next;
> >>>>         enum vm_event_item item;
> >>>>         struct reclaim_stat stat;
> >>>>         struct lru_gen_mm_walk *walk;
> >>>> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >>>>                         scanned, reclaimed, &stat, sc->priority,
> >>>>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> >>>>
> >>>> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> >>>> -               if (!folio_evictable(folio)) {
> >>>> -                       list_del(&folio->lru);
> >>>> -                       folio_putback_lru(folio);
> >>>> -                       continue;
> >>>> -               }
> >>>> -
> >>>> -               if (folio_test_reclaim(folio) &&
> >>>> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> >>>> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> >>>> -                       if (folio_test_workingset(folio))
> >>>> -                               folio_set_referenced(folio);
> >>>> -                       continue;
> >>>> -               }
> >>>> -
> >>>> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
> >>>> -                   folio_mapped(folio) || folio_test_locked(folio) ||
> >>>> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
> >>>> -                       /* don't add rejected folios to the oldest generation */
> >>>> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> >>>> -                                     BIT(PG_active));
> >>>> -                       continue;
> >>>> -               }
> >>>> -
> >>>> -               /* retry folios that may have missed folio_rotate_reclaimable() */
> >>>> -               list_move(&folio->lru, &clean);
> >>>> -       }
> >>>> -
> >>>> +       find_folios_written_back(&list, &clean, skip_retry);
> >>>>         spin_lock_irq(&lruvec->lru_lock);
> >>>>
> >>>>         move_folios_to_lru(lruvec, &list);
> >>>> --
> >>>> 2.34.1
> >>>>
> >>>
> >
> > Thanks
> >  Barry


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] mm: vmascan: retry folios written back while isolated for traditional LRU
  2024-12-12 23:17           ` Barry Song
@ 2024-12-13  0:54             ` chenridong
  0 siblings, 0 replies; 10+ messages in thread
From: chenridong @ 2024-12-13  0:54 UTC (permalink / raw)
  To: Barry Song
  Cc: Chen Ridong, akpm, mhocko, hannes, yosryahmed, yuzhao, david,
	willy, ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict



On 2024/12/13 7:17, Barry Song wrote:
> On Wed, Dec 11, 2024 at 1:11 AM chenridong <chenridong@huawei.com> wrote:
>>
>>
>>
>> On 2024/12/10 16:24, Barry Song wrote:
>>> On Tue, Dec 10, 2024 at 2:41 PM chenridong <chenridong@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/12/10 12:54, Barry Song wrote:
>>>>> On Mon, Dec 9, 2024 at 4:46 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>>>
>>>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>>>
>>>>>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>>>>>> while isolated") only fixed the issue for mglru. However, this issue
>>>>>> also exists in the traditional active/inactive LRU. This issue will be
>>>>>> worse if THP is split, which makes the list longer and needs longer time
>>>>>> to finish a batch of folios reclaim.
>>>>>>
>>>>>> This issue should be fixed in the same way for the traditional LRU.
>>>>>> Therefore, the common logic was extracted to the 'find_folios_written_back'
>>>>>> function firstly, which is then reused in the 'shrink_inactive_list'
>>>>>> function. Finally, retry reclaiming those folios that may have missed the
>>>>>> rotation for traditional LRU.
>>>>>
>>>>> let's drop the cover-letter and refine the changelog.
>>>>>
>>>> Will update.
>>>>
>>>>>>
>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>> ---
>>>>>>  include/linux/mmzone.h |   3 +-
>>>>>>  mm/vmscan.c            | 108 +++++++++++++++++++++++++++++------------
>>>>>>  2 files changed, 77 insertions(+), 34 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>>>> index b36124145a16..47c6e8c43dcd 100644
>>>>>> --- a/include/linux/mmzone.h
>>>>>> +++ b/include/linux/mmzone.h
>>>>>> @@ -391,6 +391,7 @@ struct page_vma_mapped_walk;
>>>>>>
>>>>>>  #define LRU_GEN_MASK           ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
>>>>>>  #define LRU_REFS_MASK          ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
>>>>>> +#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>>>>>>
>>>>>>  #ifdef CONFIG_LRU_GEN
>>>>>>
>>>>>> @@ -406,8 +407,6 @@ enum {
>>>>>>         NR_LRU_GEN_CAPS
>>>>>>  };
>>>>>>
>>>>>> -#define LRU_REFS_FLAGS         (BIT(PG_referenced) | BIT(PG_workingset))
>>>>>> -
>>>>>>  #define MIN_LRU_BATCH          BITS_PER_LONG
>>>>>>  #define MAX_LRU_BATCH          (MIN_LRU_BATCH * 64)
>>>>>>
>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>> index 76378bc257e3..1f0d194f8b2f 100644
>>>>>> --- a/mm/vmscan.c
>>>>>> +++ b/mm/vmscan.c
>>>>>> @@ -283,6 +283,48 @@ static void set_task_reclaim_state(struct task_struct *task,
>>>>>>         task->reclaim_state = rs;
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * find_folios_written_back - Find and move the written back folios to a new list.
>>>>>> + * @list: folios list
>>>>>> + * @clean: the written back folios list
>>>>>> + * @skip: whether to skip moving written-back folios to the clean list.
>>>>>> + */
>>>>>> +static inline void find_folios_written_back(struct list_head *list,
>>>>>> +               struct list_head *clean, bool skip)
>>>>>> +{
>>>>>> +       struct folio *folio;
>>>>>> +       struct folio *next;
>>>>>> +
>>>>>> +       list_for_each_entry_safe_reverse(folio, next, list, lru) {
>>>>>> +               if (!folio_evictable(folio)) {
>>>>>> +                       list_del(&folio->lru);
>>>>>> +                       folio_putback_lru(folio);
>>>>>> +                       continue;
>>>>>> +               }
>>>>>> +
>>>>>> +               if (folio_test_reclaim(folio) &&
>>>>>> +                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>>>>>> +                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>>>>>> +                       if (lru_gen_enabled() && folio_test_workingset(folio))
>>>>>> +                               folio_set_referenced(folio);
>>>>>> +                       continue;
>>>>>> +               }
>>>>>> +
>>>>>> +               if (skip || folio_test_active(folio) || folio_test_referenced(folio) ||
>>>>>> +                   folio_mapped(folio) || folio_test_locked(folio) ||
>>>>>> +                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>>>>>> +                       /* don't add rejected folios to the oldest generation */
>>>>>> +                       if (lru_gen_enabled())
>>>>>> +                               set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>>>>>> +                                             BIT(PG_active));
>>>>>> +                       continue;
>>>>>> +               }
>>>>>> +
>>>>>> +               /* retry folios that may have missed folio_rotate_reclaimable() */
>>>>>> +               list_move(&folio->lru, clean);
>>>>>> +       }
>>>>>> +}
>>>>>> +
>>>>>>  /*
>>>>>>   * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
>>>>>>   * scan_control->nr_reclaimed.
>>>>>> @@ -1907,6 +1949,25 @@ static int current_may_throttle(void)
>>>>>>         return !(current->flags & PF_LOCAL_THROTTLE);
>>>>>>  }
>>>>>>
>>>>>> +static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
>>>>>> +               struct reclaim_stat *curr)
>>>>>> +{
>>>>>> +       int i;
>>>>>> +
>>>>>> +       stat->nr_dirty += curr->nr_dirty;
>>>>>> +       stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
>>>>>> +       stat->nr_congested += curr->nr_congested;
>>>>>> +       stat->nr_writeback += curr->nr_writeback;
>>>>>> +       stat->nr_immediate += curr->nr_immediate;
>>>>>> +       stat->nr_pageout += curr->nr_pageout;
>>>>>> +       stat->nr_ref_keep += curr->nr_ref_keep;
>>>>>> +       stat->nr_unmap_fail += curr->nr_unmap_fail;
>>>>>> +       stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
>>>>>> +       stat->nr_demoted += curr->nr_demoted;
>>>>>> +       for (i = 0; i < ANON_AND_FILE; i++)
>>>>>> +               stat->nr_activate[i] = curr->nr_activate[i];
>>>>>> +}
>>>>>
>>>>> you didn't have this before; what's the purpose of it?
>>>>>
>>>>
>>>> We may call shrink_folio_list twice, and 'curr' is reset inside
>>>> shrink_folio_list. We should accumulate the stats as a whole, which are
>>>> then used to calculate the cost and returned to the caller.
>>>
>>> Does mglru have the same issue? If so, we may need to send a patch to
>>> fix mglru's stat accounting as well. By the way, the code is rather
>>> messy—could it be implemented as shown below instead?
>>>
>>
>> I have checked the code (in the evict_folios function) again, and it
>> appears that 'reclaimed' corresponds to sc->nr_reclaimed, so the results
>> seem to be accumulated twice. Should I address this issue with a
>> separate patch?
> 
> I don't think there is any problem.
> 
>         reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
>         sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
>         sc->nr_reclaimed += reclaimed;
> 
> reclaimed is always the number of pages reclaimed by that call to
> shrink_folio_list(), whether it is a retry or not.
> 

Thank you, I made a mistake.
>>
>> if (!cgroup_reclaim(sc))
>>         __count_vm_events(item, reclaimed);
>> __count_memcg_events(memcg, item, reclaimed);
>> __count_vm_events(PGSTEAL_ANON + type, reclaimed);
>>
>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 1f0d194f8b2f..40d2ddde21f5 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1094,7 +1094,6 @@ static unsigned int shrink_folio_list(struct
>>> list_head *folio_list,
>>>   struct swap_iocb *plug = NULL;
>>>
>>>   folio_batch_init(&free_folios);
>>> - memset(stat, 0, sizeof(*stat));
>>>   cond_resched();
>>>   do_demote_pass = can_demote(pgdat->node_id, sc);
>>>
>>> @@ -1949,25 +1948,6 @@ static int current_may_throttle(void)
>>>   return !(current->flags & PF_LOCAL_THROTTLE);
>>>  }
>>>
>>> -static inline void acc_reclaimed_stat(struct reclaim_stat *stat,
>>> - struct reclaim_stat *curr)
>>> -{
>>> - int i;
>>> -
>>> - stat->nr_dirty += curr->nr_dirty;
>>> - stat->nr_unqueued_dirty += curr->nr_unqueued_dirty;
>>> - stat->nr_congested += curr->nr_congested;
>>> - stat->nr_writeback += curr->nr_writeback;
>>> - stat->nr_immediate += curr->nr_immediate;
>>> - stat->nr_pageout += curr->nr_pageout;
>>> - stat->nr_ref_keep += curr->nr_ref_keep;
>>> - stat->nr_unmap_fail += curr->nr_unmap_fail;
>>> - stat->nr_lazyfree_fail += curr->nr_lazyfree_fail;
>>> - stat->nr_demoted += curr->nr_demoted;
>>> - for (i = 0; i < ANON_AND_FILE; i++)
>>> - stat->nr_activate[i] = curr->nr_activate[i];
>>> -}
>>> -
>>>  /*
>>>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>>>   * of reclaimed pages
>>> @@ -1981,7 +1961,7 @@ static unsigned long
>>> shrink_inactive_list(unsigned long nr_to_scan,
>>>   unsigned long nr_scanned;
>>>   unsigned int nr_reclaimed = 0;
>>>   unsigned long nr_taken;
>>> - struct reclaim_stat stat, curr;
>>> + struct reclaim_stat stat;
>>>   bool file = is_file_lru(lru);
>>>   enum vm_event_item item;
>>>   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>>> @@ -2022,9 +2002,8 @@ static unsigned long
>>> shrink_inactive_list(unsigned long nr_to_scan,
>>>
>>>   memset(&stat, 0, sizeof(stat));
>>>  retry:
>>> - nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
>>> + nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
>>>   find_folios_written_back(&folio_list, &clean_list, skip_retry);
>>> - acc_reclaimed_stat(&stat, &curr);
>>>
>>>   spin_lock_irq(&lruvec->lru_lock);
>>>   move_folios_to_lru(lruvec, &folio_list);
>>>
>>
>> This seems much better, but we have some extra work to do:
>>
>> 1. In the shrink_folio_list function:
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1089,12 +1089,12 @@ static unsigned int shrink_folio_list(struct
>> list_head *folio_list,
>>         LIST_HEAD(ret_folios);
>>         LIST_HEAD(demote_folios);
>>         unsigned int nr_reclaimed = 0;
>> -       unsigned int pgactivate = 0;
>> +       unsigned int pgactivate = stat->nr_activate[0] +
>> stat->nr_activate[1];
>> +       unsigned int nr_demote = 0;
>>         bool do_demote_pass;
>>         struct swap_iocb *plug = NULL;
>>
>>         folio_batch_init(&free_folios);
>> -       memset(stat, 0, sizeof(*stat));
>>         cond_resched();
>>         do_demote_pass = can_demote(pgdat->node_id, sc);
>>
>> @@ -1558,7 +1558,8 @@ static unsigned int shrink_folio_list(struct
>> list_head *folio_list,
>>
>>         /* Migrate folios selected for demotion */
>>         stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);
>> -       nr_reclaimed += stat->nr_demoted;
>> +       stat->nr_demoted += nr_demote;
>> +       nr_reclaimed += nr_demote;
>>         /* Folios that could not be demoted are still in @demote_folios */
>>         if (!list_empty(&demote_folios)) {
>>                 /* Folios which weren't demoted go back on @folio_list */
>> @@ -1586,7 +1587,7 @@ static unsigned int shrink_folio_list(struct
>> list_head *folio_list,
>>                 }
>>         }
>>
>> -       pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
>> +       pgactivate = stat->nr_activate[0] + stat->nr_activate[1] -
>> pgactivate;
>>
>>         mem_cgroup_uncharge_folios(&free_folios);
>>         try_to_unmap_flush();
>>
>>
>> 2. Outside of the shrink_folio_list function, the callers should memset
>> the stat.
>>
>> If you think this is better, I will update it like this.
> 
> No. Please goto retry after you have collected all you need from
> `stat`, just as mglru does; drop "curr" and acc_reclaimed_stat().
> 
>         sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
>         sc->nr_reclaimed += reclaimed;
> 
> move_folios_to_lru() has already moved all the folios we are not interested in
> back to the lruvec before you retry, so there is no duplicated counting.
> 

I see, thank you.

Thanks,
Ridong

>>
>>>>
>>>> Thanks,
>>>> Ridong
>>>>
>>>>>> +
>>>>>>  /*
>>>>>>   * shrink_inactive_list() is a helper for shrink_node().  It returns the number
>>>>>>   * of reclaimed pages
>>>>>> @@ -1916,14 +1977,16 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>>>>>                 enum lru_list lru)
>>>>>>  {
>>>>>>         LIST_HEAD(folio_list);
>>>>>> +       LIST_HEAD(clean_list);
>>>>>>         unsigned long nr_scanned;
>>>>>>         unsigned int nr_reclaimed = 0;
>>>>>>         unsigned long nr_taken;
>>>>>> -       struct reclaim_stat stat;
>>>>>> +       struct reclaim_stat stat, curr;
>>>>>>         bool file = is_file_lru(lru);
>>>>>>         enum vm_event_item item;
>>>>>>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>>>>>>         bool stalled = false;
>>>>>> +       bool skip_retry = false;
>>>>>>
>>>>>>         while (unlikely(too_many_isolated(pgdat, file, sc))) {
>>>>>>                 if (stalled)
>>>>>> @@ -1957,10 +2020,20 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>>>>>>         if (nr_taken == 0)
>>>>>>                 return 0;
>>>>>>
>>>>>> -       nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
>>>>>> +       memset(&stat, 0, sizeof(stat));
>>>>>> +retry:
>>>>>> +       nr_reclaimed += shrink_folio_list(&folio_list, pgdat, sc, &curr, false);
>>>>>> +       find_folios_written_back(&folio_list, &clean_list, skip_retry);
>>>>>> +       acc_reclaimed_stat(&stat, &curr);
>>>>>>
>>>>>>         spin_lock_irq(&lruvec->lru_lock);
>>>>>>         move_folios_to_lru(lruvec, &folio_list);
>>>>>> +       if (!list_empty(&clean_list)) {
>>>>>> +               list_splice_init(&clean_list, &folio_list);
>>>>>> +               skip_retry = true;
>>>>>> +               spin_unlock_irq(&lruvec->lru_lock);
>>>>>> +               goto retry;
>>>
>>> This is rather confusing. We're still jumping to retry even though
>>> skip_retry=true is set. Can we find a clearer approach for this?
>>>
>>> It was somewhat acceptable before we introduced the extracted
>>> function find_folios_written_back(). However, it has become
>>> harder to follow now that skip_retry is passed across functions.
>>>
>>> I find renaming skip_retry to is_retry more intuitive. The logic
>>> is that since we are already retrying, find_folios_written_back()
>>> shouldn’t move folios to the clean list again. The intended semantics
>>> are: we have retried, don’t retry again.
>>>
>>
>> Reasonable. Will update.
>>
>> Thanks,
>> Ridong
>>
>>>
>>>>>> +       }
>>>>>>
>>>>>>         __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
>>>>>>                                         stat.nr_demoted);
>>>>>> @@ -4567,8 +4640,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>>>>>         int reclaimed;
>>>>>>         LIST_HEAD(list);
>>>>>>         LIST_HEAD(clean);
>>>>>> -       struct folio *folio;
>>>>>> -       struct folio *next;
>>>>>>         enum vm_event_item item;
>>>>>>         struct reclaim_stat stat;
>>>>>>         struct lru_gen_mm_walk *walk;
>>>>>> @@ -4597,34 +4668,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>>>>>>                         scanned, reclaimed, &stat, sc->priority,
>>>>>>                         type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>>>>>>
>>>>>> -       list_for_each_entry_safe_reverse(folio, next, &list, lru) {
>>>>>> -               if (!folio_evictable(folio)) {
>>>>>> -                       list_del(&folio->lru);
>>>>>> -                       folio_putback_lru(folio);
>>>>>> -                       continue;
>>>>>> -               }
>>>>>> -
>>>>>> -               if (folio_test_reclaim(folio) &&
>>>>>> -                   (folio_test_dirty(folio) || folio_test_writeback(folio))) {
>>>>>> -                       /* restore LRU_REFS_FLAGS cleared by isolate_folio() */
>>>>>> -                       if (folio_test_workingset(folio))
>>>>>> -                               folio_set_referenced(folio);
>>>>>> -                       continue;
>>>>>> -               }
>>>>>> -
>>>>>> -               if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
>>>>>> -                   folio_mapped(folio) || folio_test_locked(folio) ||
>>>>>> -                   folio_test_dirty(folio) || folio_test_writeback(folio)) {
>>>>>> -                       /* don't add rejected folios to the oldest generation */
>>>>>> -                       set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
>>>>>> -                                     BIT(PG_active));
>>>>>> -                       continue;
>>>>>> -               }
>>>>>> -
>>>>>> -               /* retry folios that may have missed folio_rotate_reclaimable() */
>>>>>> -               list_move(&folio->lru, &clean);
>>>>>> -       }
>>>>>> -
>>>>>> +       find_folios_written_back(&list, &clean, skip_retry);
>>>>>>         spin_lock_irq(&lruvec->lru_lock);
>>>>>>
>>>>>>         move_folios_to_lru(lruvec, &list);
>>>>>> --
>>>>>> 2.34.1
>>>>>>
>>>>>
>>>
>>> Thanks
>>>  Barry


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-12-13  0:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-09  8:36 [PATCH v4 0/1] mm: vmascan: retry folios written back while isolated for traditional LRU Chen Ridong
2024-12-09  8:36 ` [PATCH v4 1/1] " Chen Ridong
2024-12-10  4:54   ` Barry Song
2024-12-10  6:41     ` chenridong
2024-12-10  8:24       ` Barry Song
2024-12-10 12:11         ` chenridong
2024-12-12 23:17           ` Barry Song
2024-12-13  0:54             ` chenridong
2024-12-10  2:13 ` [PATCH v4 0/1] " Andrew Morton
2024-12-10  6:42   ` chenridong
