linux-mm.kvack.org archive mirror
* [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
@ 2024-12-23  8:20 Chen Ridong
  2024-12-24  4:19 ` Yu Zhao
  0 siblings, 1 reply; 6+ messages in thread
From: Chen Ridong @ 2024-12-23  8:20 UTC (permalink / raw)
  To: akpm, mhocko, hannes, yosryahmed, yuzhao, david, willy,
	ryan.roberts, baohua, 21cnbao, wangkefeng.wang
  Cc: linux-mm, linux-kernel, chenridong, wangweiyang2, xieym_ict

From: Chen Ridong <chenridong@huawei.com>

The page reclaim isolates a batch of folios from the tail of one of the
LRU lists and works on those folios one by one.  For a suitable
swap-backed folio, if the swap device is async, it queues that folio for
writeback.  After the page reclaim finishes an entire batch, it puts back
the folios it queued for writeback to the head of the original LRU list.

In the meantime, the page writeback flushes the queued folios also by
batches.  Its batching logic is independent from that of the page reclaim.
For each of the folios it writes back, the page writeback calls
folio_rotate_reclaimable() which tries to rotate a folio to the tail.

folio_rotate_reclaimable() only works for a folio after the page reclaim
has put it back.  If an async swap device is fast enough, the page
writeback can finish with that folio while the page reclaim is still
working on the rest of the batch containing it.  In this case, that folio
will remain at the head and the page reclaim will not retry it before
reaching there.

Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") fixed this issue only for MGLRU, but it also exists in
the traditional active/inactive LRU. The issue is worse when a THP is
split: the split makes the isolated list longer, so a batch of folios
takes longer to reclaim.

Fix the issue in the same way for the traditional LRU: first extract
the common logic into a new helper, find_folios_written_back(), then
reuse it in shrink_inactive_list(), and finally retry reclaiming the
folios that may have missed the rotation.

Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Barry Song <baohua@kernel.org>
---

v5->v6:
 - fix a compile error (implicit declaration of function
   'lru_gen_distance') when CONFIG_LRU_GEN is disabled.
 - rename 'is_retried' to 'is_retrying', as suggested by Barry Song.

v5: https://lore.kernel.org/linux-kernel/CAGsJ_4x3Aj7wieK1FQKQC4Vbz5N+1dExs=Q70KQt-whS1dMxpw@mail.gmail.com/

 include/linux/mm_inline.h |   5 ++
 mm/vmscan.c               | 108 ++++++++++++++++++++++++--------------
 2 files changed, 75 insertions(+), 38 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 3fcf5fa797fe..07b2fda6fafa 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -342,6 +342,11 @@ static inline void folio_migrate_refs(struct folio *new, struct folio *old)
 {
 
 }
+
+static inline int lru_gen_distance(struct folio *folio, bool reclaiming)
+{
+	return -1;
+}
 #endif /* CONFIG_LRU_GEN */
 
 static __always_inline
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 39886f435ec5..701716306f8b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
+/**
+ * find_folios_written_back - find written-back folios and move them to a new list
+ * @list: the list of isolated folios to scan
+ * @clean: the list that collects folios that may have missed rotation
+ * @is_retrying: whether this is already a retry pass
+ */
+static inline void find_folios_written_back(struct list_head *list,
+		struct list_head *clean, bool is_retrying)
+{
+	struct folio *folio;
+	struct folio *next;
+
+	list_for_each_entry_safe_reverse(folio, next, list, lru) {
+		if (!folio_evictable(folio)) {
+			list_del(&folio->lru);
+			folio_putback_lru(folio);
+			continue;
+		}
+
+		/* retry folios that may have missed folio_rotate_reclaimable() */
+		if (!is_retrying && !folio_test_active(folio) && !folio_mapped(folio) &&
+		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
+			list_move(&folio->lru, clean);
+			continue;
+		}
+
+		/* don't add rejected folios to the oldest generation */
+		if (lru_gen_enabled() && !lru_gen_distance(folio, false))
+			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
+	}
+
+}
+
 /*
  * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
  * scan_control->nr_reclaimed.
@@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 		enum lru_list lru)
 {
 	LIST_HEAD(folio_list);
+	LIST_HEAD(clean_list);
 	unsigned long nr_scanned;
-	unsigned int nr_reclaimed = 0;
+	unsigned int nr_reclaimed, total_reclaimed = 0;
+	unsigned int nr_pageout = 0;
+	unsigned int nr_unqueued_dirty = 0;
 	unsigned long nr_taken;
 	struct reclaim_stat stat;
 	bool file = is_file_lru(lru);
 	enum vm_event_item item;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	bool is_retrying = false;
 
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
@@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (nr_taken == 0)
 		return 0;
 
+retry:
 	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
 
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	total_reclaimed += nr_reclaimed;
+	nr_pageout += stat.nr_pageout;
+	nr_unqueued_dirty += stat.nr_unqueued_dirty;
+
+	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
+
+	find_folios_written_back(&folio_list, &clean_list, is_retrying);
+
 	spin_lock_irq(&lruvec->lru_lock);
 	move_folios_to_lru(lruvec, &folio_list);
 
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
 					stat.nr_demoted);
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	item = PGSTEAL_KSWAPD + reclaimer_offset();
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
+
+	if (!list_empty(&clean_list)) {
+		list_splice_init(&clean_list, &folio_list);
+		is_retrying = true;
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
+	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&lruvec->lru_lock);
+	sc->nr.taken += nr_taken;
+	if (file)
+		sc->nr.file_taken += nr_taken;
 
-	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
+	lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed);
 
 	/*
 	 * If dirty folios are scanned that are not queued for IO, it
@@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	 * the flushers simply cannot keep up with the allocation
 	 * rate. Nudge the flusher threads in case they are asleep.
 	 */
-	if (stat.nr_unqueued_dirty == nr_taken) {
+	if (nr_unqueued_dirty == nr_taken) {
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
 		/*
 		 * For cgroupv1 dirty throttling is achieved by waking up
@@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
 	}
 
-	sc->nr.dirty += stat.nr_dirty;
-	sc->nr.congested += stat.nr_congested;
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
-	sc->nr.writeback += stat.nr_writeback;
-	sc->nr.immediate += stat.nr_immediate;
-	sc->nr.taken += nr_taken;
-	if (file)
-		sc->nr.file_taken += nr_taken;
-
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
-	return nr_reclaimed;
+	return total_reclaimed;
 }
 
 /*
@@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int reclaimed;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
-	struct folio *folio;
-	struct folio *next;
 	enum vm_event_item item;
 	struct reclaim_stat stat;
 	struct lru_gen_mm_walk *walk;
-	bool skip_retry = false;
+	bool is_retrying = false;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
-	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
-		if (!folio_evictable(folio)) {
-			list_del(&folio->lru);
-			folio_putback_lru(folio);
-			continue;
-		}
-
-		/* retry folios that may have missed folio_rotate_reclaimable() */
-		if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) &&
-		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
-			list_move(&folio->lru, &clean);
-			continue;
-		}
-
-		/* don't add rejected folios to the oldest generation */
-		if (!lru_gen_distance(folio, false))
-			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
-	}
+	find_folios_written_back(&list, &clean, is_retrying);
 
 	spin_lock_irq(&lruvec->lru_lock);
 
@@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	list_splice_init(&clean, &list);
 
 	if (!list_empty(&list)) {
-		skip_retry = true;
+		is_retrying = true;
 		goto retry;
 	}
 
-- 
2.34.1




* Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
  2024-12-23  8:20 [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU Chen Ridong
@ 2024-12-24  4:19 ` Yu Zhao
  2024-12-24  6:45   ` Chen Ridong
  0 siblings, 1 reply; 6+ messages in thread
From: Yu Zhao @ 2024-12-24  4:19 UTC (permalink / raw)
  To: Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	chenridong, wangweiyang2, xieym_ict

On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
> From: Chen Ridong <chenridong@huawei.com>
>
> The page reclaim isolates a batch of folios from the tail of one of the
> LRU lists and works on those folios one by one.  For a suitable
> swap-backed folio, if the swap device is async, it queues that folio for
> writeback.  After the page reclaim finishes an entire batch, it puts back
> the folios it queued for writeback to the head of the original LRU list.
>
> In the meantime, the page writeback flushes the queued folios also by
> batches.  Its batching logic is independent from that of the page reclaim.
> For each of the folios it writes back, the page writeback calls
> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>
> folio_rotate_reclaimable() only works for a folio after the page reclaim
> has put it back.  If an async swap device is fast enough, the page
> writeback can finish with that folio while the page reclaim is still
> working on the rest of the batch containing it.  In this case, that folio
> will remain at the head and the page reclaim will not retry it before
> reaching there.

For starters, copying & pasting others' commit messages as your own is
plagiarism. You need to quote them.

> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> while isolated") only fixed the issue for mglru. However, this issue
> also exists in the traditional active/inactive LRU.

You need to prove it with some numbers.

> This issue will be
> worse if THP is split, which makes the list longer and needs longer time
> to finish a batch of folios reclaim.
>
> This issue should be fixed in the same way for the traditional LRU.
> Therefore, the common logic was extracted to the 'find_folios_written_back'
> function firstly, which is then reused in the 'shrink_inactive_list'
> function. Finally, retry reclaiming those folios that may have missed the
> rotation for traditional LRU.
>
> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> Reviewed-by: Barry Song <baohua@kernel.org>
> ---
>
> v5->v6:
>  - fix compile error(implicit declaration of function 'lru_gen_distance')
>    when CONFIG_LRU_GEN is disable.

Did you build-test it this time? I don't think LRU_REFS_FLAGS is
defined when CONFIG_LRU_GEN=y.



>  - rename 'is_retried' to is_retrying suggested by Barry Song.
>
> v5: https://lore.kernel.org/linux-kernel/CAGsJ_4x3Aj7wieK1FQKQC4Vbz5N+1dExs=Q70KQt-whS1dMxpw@mail.gmail.com/
>
>  include/linux/mm_inline.h |   5 ++
>  mm/vmscan.c               | 108 ++++++++++++++++++++++++--------------
>  2 files changed, 75 insertions(+), 38 deletions(-)



* Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
  2024-12-24  4:19 ` Yu Zhao
@ 2024-12-24  6:45   ` Chen Ridong
  2024-12-24 10:01     ` Barry Song
  2024-12-24 19:28     ` Yu Zhao
  0 siblings, 2 replies; 6+ messages in thread
From: Chen Ridong @ 2024-12-24  6:45 UTC (permalink / raw)
  To: Yu Zhao, Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict



On 2024/12/24 12:19, Yu Zhao wrote:
> On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> The page reclaim isolates a batch of folios from the tail of one of the
>> LRU lists and works on those folios one by one.  For a suitable
>> swap-backed folio, if the swap device is async, it queues that folio for
>> writeback.  After the page reclaim finishes an entire batch, it puts back
>> the folios it queued for writeback to the head of the original LRU list.
>>
>> In the meantime, the page writeback flushes the queued folios also by
>> batches.  Its batching logic is independent from that of the page reclaim.
>> For each of the folios it writes back, the page writeback calls
>> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
>>
>> folio_rotate_reclaimable() only works for a folio after the page reclaim
>> has put it back.  If an async swap device is fast enough, the page
>> writeback can finish with that folio while the page reclaim is still
>> working on the rest of the batch containing it.  In this case, that folio
>> will remain at the head and the page reclaim will not retry it before
>> reaching there.
> 
> For starters, copying & pasting others' commit messages as your own is
> plagiarism. You need to quote them.

Hi Yu, thank you for reminding me; I did not intend any plagiarism. I am
a beginner and did not know much about that.

In my v1 and v2 I wrote my own message describing the issue I was
fixing, but it was wordy. What you wrote in commit 359a5e1416ca is much
clearer, so I pasted it. I am sorry. Should I resend a new patch with a
modified message?

I have sent you all the patch versions, and I don't know whether you
noticed. I wish you had pointed this out the first time I pasted it, so
I would not have repeated the mistake.

>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
>> while isolated") only fixed the issue for mglru. However, this issue
>> also exists in the traditional active/inactive LRU.
> 
> You need to prove it with some numbers.

Do you mean I should include such numbers in my message? Actually, I
encountered this issue in the traditional active/inactive LRU; I did not
know you had fixed it for MGLRU until Barry told me. The first Link
above describes how to reproduce it.

>> This issue will be
>> worse if THP is split, which makes the list longer and needs longer time
>> to finish a batch of folios reclaim.
>>
>> This issue should be fixed in the same way for the traditional LRU.
>> Therefore, the common logic was extracted to the 'find_folios_written_back'
>> function firstly, which is then reused in the 'shrink_inactive_list'
>> function. Finally, retry reclaiming those folios that may have missed the
>> rotation for traditional LRU.
>>
>> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
>> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> Reviewed-by: Barry Song <baohua@kernel.org>
>> ---
>>
>> v5->v6:
>>  - fix compile error(implicit declaration of function 'lru_gen_distance')
>>    when CONFIG_LRU_GEN is disable.
> 
> Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> defined when CONFIG_LRU_GEN=y.
> 

Yes, I tested it. I didn't test with CONFIG_LRU_GEN=n for patch v5, but
I tested both CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y for patch v6. I am
basing on -next.

Best regards,
Ridong

> 
> 
>>  - rename 'is_retried' to is_retrying suggested by Barry Song.
>>
>> v5: https://lore.kernel.org/linux-kernel/CAGsJ_4x3Aj7wieK1FQKQC4Vbz5N+1dExs=Q70KQt-whS1dMxpw@mail.gmail.com/
>>
>>  include/linux/mm_inline.h |   5 ++
>>  mm/vmscan.c               | 108 ++++++++++++++++++++++++--------------
>>  2 files changed, 75 insertions(+), 38 deletions(-)




* Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
  2024-12-24  6:45   ` Chen Ridong
@ 2024-12-24 10:01     ` Barry Song
  2024-12-24 19:28     ` Yu Zhao
  1 sibling, 0 replies; 6+ messages in thread
From: Barry Song @ 2024-12-24 10:01 UTC (permalink / raw)
  To: Chen Ridong
  Cc: Yu Zhao, akpm, mhocko, hannes, yosryahmed, david, willy,
	ryan.roberts, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict

On Tue, Dec 24, 2024 at 7:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
>
> On 2024/12/24 12:19, Yu Zhao wrote:
> > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> The page reclaim isolates a batch of folios from the tail of one of the
> >> LRU lists and works on those folios one by one.  For a suitable
> >> swap-backed folio, if the swap device is async, it queues that folio for
> >> writeback.  After the page reclaim finishes an entire batch, it puts back
> >> the folios it queued for writeback to the head of the original LRU list.
> >>
> >> In the meantime, the page writeback flushes the queued folios also by
> >> batches.  Its batching logic is independent from that of the page reclaim.
> >> For each of the folios it writes back, the page writeback calls
> >> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> >>
> >> folio_rotate_reclaimable() only works for a folio after the page reclaim
> >> has put it back.  If an async swap device is fast enough, the page
> >> writeback can finish with that folio while the page reclaim is still
> >> working on the rest of the batch containing it.  In this case, that folio
> >> will remain at the head and the page reclaim will not retry it before
> >> reaching there.
> >
> > For starters, copying & pasting others' commit messages as your own is
> > plagiarism. You need to quote them.
>
> Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am
> a beginner, and I do not know much about that.

I don't think Ridong's intention was plagiarism. It seems this was
likely an oversight: the commit message should have mentioned that it
quotes Yu's commit addressing the same issue in MGLRU. It was also
careless of me not to remind Ridong to include something like "quoted
from Yu's commit..." and to add quotation marks.

>
> I wrote the message in my v1 and v2 to describe what issue I was fixing,
> which is wordy. What you wrote is much clearer in the commit
> 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to
> modify the message?
>
> I have sent all patch versions to you, and I don't know whether you have
> noticed. How I wish you could point it out at the first I pasted it, so
> I wouldn't have made this mistake again and again.
>
> >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> >> while isolated") only fixed the issue for mglru. However, this issue
> >> also exists in the traditional active/inactive LRU.
> >
> > You need to prove it with some numbers.
>
> Do you mean I should prove it with some info in my message? Actually, I
> encountered this in the traditional active/inactive LRU, I did not know
> you had fixed for mglru until Barry told me. I offered how to reproduce
> this with Link.
>
> >> This issue will be
> >> worse if THP is split, which makes the list longer and needs longer time
> >> to finish a batch of folios reclaim.
> >>
> >> This issue should be fixed in the same way for the traditional LRU.
> >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> >> function firstly, which is then reused in the 'shrink_inactive_list'
> >> function. Finally, retry reclaiming those folios that may have missed the
> >> rotation for traditional LRU.
> >>
> >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
> >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >> Reviewed-by: Barry Song <baohua@kernel.org>
> >> ---
> >>
> >> v5->v6:
> >>  - fix compile error(implicit declaration of function 'lru_gen_distance')
> >>    when CONFIG_LRU_GEN is disable.
> >
> > Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> > defined when CONFIG_LRU_GEN=y.
> >
>
> Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5.
> I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am
> using the next.
>
> Best regards,
> Ridong
>
> >
> >
> >>  - rename 'is_retried' to is_retrying suggested by Barry Song.
> >>
> >> v5: https://lore.kernel.org/linux-kernel/CAGsJ_4x3Aj7wieK1FQKQC4Vbz5N+1dExs=Q70KQt-whS1dMxpw@mail.gmail.com/
> >>
> >>  include/linux/mm_inline.h |   5 ++
> >>  mm/vmscan.c               | 108 ++++++++++++++++++++++++--------------
> >>  2 files changed, 75 insertions(+), 38 deletions(-)
>

Thanks
Barry



* Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
  2024-12-24  6:45   ` Chen Ridong
  2024-12-24 10:01     ` Barry Song
@ 2024-12-24 19:28     ` Yu Zhao
  2024-12-24 19:33       ` Yu Zhao
  1 sibling, 1 reply; 6+ messages in thread
From: Yu Zhao @ 2024-12-24 19:28 UTC (permalink / raw)
  To: Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict

On Mon, Dec 23, 2024 at 11:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
>
> On 2024/12/24 12:19, Yu Zhao wrote:
> > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>
> >> From: Chen Ridong <chenridong@huawei.com>
> >>
> >> The page reclaim isolates a batch of folios from the tail of one of the
> >> LRU lists and works on those folios one by one.  For a suitable
> >> swap-backed folio, if the swap device is async, it queues that folio for
> >> writeback.  After the page reclaim finishes an entire batch, it puts back
> >> the folios it queued for writeback to the head of the original LRU list.
> >>
> >> In the meantime, the page writeback flushes the queued folios also by
> >> batches.  Its batching logic is independent from that of the page reclaim.
> >> For each of the folios it writes back, the page writeback calls
> >> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> >>
> >> folio_rotate_reclaimable() only works for a folio after the page reclaim
> >> has put it back.  If an async swap device is fast enough, the page
> >> writeback can finish with that folio while the page reclaim is still
> >> working on the rest of the batch containing it.  In this case, that folio
> >> will remain at the head and the page reclaim will not retry it before
> >> reaching there.
> >
> > For starters, copying & pasting others' commit messages as your own is
> > plagiarism. You need to quote them.
>
> Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am
> a beginner, and I do not know much about that.

I didn't think it was intentional, but please also understand that this
is a very basic rule that applies everywhere, not just in this case.

> I wrote the message in my v1 and v2 to describe what issue I was fixing,
> which is wordy. What you wrote is much clearer in the commit
> 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to
> modify the message?
>
> I have sent all patch versions to you, and I don't know whether you have
> noticed. How I wish you could point it out at the first I pasted it, so
> I wouldn't have made this mistake again and again.

Here is an example of how I quoted from another commit message:
https://lore.kernel.org/20241107202033.2721681-5-yuzhao@google.com/

> >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> >> while isolated") only fixed the issue for mglru. However, this issue
> >> also exists in the traditional active/inactive LRU.
> >
> > You need to prove it with some numbers.
>
> Do you mean I should prove it with some info in my message? Actually, I
> encountered this in the traditional active/inactive LRU, I did not know
> you had fixed for mglru until Barry told me. I offered how to reproduce
> this with Link.

Yes, the first link does describe how to repro, but you also need to
show the results after your patch, i.e., the improvements from the
change being reviewed.

> >> This issue will be
> >> worse if THP is split, which makes the list longer and needs longer time
> >> to finish a batch of folios reclaim.
> >>
> >> This issue should be fixed in the same way for the traditional LRU.
> >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> >> function firstly, which is then reused in the 'shrink_inactive_list'
> >> function. Finally, retry reclaiming those folios that may have missed the
> >> rotation for traditional LRU.
> >>
> >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
> >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
> >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> >> Reviewed-by: Barry Song <baohua@kernel.org>
> >> ---
> >>
> >> v5->v6:
> >>  - fix compile error(implicit declaration of function 'lru_gen_distance')
> >>    when CONFIG_LRU_GEN is disable.
> >
> > Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> > defined when CONFIG_LRU_GEN=y.
> >
>
> Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5.
> I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am
> using the next.

I see. I would base my patches on mm-unstable instead of -next, i.e.,
git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable



* Re: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
  2024-12-24 19:28     ` Yu Zhao
@ 2024-12-24 19:33       ` Yu Zhao
  0 siblings, 0 replies; 6+ messages in thread
From: Yu Zhao @ 2024-12-24 19:33 UTC (permalink / raw)
  To: Chen Ridong
  Cc: akpm, mhocko, hannes, yosryahmed, david, willy, ryan.roberts,
	baohua, 21cnbao, wangkefeng.wang, linux-mm, linux-kernel,
	wangweiyang2, xieym_ict

On Tue, Dec 24, 2024 at 12:28 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Mon, Dec 23, 2024 at 11:45 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >
> >
> >
> > On 2024/12/24 12:19, Yu Zhao wrote:
> > > On Mon, Dec 23, 2024 at 1:30 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
> > >>
> > >> From: Chen Ridong <chenridong@huawei.com>
> > >>
> > >> The page reclaim isolates a batch of folios from the tail of one of the
> > >> LRU lists and works on those folios one by one.  For a suitable
> > >> swap-backed folio, if the swap device is async, it queues that folio for
> > >> writeback.  After the page reclaim finishes an entire batch, it puts back
> > >> the folios it queued for writeback to the head of the original LRU list.
> > >>
> > >> In the meantime, the page writeback flushes the queued folios also by
> > >> batches.  Its batching logic is independent from that of the page reclaim.
> > >> For each of the folios it writes back, the page writeback calls
> > >> folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> > >>
> > >> folio_rotate_reclaimable() only works for a folio after the page reclaim
> > >> has put it back.  If an async swap device is fast enough, the page
> > >> writeback can finish with that folio while the page reclaim is still
> > >> working on the rest of the batch containing it.  In this case, that folio
> > >> will remain at the head and the page reclaim will not retry it before
> > >> reaching there.
> > >
> > > For starters, copying & pasting others' commit messages as your own is
> > > plagiarism. You need to quote them.
> >
> > Hi, Yu, Thank you for reminding me, I did not mean any plagiarism. I am
> > a beginner, and I do not know much about that.
>
> I didn't think you were intentional but please also understand this is
> a very basic rule that applies everywhere, not just this case.
>
> > I wrote the message in my v1 and v2 to describe what issue I was fixing,
> > which is wordy. What you wrote is much clearer in the commit
> > 359a5e1416ca, so I pasted it. I am sorry. Should resend a new patch to
> > modify the message?
> >
> > I have sent all patch versions to you, and I don't know whether you have
> > noticed. How I wish you could point it out at the first I pasted it, so
> > I wouldn't have made this mistake again and again.
>
> Here is an example of how I quoted from another commit message:
> https://lore.kernel.org/20241107202033.2721681-5-yuzhao@google.com/
>
> > >> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
> > >> while isolated") only fixed the issue for mglru. However, this issue
> > >> also exists in the traditional active/inactive LRU.
> > >
> > > You need to prove it with some numbers.
> >
> > Do you mean I should prove it with some info in my message? Actually, I
> > encountered this in the traditional active/inactive LRU, I did not know
> > you had fixed for mglru until Barry told me. I offered how to reproduce
> > this with Link.
>
> Yes, the first link does describe how to repro, but you also need to
> show the results after your patch, i.e., the improvements from the
> change being reviewed.
>
> > >> This issue will be
> > >> worse if THP is split, which makes the list longer and needs longer time
> > >> to finish a batch of folios reclaim.
> > >>
> > >> This issue should be fixed in the same way for the traditional LRU.
> > >> Therefore, the common logic was extracted to the 'find_folios_written_back'
> > >> function firstly, which is then reused in the 'shrink_inactive_list'
> > >> function. Finally, retry reclaiming those folios that may have missed the
> > >> rotation for traditional LRU.
> > >>
> > >> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
> > >> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
> > >> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> > >> Reviewed-by: Barry Song <baohua@kernel.org>
> > >> ---
> > >>
> > >> v5->v6:
> > >>  - fix compile error(implicit declaration of function 'lru_gen_distance')
> > >>    when CONFIG_LRU_GEN is disable.
> > >
> > > Did you build-test it this time? I don't think LRU_REFS_FLAGS is
> > > defined when CONFIG_LRU_GEN=y.
> > >
> >
> > Yes, I tested. I didn't test when CONFIG_LRU_GEN=n in patch v5.
> > I tested with CONFIG_LRU_GEN=n and CONFIG_LRU_GEN=y in patch v6. I am
> > using the next.
>
> I see. I would base my patches on mm-unstable instead -next, i.e.,
> git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-unstable

> > >>  - rename 'is_retried' to is_retrying suggested by Barry Song.

Also, there is a subtle difference here: `skip_retry` implies "skip the
retry after it has retried N times", and currently N=1 for MGLRU. The
name you chose implies "only retry once", i.e., it rules out a future
change to N=0 or N=2, etc.


