* [PATCH 0/2] some fix of multi-gen lru
@ 2023-10-18  8:20 Huan Yang
From: Huan Yang
To: Yu Zhao, Steven Rostedt, Masami Hiramatsu, Andrew Morton, Huan Yang,
    Suren Baghdasaryan, Vlastimil Babka, linux-kernel, linux-trace-kernel,
    linux-mm, bpf
Cc: opensource.kernel

For the multi-gen LRU shrink_inactive trace, there are a few mistakes:

First, the nr_scanned that evict_folios() reports counts all folios it
touched in `get_nr_to_scan()` (that is, it is not nr_taken). So we may see
trace output like this:
```
kswapd0-89 [000] 64.887613: mm_vmscan_lru_shrink_inactive:
nid=0 nr_scanned=64 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0
nr_ref_keep=0 nr_unmap_fail=0 priority=4
flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
```
In reality, far fewer folios are actually taken during the shrink. This
patchset uses the difference between sc->nr_scanned before and after the
shrink as nr_taken and passes that to the shrink_inactive tracepoint.

Second, as the output above also shows, we cannot tell where the folios
went: in multi-gen LRU shrinking, many folios are kept because
folio_update_gen() promoted them during look-around, or something else
promoted them.

Third, evict_folios() does not gather the reclaim_stat info after
shrinking the list, so the dirty/writeback/congested stats are lost and
we cannot drive LRUVEC_CONGESTED or reclaim_throttle().

Patch 1 addresses the first and second issues; patch 2 addresses the third.

Huan Yang (2):
  tracing: mm: multigen-lru: fix mglru trace
  mm: multi-gen lru: fix stat count

 include/linux/vmstat.h        |  2 ++
 include/trace/events/vmscan.h |  8 ++++-
 mm/vmscan.c                   | 61 ++++++++++++++++++++++++++++++++---
 3 files changed, 65 insertions(+), 6 deletions(-)

-- 
2.34.1

^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] tracing: mm: multigen-lru: fix mglru trace
@ 2023-10-18  8:20 Huan Yang
From: Huan Yang
To: Yu Zhao, Steven Rostedt, Masami Hiramatsu, Andrew Morton,
    Suren Baghdasaryan, Huan Yang, Vlastimil Babka, linux-kernel,
    linux-trace-kernel, linux-mm, bpf
Cc: opensource.kernel

This patch adds two reclaim stats:
  nr_promote: nr_pages kept by the shrink because folio_update_gen()
              promoted them.
  nr_demote:  nr_pages passed to NUMA demotion.

It also passes the correct nr_scanned from evict_folios() into
trace_mm_vmscan_lru_shrink_inactive().

Mistaken info looks like this:
```
kswapd0-89 [000] 64.887613: mm_vmscan_lru_shrink_inactive:
nid=0 nr_scanned=64 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0
nr_ref_keep=0 nr_unmap_fail=0 priority=4
flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
```
Correct info looks like this:
```
<...>-9041 [006] 43.738481: mm_vmscan_lru_shrink_inactive:
nid=0 nr_scanned=13 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
nr_congested=0 nr_immediate=0 nr_activate_anon=9 nr_activate_file=0
nr_ref_keep=0 nr_unmap_fail=0 nr_promote=4 nr_demote=0 priority=12
flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
```

Signed-off-by: Huan Yang <link@vivo.com>
---
 include/linux/vmstat.h        |  2 ++
 include/trace/events/vmscan.h |  8 +++++++-
 mm/vmscan.c                   | 26 +++++++++++++++++++++-----
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index fed855bae6d8..ac2dd9168780 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -32,6 +32,8 @@ struct reclaim_stat {
 	unsigned nr_ref_keep;
 	unsigned nr_unmap_fail;
 	unsigned nr_lazyfree_fail;
+	unsigned nr_promote;
+	unsigned nr_demote;
 };
 
 enum writeback_stat_item {
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..9b403824a293 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -366,6 +366,8 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__field(unsigned int, nr_activate1)
 		__field(unsigned long, nr_ref_keep)
 		__field(unsigned long, nr_unmap_fail)
+		__field(unsigned long, nr_promote)
+		__field(unsigned long, nr_demote)
 		__field(int, priority)
 		__field(int, reclaim_flags)
 	),
@@ -382,17 +384,21 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__entry->nr_activate1 = stat->nr_activate[1];
 		__entry->nr_ref_keep = stat->nr_ref_keep;
 		__entry->nr_unmap_fail = stat->nr_unmap_fail;
+		__entry->nr_promote = stat->nr_promote;
+		__entry->nr_demote = stat->nr_demote;
 		__entry->priority = priority;
 		__entry->reclaim_flags = trace_reclaim_flags(file);
 	),
 
-	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate_anon=%d nr_activate_file=%d nr_ref_keep=%ld nr_unmap_fail=%ld priority=%d flags=%s",
+	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate_anon=%d"
+		" nr_activate_file=%d nr_ref_keep=%ld nr_unmap_fail=%ld nr_promote=%ld nr_demote=%ld priority=%d flags=%s",
 		__entry->nid,
 		__entry->nr_scanned, __entry->nr_reclaimed,
 		__entry->nr_dirty, __entry->nr_writeback,
 		__entry->nr_congested, __entry->nr_immediate,
 		__entry->nr_activate0, __entry->nr_activate1,
 		__entry->nr_ref_keep, __entry->nr_unmap_fail,
+		__entry->nr_promote, __entry->nr_demote,
 		__entry->priority,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cc0cb41fb32..21099b9f21e0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1063,8 +1063,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		/* folio_update_gen() tried to promote this page? */
 		if (lru_gen_enabled() && !ignore_references &&
-		    folio_mapped(folio) && folio_test_referenced(folio))
+		    folio_mapped(folio) && folio_test_referenced(folio)) {
+			stat->nr_promote += nr_pages;
 			goto keep_locked;
+		}
 
 		/*
 		 * The number of dirty pages determines if a node is marked
@@ -1193,6 +1195,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		    (thp_migration_supported() || !folio_test_large(folio))) {
 			list_add(&folio->lru, &demote_folios);
 			folio_unlock(folio);
+			stat->nr_demote += nr_pages;
 			continue;
 		}
 
@@ -4495,6 +4498,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int type;
 	int scanned;
 	int reclaimed;
+	unsigned long nr_taken = sc->nr_scanned;
+	unsigned int total_reclaimed = 0;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
 	struct folio *folio;
@@ -4521,10 +4526,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
-	sc->nr_reclaimed += reclaimed;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-			scanned, reclaimed, &stat, sc->priority,
-			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
+	total_reclaimed += reclaimed;
 
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
 		if (!folio_evictable(folio)) {
@@ -4582,6 +4584,20 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		goto retry;
 	}
 
+	/*
+	 * The nr_scanned returned by MGLRU's scan_folios() does not only
+	 * count isolated folios. To get the folios actually touched in
+	 * shrink_folio_list(), record the previous sc->nr_scanned (which
+	 * is updated for every folio touched); the new count minus the
+	 * old one gives the right nr_taken.
+	 */
+	nr_taken = sc->nr_scanned - nr_taken;
+
+	sc->nr_reclaimed += total_reclaimed;
+	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_taken,
+					    total_reclaimed, &stat,
+					    sc->priority, type);
+
 	return scanned;
 }

-- 
2.34.1
* Re: [PATCH 1/2] tracing: mm: multigen-lru: fix mglru trace
@ 2023-10-18 16:28 Yu Zhao
From: Yu Zhao
To: Huan Yang, Jaewon Kim, Kalesh Singh
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>
> This patch adds two reclaim stats:
>   nr_promote: nr_pages kept by the shrink because folio_update_gen()
>               promoted them.
>   nr_demote:  nr_pages passed to NUMA demotion.

The above isn't specific to MGLRU, so they should be in a separate
patchset.

> It also passes the correct nr_scanned from evict_folios() into
> trace_mm_vmscan_lru_shrink_inactive().
>
> Mistaken info looks like this:
> ```
> kswapd0-89 [000] 64.887613: mm_vmscan_lru_shrink_inactive:
> nid=0 nr_scanned=64 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
> nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0
> nr_ref_keep=0 nr_unmap_fail=0 priority=4
> flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> ```
> Correct info looks like this:
> ```
> <...>-9041 [006] 43.738481: mm_vmscan_lru_shrink_inactive:
> nid=0 nr_scanned=13 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
> nr_congested=0 nr_immediate=0 nr_activate_anon=9 nr_activate_file=0
> nr_ref_keep=0 nr_unmap_fail=0 nr_promote=4 nr_demote=0 priority=12
> flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
> ```

Adding Jaewon & Kalesh to take a look.
* Re: [PATCH 1/2] tracing: mm: multigen-lru: fix mglru trace
@ 2023-10-19  2:24 Huan Yang
From: Huan Yang
To: Yu Zhao, Jaewon Kim, Kalesh Singh
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

On 2023/10/19 0:28, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>> This patch adds two reclaim stats:
>>   nr_promote: nr_pages kept by the shrink because folio_update_gen()
>>               promoted them.
>>   nr_demote:  nr_pages passed to NUMA demotion.
> The above isn't specific to MGLRU, so they should be in a separate
> patchset.

OK, nr_demote isn't MGLRU-specific, so separating it out makes sense.
That said, I don't actually need nr_demote myself; I only added it when
I came across this code. :)
Could you take a look at whether the nr_promote and nr_scanned fixes are
needed for MGLRU?

>
>> It also passes the correct nr_scanned from evict_folios() into
>> trace_mm_vmscan_lru_shrink_inactive().
>>
>> Mistaken info looks like this:
>> ```
>> kswapd0-89 [000] 64.887613: mm_vmscan_lru_shrink_inactive:
>> nid=0 nr_scanned=64 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>> nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0
>> nr_ref_keep=0 nr_unmap_fail=0 priority=4
>> flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>> ```
>> Correct info looks like this:
>> ```
>> <...>-9041 [006] 43.738481: mm_vmscan_lru_shrink_inactive:
>> nid=0 nr_scanned=13 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
>> nr_congested=0 nr_immediate=0 nr_activate_anon=9 nr_activate_file=0
>> nr_ref_keep=0 nr_unmap_fail=0 nr_promote=4 nr_demote=0 priority=12
>> flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
>> ```
> Adding Jaewon & Kalesh to take a look.

Thanks,
Huan
* [PATCH 2/2] mm: multi-gen lru: fix stat count
@ 2023-10-18  8:21 Huan Yang
From: Huan Yang
To: Yu Zhao, Steven Rostedt, Masami Hiramatsu, Andrew Morton, Huan Yang,
    Suren Baghdasaryan, Vlastimil Babka, linux-kernel, linux-trace-kernel,
    linux-mm, bpf
Cc: opensource.kernel

Multi-gen LRU reclaim in evict_folios(), like shrink_inactive_list(),
gathers the isolated folios to reclaim and invokes shrink_folio_list().

But when the shrink completes, it does not gather the shrink's reclaim
stats into sc, so we cannot see information such as nr_dirty/nr_congested
during reclaim, and therefore cannot throttle writeback, account dirty
pages, mark LRUVEC_CONGESTED, or get correct sc stats from a bpf trace
of the shrink.

Fix this by copying the corresponding code from shrink_inactive_list()
to the end of the shrink.

Signed-off-by: Huan Yang <link@vivo.com>
---
 mm/vmscan.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 21099b9f21e0..88d1d586aea5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4593,6 +4593,41 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	 */
 	nr_taken = sc->nr_scanned - nr_taken;
 
+	/*
+	 * If dirty folios are scanned that are not queued for IO, it
+	 * implies that flushers are not doing their job. This can
+	 * happen when memory pressure pushes dirty folios to the end of
+	 * the LRU before the dirty limits are breached and the dirty
+	 * data has expired. It can also happen when the proportion of
+	 * dirty folios grows not through writes but through memory
+	 * pressure reclaiming all the clean cache. And in some cases,
+	 * the flushers simply cannot keep up with the allocation
+	 * rate. Nudge the flusher threads in case they are asleep.
+	 */
+	if (unlikely(stat.nr_unqueued_dirty == nr_taken)) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	sc->nr.taken += nr_taken;
+	if (type)
+		sc->nr.file_taken += nr_taken;
+
 	sc->nr_reclaimed += total_reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_taken,
 					    total_reclaimed, &stat,

-- 
2.34.1
* Re: [PATCH 2/2] mm: multi-gen lru: fix stat count
@ 2023-10-18 16:21 Yu Zhao
From: Yu Zhao
To: Huan Yang
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>
> Multi-gen LRU reclaim in evict_folios(), like shrink_inactive_list(),
> gathers the isolated folios to reclaim and invokes shrink_folio_list().
>
> But when the shrink completes, it does not gather the shrink's reclaim
> stats into sc, so we cannot see information such as nr_dirty/nr_congested
> during reclaim, and therefore cannot throttle writeback, account dirty
> pages, mark LRUVEC_CONGESTED, or get correct sc stats from a bpf trace
> of the shrink.
>
> Fix this by copying the corresponding code from shrink_inactive_list()
> to the end of the shrink.

MGLRU doesn't try to write back dirty file pages in the reclaim path --
it filters them out in sort_folio() and leaves them to the page
writeback. (The page writeback is a dedicated component for this
purpose.) So there is nothing to fix.
* Re: [PATCH 2/2] mm: multi-gen lru: fix stat count
@ 2023-10-19  2:17 Huan Yang
From: Huan Yang
To: Yu Zhao
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

Hi Yu Zhao,

Thanks for your reply.

On 2023/10/19 0:21, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>> Multi-gen LRU reclaim in evict_folios(), like shrink_inactive_list(),
>> gathers the isolated folios to reclaim and invokes shrink_folio_list().
>>
>> But when the shrink completes, it does not gather the shrink's reclaim
>> stats into sc, so we cannot see information such as nr_dirty/nr_congested
>> during reclaim, and therefore cannot throttle writeback, account dirty
>> pages, mark LRUVEC_CONGESTED, or get correct sc stats from a bpf trace
>> of the shrink.
>>
>> Fix this by copying the corresponding code from shrink_inactive_list()
>> to the end of the shrink.
> MGLRU doesn't try to write back dirty file pages in the reclaim path --
> it filters them out in sort_folio() and leaves them to the page
Good to know; sort_folio() does indeed filter some folios.
But I'd like to know: once we touch a folio in shrink_folio_list(), can
it still become dirty or enter writeback even though sort_folio()
filtered earlier?
> writeback. (The page writeback is a dedicated component for this
> purpose.) So there is nothing to fix.
* Re: [PATCH 2/2] mm: multi-gen lru: fix stat count
@ 2023-10-19  2:39 Yu Zhao
From: Yu Zhao
To: Huan Yang
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>
> Hi Yu Zhao,
>
> Thanks for your reply.
>
> On 2023/10/19 0:21, Yu Zhao wrote:
> > On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
> >> Multi-gen LRU reclaim in evict_folios(), like shrink_inactive_list(),
> >> gathers the isolated folios to reclaim and invokes shrink_folio_list().
> >>
> >> But when the shrink completes, it does not gather the shrink's reclaim
> >> stats into sc, so we cannot see information such as nr_dirty/nr_congested
> >> during reclaim, and therefore cannot throttle writeback, account dirty
> >> pages, mark LRUVEC_CONGESTED, or get correct sc stats from a bpf trace
> >> of the shrink.
> >>
> >> Fix this by copying the corresponding code from shrink_inactive_list()
> >> to the end of the shrink.
> > MGLRU doesn't try to write back dirty file pages in the reclaim path --
> > it filters them out in sort_folio() and leaves them to the page
> Good to know; sort_folio() does indeed filter some folios.
> But I'd like to know: once we touch a folio in shrink_folio_list(), can
> it still become dirty or enter writeback even though sort_folio()
> filtered earlier?

Good question: in that case MGLRU still doesn't try to write those
folios back, because isolate_folio() cleared PG_reclaim and
shrink_folio_list() checks PG_reclaim:

	if (folio_test_dirty(folio)) {
		/*
		 * Only kswapd can writeback filesystem folios
		 * to avoid risk of stack overflow. But avoid
		 * injecting inefficient single-folio I/O into
		 * flusher writeback as much as possible: only
		 * write folios when we've encountered many
		 * dirty folios, and when we've already scanned
		 * the rest of the LRU for clean folios and see
		 * the same dirty folios again (with the reclaim
		 * flag set).
		 */
		if (folio_is_file_lru(folio) &&
		    (!current_is_kswapd() ||
		     !folio_test_reclaim(folio) ||
		     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
* Re: [PATCH 2/2] mm: multi-gen lru: fix stat count
@ 2023-10-19  6:29 Huan Yang
From: Huan Yang
To: Yu Zhao
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Suren Baghdasaryan,
    Vlastimil Babka, linux-kernel, linux-trace-kernel, linux-mm, bpf,
    opensource.kernel

On 2023/10/19 10:39, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>> Hi Yu Zhao,
>>
>> Thanks for your reply.
>>
>> On 2023/10/19 0:21, Yu Zhao wrote:
>>> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>>>> Multi-gen LRU reclaim in evict_folios(), like shrink_inactive_list(),
>>>> gathers the isolated folios to reclaim and invokes shrink_folio_list().
>>>>
>>>> But when the shrink completes, it does not gather the shrink's reclaim
>>>> stats into sc, so we cannot see information such as nr_dirty/nr_congested
>>>> during reclaim, and therefore cannot throttle writeback, account dirty
>>>> pages, mark LRUVEC_CONGESTED, or get correct sc stats from a bpf trace
>>>> of the shrink.
>>>>
>>>> Fix this by copying the corresponding code from shrink_inactive_list()
>>>> to the end of the shrink.
>>> MGLRU doesn't try to write back dirty file pages in the reclaim path --
>>> it filters them out in sort_folio() and leaves them to the page
>> Good to know; sort_folio() does indeed filter some folios.
>> But I'd like to know: once we touch a folio in shrink_folio_list(), can
>> it still become dirty or enter writeback even though sort_folio()
>> filtered earlier?
> Good question: in that case MGLRU still doesn't try to write those
> folios back, because isolate_folio() cleared PG_reclaim and
> shrink_folio_list() checks PG_reclaim:

Thank you very much.
So MGLRU differs from typical LRU reclaim in many ways. Why not give
MGLRU its own shrink path, to avoid so many per-folio checks?
Going further, would it make sense to attach anon/file reclaim hooks to
anon_vma/address_space? (Each folio would then have its own shrink path
and would not run checks it does not need.)

>
> 	if (folio_test_dirty(folio)) {
> 		/*
> 		 * Only kswapd can writeback filesystem folios
> 		 * to avoid risk of stack overflow. But avoid
> 		 * injecting inefficient single-folio I/O into
> 		 * flusher writeback as much as possible: only
> 		 * write folios when we've encountered many
> 		 * dirty folios, and when we've already scanned
> 		 * the rest of the LRU for clean folios and see
> 		 * the same dirty folios again (with the reclaim
> 		 * flag set).
> 		 */
> 		if (folio_is_file_lru(folio) &&
> 		    (!current_is_kswapd() ||
> 		     !folio_test_reclaim(folio) ||
> 		     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {

Thanks
* [PATCH 0/2] check MGLRU promoted without holding the page lock
@ 2023-10-20  8:43 Huan Yang
From: Huan Yang
To: Yu Zhao, Steven Rostedt, Masami Hiramatsu, Andrew Morton, Huan Yang,
    Suren Baghdasaryan, Vlastimil Babka, linux-kernel, linux-trace-kernel,
    linux-mm
Cc: opensource.kernel

This patchset adds a new reclaim_stat, nr_promote, to observe the number
of folios that MGLRU promoted before the shrink touched them, and shows
it in mm_vmscan_lru_shrink_inactive. It also fixes the MGLRU trace to
report nr_taken instead of nr_scanned (patch 1).

Based on this trace, many folios are promoted before the shrink touches
them, so given the high frequency, it is better to perform this check
before taking the page lock (patch 2). Since we then no longer actually
touch such folios, the nr_promote trace becomes unnecessary and is
removed.

This patchset is based on
link: https://lore.kernel.org/all/20231018082104.3918770-1-link@vivo.com/
which had many mistakes pointed out by Yu Zhao (thanks).

Huan Yang (2):
  tracing: mm: multigen-lru: fix mglru trace
  mm: multi-gen LRU: move promoted folio out of lock

 include/trace/events/vmscan.h |  3 ++-
 mm/vmscan.c                   | 31 ++++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 10 deletions(-)

-- 
2.34.1
* [PATCH 1/2] tracing: mm: multigen-lru: fix mglru trace
@ 2023-10-20  8:43 Huan Yang
From: Huan Yang
To: Yu Zhao, Steven Rostedt, Masami Hiramatsu, Andrew Morton, Huan Yang,
    Suren Baghdasaryan, Vlastimil Babka, linux-kernel, linux-trace-kernel,
    linux-mm
Cc: opensource.kernel

This patch adds a reclaim stat:
  nr_promote: nr_pages kept by the shrink because folio_update_gen()
              promoted them.

It also passes the correct nr_scanned from evict_folios() into
trace_mm_vmscan_lru_shrink_inactive().

Mistaken info looks like this:
```
kswapd0-89 [000] 64.887613: mm_vmscan_lru_shrink_inactive:
nid=0 nr_scanned=64 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0
nr_ref_keep=0 nr_unmap_fail=0 priority=4
flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
```
Correct info looks like this:
```
<...>-9041 [006] 43.738481: mm_vmscan_lru_shrink_inactive:
nid=0 nr_scanned=13 nr_reclaimed=0 nr_dirty=0 nr_writeback=0
nr_congested=0 nr_immediate=0 nr_activate_anon=9 nr_activate_file=0
nr_ref_keep=0 nr_unmap_fail=0 nr_promote=4 priority=12
flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
```

Signed-off-by: Huan Yang <link@vivo.com>
---
 include/linux/vmstat.h        |  1 +
 include/trace/events/vmscan.h |  7 +++++--
 mm/vmscan.c                   | 25 ++++++++++++++++++++-----
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index fed855bae6d8..a3710ce08434 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -32,6 +32,7 @@ struct reclaim_stat {
 	unsigned nr_ref_keep;
 	unsigned nr_unmap_fail;
 	unsigned nr_lazyfree_fail;
+	unsigned nr_promote;
 };
 
 enum writeback_stat_item {
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..ffcf288879e0 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -366,6 +366,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__field(unsigned int, nr_activate1)
 		__field(unsigned long, nr_ref_keep)
 		__field(unsigned long, nr_unmap_fail)
+		__field(unsigned long, nr_promote)
 		__field(int, priority)
 		__field(int, reclaim_flags)
 	),
@@ -382,18 +383,20 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__entry->nr_activate1 = stat->nr_activate[1];
 		__entry->nr_ref_keep = stat->nr_ref_keep;
 		__entry->nr_unmap_fail = stat->nr_unmap_fail;
+		__entry->nr_promote = stat->nr_promote;
 		__entry->priority = priority;
 		__entry->reclaim_flags = trace_reclaim_flags(file);
 	),
 
-	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate_anon=%d nr_activate_file=%d nr_ref_keep=%ld nr_unmap_fail=%ld priority=%d flags=%s",
+	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate_anon=%d"
+		" nr_activate_file=%d nr_ref_keep=%ld nr_unmap_fail=%ld nr_promote=%ld priority=%d flags=%s",
 		__entry->nid,
 		__entry->nr_scanned, __entry->nr_reclaimed,
 		__entry->nr_dirty, __entry->nr_writeback,
 		__entry->nr_congested, __entry->nr_immediate,
 		__entry->nr_activate0, __entry->nr_activate1,
 		__entry->nr_ref_keep, __entry->nr_unmap_fail,
-		__entry->priority,
+		__entry->nr_promote, __entry->priority,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cc0cb41fb32..fb5df298c955 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1063,8 +1063,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		/* folio_update_gen() tried to promote this page? */
 		if (lru_gen_enabled() && !ignore_references &&
-		    folio_mapped(folio) && folio_test_referenced(folio))
+		    folio_mapped(folio) && folio_test_referenced(folio)) {
+			stat->nr_promote += nr_pages;
 			goto keep_locked;
+		}
 
 		/*
 		 * The number of dirty pages determines if a node is marked
@@ -4495,6 +4497,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int type;
 	int scanned;
 	int reclaimed;
+	unsigned long nr_taken = sc->nr_scanned;
+	unsigned int total_reclaimed = 0;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
 	struct folio *folio;
@@ -4521,10 +4525,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
-	sc->nr_reclaimed += reclaimed;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-			scanned, reclaimed, &stat, sc->priority,
-			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
+	total_reclaimed += reclaimed;
 
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
 		if (!folio_evictable(folio)) {
@@ -4582,6 +4583,20 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		goto retry;
 	}
 
+	/*
+	 * The nr_scanned returned by MGLRU's scan_folios() does not only
+	 * count isolated folios. To get the folios actually touched in
+	 * shrink_folio_list(), record the previous sc->nr_scanned (which
+	 * is updated for every folio touched); the new count minus the
+	 * old one gives the right nr_taken.
+	 */
+	nr_taken = sc->nr_scanned - nr_taken;
+
+	sc->nr_reclaimed += total_reclaimed;
+	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_taken,
+					    total_reclaimed, &stat,
+					    sc->priority, type);
+
 	return scanned;
 }

-- 
2.34.1