* [PATCH v2] memcg: add tracing for memcg stat updates
@ 2024-10-15 21:37 Shakeel Butt
2024-10-15 21:52 ` Johannes Weiner
2024-10-16 17:17 ` T.J. Mercier
From: Shakeel Butt @ 2024-10-15 21:37 UTC
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Steven Rostedt, JP Kobryn, Yosry Ahmed, linux-mm, cgroups,
	linux-kernel, Meta kernel team
The memcg stats are maintained in the rstat infrastructure, which
provides a very fast update side and a reasonable read side. However,
memcg has added a plethora of stats, making the read side, i.e. the
cgroup rstat flush, very slow. To address that, a threshold was added
on the read side: the stats are not flushed if the pending updates are
within the threshold.
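
As a minimal sketch of that threshold scheme (with illustrative names
and an arbitrary batch constant, not the exact mm/memcontrol.c code):
the update side only bumps a counter, and the read side skips the
expensive rstat flush until enough updates have accumulated.

	/* Illustrative sketch only; not the real memcg implementation. */
	static atomic_long_t pending_updates;

	static void stat_updated(long abs_delta)
	{
		/* update side: cheap per-update bookkeeping */
		atomic_long_add(abs_delta, &pending_updates);
	}

	static void flush_if_needed(struct cgroup *cgrp)
	{
		/* read side: flush only once updates exceed the threshold */
		if (atomic_long_read(&pending_updates) <= 64 * num_online_cpus())
			return;
		atomic_long_set(&pending_updates, 0);
		cgroup_rstat_flush(cgrp);
	}
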
This threshold-based improvement worked for some time, but more stats
kept being added to memcg, and the read codepath started getting
triggered from performance-sensitive paths, which made the
threshold-based ratelimiting ineffective. We need more visibility into
the hot and cold stats, i.e. which stats see a lot of updates. Let's
add tracepoints to get that visibility.
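
As a usage example (assuming the usual tracefs mount point; the event
names come from this patch):

	echo 1 > /sys/kernel/tracing/events/memcg/enable
	cat /sys/kernel/tracing/trace_pipe

Each event line carries memcg_id, the stat item index, and the update
value, so aggregating the trace output per item shows which stats are
hot and which are cold.
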
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: JP Kobryn <inwardvessel@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changes since v1:
- Used unsigned long type for memcg_rstat_events (Yosry)
- Kept the Acked-by and Reviewed-by tags
 include/trace/events/memcg.h | 81 ++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c              | 13 +++++-
 2 files changed, 92 insertions(+), 2 deletions(-)
create mode 100644 include/trace/events/memcg.h
diff --git a/include/trace/events/memcg.h b/include/trace/events/memcg.h
new file mode 100644
index 000000000000..8667e57816d2
--- /dev/null
+++ b/include/trace/events/memcg.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM memcg
+
+#if !defined(_TRACE_MEMCG_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MEMCG_H
+
+#include <linux/memcontrol.h>
+#include <linux/tracepoint.h>
+
+
+DECLARE_EVENT_CLASS(memcg_rstat_stats,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val),
+
+ TP_STRUCT__entry(
+ __field(u64, id)
+ __field(int, item)
+ __field(int, val)
+ ),
+
+ TP_fast_assign(
+ __entry->id = cgroup_id(memcg->css.cgroup);
+ __entry->item = item;
+ __entry->val = val;
+ ),
+
+ TP_printk("memcg_id=%llu item=%d val=%d",
+ __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_state,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+DEFINE_EVENT(memcg_rstat_stats, mod_memcg_lruvec_state,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, int val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+DECLARE_EVENT_CLASS(memcg_rstat_events,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+ TP_ARGS(memcg, item, val),
+
+ TP_STRUCT__entry(
+ __field(u64, id)
+ __field(int, item)
+ __field(unsigned long, val)
+ ),
+
+ TP_fast_assign(
+ __entry->id = cgroup_id(memcg->css.cgroup);
+ __entry->item = item;
+ __entry->val = val;
+ ),
+
+ TP_printk("memcg_id=%llu item=%d val=%lu",
+ __entry->id, __entry->item, __entry->val)
+);
+
+DEFINE_EVENT(memcg_rstat_events, count_memcg_events,
+
+ TP_PROTO(struct mem_cgroup *memcg, int item, unsigned long val),
+
+ TP_ARGS(memcg, item, val)
+);
+
+
+#endif /* _TRACE_MEMCG_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c098fd7f5c5e..17af08367c68 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -71,6 +71,10 @@
 
 #include <linux/uaccess.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/memcg.h>
+#undef CREATE_TRACE_POINTS
+
 #include <trace/events/vmscan.h>
 
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
@@ -682,7 +686,9 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
-	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
+	val = memcg_state_val_in_pages(idx, val);
+	memcg_rstat_updated(memcg, val);
+	trace_mod_memcg_state(memcg, idx, val);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -741,7 +747,9 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	/* Update lruvec */
 	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
 
-	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
+	val = memcg_state_val_in_pages(idx, val);
+	memcg_rstat_updated(memcg, val);
+	trace_mod_memcg_lruvec_state(memcg, idx, val);
 	memcg_stats_unlock();
 }
 
@@ -832,6 +840,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[i], count);
 	memcg_rstat_updated(memcg, count);
+	trace_count_memcg_events(memcg, idx, count);
 	memcg_stats_unlock();
 }
 
--
2.43.5
* Re: [PATCH v2] memcg: add tracing for memcg stat updates
2024-10-15 21:37 [PATCH v2] memcg: add tracing for memcg stat updates Shakeel Butt
@ 2024-10-15 21:52 ` Johannes Weiner
2024-10-16 17:17 ` T.J. Mercier
From: Johannes Weiner @ 2024-10-15 21:52 UTC
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
	Steven Rostedt, JP Kobryn, Yosry Ahmed, linux-mm, cgroups,
	linux-kernel, Meta kernel team
On Tue, Oct 15, 2024 at 02:37:21PM -0700, Shakeel Butt wrote:
> The memcg stats are maintained in the rstat infrastructure, which
> provides a very fast update side and a reasonable read side. However,
> memcg has added a plethora of stats, making the read side, i.e. the
> cgroup rstat flush, very slow. To address that, a threshold was added
> on the read side: the stats are not flushed if the pending updates are
> within the threshold.
>
> This threshold-based improvement worked for some time, but more stats
> kept being added to memcg, and the read codepath started getting
> triggered from performance-sensitive paths, which made the
> threshold-based ratelimiting ineffective. We need more visibility into
> the hot and cold stats, i.e. which stats see a lot of updates. Let's
> add tracepoints to get that visibility.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: JP Kobryn <inwardvessel@gmail.com>
> Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH v2] memcg: add tracing for memcg stat updates
2024-10-15 21:37 [PATCH v2] memcg: add tracing for memcg stat updates Shakeel Butt
2024-10-15 21:52 ` Johannes Weiner
@ 2024-10-16 17:17 ` T.J. Mercier
From: T.J. Mercier @ 2024-10-16 17:17 UTC
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Steven Rostedt, JP Kobryn, Yosry Ahmed, linux-mm,
	cgroups, linux-kernel, Meta kernel team
On Tue, Oct 15, 2024 at 2:37 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> The memcg stats are maintained in the rstat infrastructure, which
> provides a very fast update side and a reasonable read side. However,
> memcg has added a plethora of stats, making the read side, i.e. the
> cgroup rstat flush, very slow. To address that, a threshold was added
> on the read side: the stats are not flushed if the pending updates are
> within the threshold.
>
> This threshold-based improvement worked for some time, but more stats
> kept being added to memcg, and the read codepath started getting
> triggered from performance-sensitive paths, which made the
> threshold-based ratelimiting ineffective. We need more visibility into
> the hot and cold stats, i.e. which stats see a lot of updates. Let's
> add tracepoints to get that visibility.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: JP Kobryn <inwardvessel@gmail.com>
> Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>