From: Yosry Ahmed
Date: Tue, 15 Oct 2024 11:47:29 -0700
Subject: Re: [PATCH] memcg: add tracing for memcg stat updates
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Steven Rostedt, JP Kobryn, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
References: <20241010003550.3695245-1-shakeel.butt@linux.dev>

On Tue, Oct 15, 2024 at 11:39 AM Shakeel Butt wrote:
>
> On Tue, Oct 15, 2024 at 01:07:30AM GMT, Yosry Ahmed wrote:
> > On Wed, Oct 9, 2024 at 5:36 PM Shakeel Butt wrote:
> > >
> > > The memcg stats are maintained in rstat infrastructure which provides
> > > very fast updates side and reasonable read side. However memcg added
> > > plethora of stats and made the read side, which is cgroup rstat flush,
> > > very slow. To solve that, threshold was added in the memcg stats read
> > > side i.e. no need to flush the stats if updates are within the
> > > threshold.
> > >
> > > This threshold based improvement worked for sometime but more stats were
> > > added to memcg and also the read codepath was getting triggered in the
> > > performance sensitive paths which made threshold based ratelimiting
> > > ineffective. We need more visibility into the hot and cold stats i.e.
> > > stats with a lot of updates. Let's add trace to get that visibility.
> > >
> > > Signed-off-by: Shakeel Butt
> >
> > One question below, otherwise:
> >
> > Reviewed-by: Yosry Ahmed
> >
> > > ---
> > >  include/trace/events/memcg.h | 59 ++++++++++++++++++++++++++++++++++++
> > >  mm/memcontrol.c              | 13 ++++++--
> > >  2 files changed, 70 insertions(+), 2 deletions(-)
> > >  create mode 100644 include/trace/events/memcg.h
> > >
> > > diff --git a/include/trace/events/memcg.h b/include/trace/events/memcg.h
> > > new file mode 100644
> > > index 000000000000..913db9aba580
> > > --- /dev/null
> > > +++ b/include/trace/events/memcg.h
> > > @@ -0,0 +1,59 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM memcg
> > > +
> > > +#if !defined(_TRACE_MEMCG_H) || defined(TRACE_HEADER_MULTI_READ)
> > > +#define _TRACE_MEMCG_H
> > > +
> > > +#include
> > > +#include
> > > +
> > > +
> > > +DECLARE_EVENT_CLASS(memcg_rstat,
> > > +
> > > +	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> > > +
> > > +	TP_ARGS(memcg, item, val),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(u64, id)
> > > +		__field(int, item)
> > > +		__field(int, val)
> > > +	),
> > > +
> > > +	TP_fast_assign(
> > > +		__entry->id = cgroup_id(memcg->css.cgroup);
> > > +		__entry->item = item;
> > > +		__entry->val = val;
> > > +	),
> > > +
> > > +	TP_printk("memcg_id=%llu item=%d val=%d",
> > > +		  __entry->id, __entry->item, __entry->val)
> > > +);
> > > +
> > > +DEFINE_EVENT(memcg_rstat, mod_memcg_state,
> > > +
> > > +	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> > > +
> > > +	TP_ARGS(memcg, item, val)
> > > +);
> > > +
> > > +DEFINE_EVENT(memcg_rstat, mod_memcg_lruvec_state,
> > > +
> > > +	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> > > +
> > > +	TP_ARGS(memcg, item, val)
> > > +);
> > > +
> > > +DEFINE_EVENT(memcg_rstat, count_memcg_events,
> > > +
> > > +	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> > > +
> > > +	TP_ARGS(memcg, item, val)
> > > +);
> > > +
> > > +
> > > +#endif /* _TRACE_MEMCG_H */
> > > +
> > > +/* This part must be outside protection */
> > > +#include
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index c098fd7f5c5e..17af08367c68 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -71,6 +71,10 @@
> > >
> > >  #include
> > >
> > > +#define CREATE_TRACE_POINTS
> > > +#include
> > > +#undef CREATE_TRACE_POINTS
> > > +
> > >  #include
> > >
> > >  struct cgroup_subsys memory_cgrp_subsys __read_mostly;
> > > @@ -682,7 +686,9 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
> > >  		return;
> > >
> > >  	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
> > > -	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
> > > +	val = memcg_state_val_in_pages(idx, val);
> > > +	memcg_rstat_updated(memcg, val);
> > > +	trace_mod_memcg_state(memcg, idx, val);
> > >  }
> > >
> > >  /* idx can be of type enum memcg_stat_item or node_stat_item. */
> > > @@ -741,7 +747,9 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
> > >  	/* Update lruvec */
> > >  	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
> > >
> > > -	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
> > > +	val = memcg_state_val_in_pages(idx, val);
> > > +	memcg_rstat_updated(memcg, val);
> > > +	trace_mod_memcg_lruvec_state(memcg, idx, val);
> > >  	memcg_stats_unlock();
> > >  }
> > >
> > > @@ -832,6 +840,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
> > >  	memcg_stats_lock();
> > >  	__this_cpu_add(memcg->vmstats_percpu->events[i], count);
> > >  	memcg_rstat_updated(memcg, count);
> > > +	trace_count_memcg_events(memcg, idx, count);
> >
> > count here is an unsigned long, and we are casting it to int, right?
> >
> > Would it be slightly better if the tracepoint uses a long instead of
> > int? It's still not ideal but probably better than int.
> >
>
> Do you mean something like the following? If this looks good to you then
> we can ask Andrew to squash this in the patch.

Yes, unless you have a better way to also accommodate the unsigned long
value in __count_memcg_events().
>
>
> diff --git a/include/trace/events/memcg.h b/include/trace/events/memcg.h
> index 913db9aba580..37812900acce 100644
> --- a/include/trace/events/memcg.h
> +++ b/include/trace/events/memcg.h
> @@ -11,14 +11,14 @@
>
>  DECLARE_EVENT_CLASS(memcg_rstat,
>
> -	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> +	TP_PROTO(struct mem_cgroup *memcg, int item, long val),
>
>  	TP_ARGS(memcg, item, val),
>
>  	TP_STRUCT__entry(
>  		__field(u64, id)
>  		__field(int, item)
> -		__field(int, val)
> +		__field(long, val)
>  	),
>
>  	TP_fast_assign(
> @@ -33,21 +33,21 @@ DECLARE_EVENT_CLASS(memcg_rstat,
>
>  DEFINE_EVENT(memcg_rstat, mod_memcg_state,
>
> -	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> +	TP_PROTO(struct mem_cgroup *memcg, int item, long val),
>
>  	TP_ARGS(memcg, item, val)
>  );
>
>  DEFINE_EVENT(memcg_rstat, mod_memcg_lruvec_state,
>
> -	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> +	TP_PROTO(struct mem_cgroup *memcg, int item, long val),
>
>  	TP_ARGS(memcg, item, val)
>  );
>
>  DEFINE_EVENT(memcg_rstat, count_memcg_events,
>
> -	TP_PROTO(struct mem_cgroup *memcg, int item, int val),
> +	TP_PROTO(struct mem_cgroup *memcg, int item, long val),
>
>  	TP_ARGS(memcg, item, val)
>  );
>