From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B51A5D149CD for ; Fri, 25 Oct 2024 17:54:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 545EE6B0089; Fri, 25 Oct 2024 13:54:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4F51E6B008A; Fri, 25 Oct 2024 13:54:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3BD586B008C; Fri, 25 Oct 2024 13:54:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 21B916B0089 for ; Fri, 25 Oct 2024 13:54:19 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 128DE40DEA for ; Fri, 25 Oct 2024 17:54:08 +0000 (UTC) X-FDA: 82712872494.20.C87381C Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf17.hostedemail.com (Postfix) with ESMTP id 6340340011 for ; Fri, 25 Oct 2024 17:54:01 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Jr+LwaCx; spf=pass (imf17.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729878651; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FLhL761LORBHkdChYyd01T9RJGPpx3zgweo3IcacCF8=; b=wSuqiLN4LXkNQ0I7nvbYOm77EIUO8Ze36tLlOzb1sgwRNM6Ia5jGpNI1h1ebF6zGdRAnp6 K84KZCaHwklGeMfQZq1fklVZFSdN/7CVFiJ53sqt1GhjyieWnATWR/pU0Nq/ZjTE6zp8nV GtPWUP0ZLJzZSQen+SFNoaryadFu2VM= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Jr+LwaCx; spf=pass (imf17.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729878651; a=rsa-sha256; cv=none; b=YXFwh+Y2gGXrhhTvgyE3lIHfW8+i8/s0vRXMr5qqvDn6DxGJv41E1UOUJttcLqGc0hgha4 9At89mqaoHbv/XyTkiG+BVrVfiicSqfWuMPL9Fz0OEHb41tiZvX+SLwqYYQClg92lZhgSA n7yPwN7k2vPcP6H1Cq81cwjrzcDtJ/c= Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-a9aa8895facso358034366b.2 for ; Fri, 25 Oct 2024 10:54:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1729878855; x=1730483655; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=FLhL761LORBHkdChYyd01T9RJGPpx3zgweo3IcacCF8=; b=Jr+LwaCx8v9mDHSUKCeMwweHwvEU/EkDlHZ0Pna7wWnvh9XYsCxUT/NSRnnGambdgs 26L865MXtmU5LHF2R1Td1l2ksdN/iRyvErZlybHM19FUF1gfvqodMmd2t6u44Z3o/89r sqBBtgn29Bu+0NUHwlTLvFEgmUkCEenLc3CFOcoLDasiveb2tv/ymuixWwzX2m9tYtpq PEJSEoM70dJluZghWGrgjgt7ojMs8rcS3Do+pR03ghIgQNXymB488gCdWUhe7Izw8q2n YluoMq5bo1WykNPVC8EiSNxY6dw9QT7vj7hNJMaWqvhHTo+YmIgzClmAJhnBdX+Y0H+U fTiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729878855; x=1730483655; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FLhL761LORBHkdChYyd01T9RJGPpx3zgweo3IcacCF8=; b=JPDO9OKSVnDJOXIOkiUt1XB9Dv3Fr5bwbOyu5cstizVSp7HhRJQEJebvEYL9c9/V/S iAXyTB7sJPBellvSV+FjpL4jL5OO8GHRhEycwPV0OMNKHaLGz33Tm8talZQkfiPjt5vF j7i5y190nphbFI1CUTSZw8dSEh74C8nsWuVcM/sQC57pUWBGMabdVcHn+38OoFIaA8hM 3A4XCBn6nHgomeS/xBcJ6dfFmTwneyN5DSiYVloQphjfwUgeR1h0Z4UB2rg0UN3bY45U xIS4Bs4vD/1RB7CtcghE9tkgEDGo8zCMP3WbaKHciZOunKLIrnX9tnbFvFeyTr/t2u6v fTaQ== X-Forwarded-Encrypted: i=1; AJvYcCXt4vXdkWYj4o25sgFXCf8BdKzcrMkfkOv7MRRUWM/DxKJp/X+ezTvykOXusAG2QXFba5nDqiH6yQ==@kvack.org X-Gm-Message-State: AOJu0Yw3f+zlBWO3GaW20B1chGTsx+s65D0DcCPxEtunq+dTAwuTNkt9 mMD5aSHdQ8Tc6oeKNq7R8mQrhBvVilQwyHCRwFdHB+JsNhpKJ8t2aVaZkz16zGvtCu/XgSqOxZo dR1UDu6K2NrIzMWZv5BglkB4Sqzca714u/aW6 X-Google-Smtp-Source: AGHT+IHkJAJuOX/pyLg2QsbuOF3P/GYnzGXE2uzjMFMMhmPjkp/NkPYtGL3AMxaX2lKRotQBlGoxbStk9EkimPSCZ54= X-Received: by 2002:a17:907:9617:b0:a99:43e5:ac37 with SMTP id a640c23a62f3a-a9ad273744bmr519527566b.15.1729878854963; Fri, 25 Oct 2024 10:54:14 -0700 (PDT) MIME-Version: 1.0 References: <20241025002511.129899-1-inwardvessel@gmail.com> <20241025002511.129899-3-inwardvessel@gmail.com> <3da0f38a-51f1-43fa-a625-6bb1fe992920@gmail.com> In-Reply-To: <3da0f38a-51f1-43fa-a625-6bb1fe992920@gmail.com> From: Yosry Ahmed Date: Fri, 25 Oct 2024 10:53:39 -0700 Message-ID: Subject: Re: [PATCH 2/2] memcg: use memcg flush tracepoint To: JP Kobryn Cc: Shakeel Butt , hannes@cmpxchg.org, akpm@linux-foundation.org, rostedt@goodmis.org, linux-mm@kvack.org, cgroups@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 6340340011 X-Stat-Signature: 1eoffp5gk5k9umt8nzbmdhmxdm3zs18t X-Rspam-User: X-HE-Tag: 1729878841-533556 X-HE-Meta: U2FsdGVkX19KecAYvR4BXXUx4RvqMzyDNNa5kloHt9W0uX1OIRh4QYYlEN4FQP5S1Ss2hcUVA63IPKdmd/S13+tUmyN8MxBryeJ5FIoi11OFJhNfq8PF+EVNhlC9n1bq2I47JEoB3rWO4JwShSWUTSEBqzgEqgF1++q+2O9kUkQmGpOS3rMUPXtC9T4En3afTR+RDkFoWReZm6JaPo3eOtTbEL7m1m1+QWFzCR/SGysyoohlS3Wc1lDMtA3RPrWz32dtYu/BXvyPfzGZh1F8qvJnc4a4tYwPWrcgNEaUvxJ5GJsTA3SyBWI4XcRK7ui9aE72UsV6Rgb+cQmF8RLClGPsdjFZCSXBYKRgGbYUHdZUVQ0IOCrCZDyWakGD2PM8GPICzLB0ZNibDUnOlCZHKXUYtV6AFYi0D6jwVcCs935SjRfEvVq3YZJZ0Cw9Tg0z7FLOxJ5hD3iTkqlMnlOs0pACOuhUyw4lT5yNIFNRnLtwDmtxms88a00UmRyHcD/82ulx77IHiE4clsvfl77JZAJotqwS34nXYrSsrwMJaKVzZfdo0ivGF3pyCpblrFSZuQ7AqCP7otQN0zLEVNoJSOgvBi++LyA8aN2WMPNqOKmbrJs9ZC0OmNmOPtQzL1+uOCofAZweOtFo4vPdaD1ETWnssiXqSNv35iwDV2IJ3a4RyQeh/VvWLOeEX8S1afhXs/IAnXX7FmY6mg2gYt0+T2ZQtQkDQfuxDyI0qw87O9csY1PcQjD/b/get5/ep8mycv3v0cgmhUJUMoOXbkAS9gr/Id1oIzkgIgZ3nrcQOzDNr077N92GQ41+gy+BTRho9HCIq/t4hP9z+ofrm8k4Q+xgYmmzyvOAQerHydmjB1D8yhylZ91WvUGyzjOnyYpQp4GtZpb+iiixm9gCxMioQUPV1CvzUSkKy/xB2p1bB4S8Oy6/bPW1Li37POCa7otE1CkgDezFYZxEEaXkRJd KPcfA7Lz dzYKIeol5mM9e2KgkW0fH/evSxJm0HARG8OmYOwiaApkLY4QppeGbpjXOpTqjEqe7WrRAN20hJnFnEPF5mrUfcULQTVT+RO4CVr2W++KM+D8GUII96ASWJIYXG1fasajYNe75xq7ArU4l42SVJjI7Bz7nvTGGND0jFkIGgdJs6RNNKuvO2foRFiZFWHGeoR287CKyu5dEM3FtdSVPaWt9ujZjVA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 25, 2024 at 10:05=E2=80=AFAM JP Kobryn = wrote: > > > On 10/25/24 12:40 AM, Yosry Ahmed wrote: > > On Thu, Oct 24, 2024 at 6:16=E2=80=AFPM Shakeel Butt wrote: > >> On Thu, Oct 24, 2024 at 05:57:25PM GMT, Yosry Ahmed wrote: > >>> On Thu, Oct 24, 2024 at 5:26=E2=80=AFPM JP Kobryn wrote: > >>>> Make use of the flush tracepoint within memcontrol. > >>>> > >>>> Signed-off-by: JP Kobryn > >>> Is the intention to use tools like bpftrace to analyze where we flush > >>> the most? In this case, why can't we just attach to the fentry of > >>> do_flush_stats() and use the stack trace to find the path? > >>> > >>> We can also attach to mem_cgroup_flush_stats(), and the difference in > >>> counts between the two will be the number of skipped flushes. > >>> > >> All these functions can get inlined and then we can not really attach > >> easily. We can somehow find the offset in the inlined places and try t= o > >> use kprobe but it is prohibitive when have to do for multiple kernels > >> built with fdo/bolt. > >> > >> Please note that tracepoints are not really API, so we can remove them > >> in future if we see no usage for them. > > That's fair, but can we just add two tracepoints? This seems enough to > > collect necessary data, and prevent proliferation of tracepoints and > > the addition of the enum. > > > > I am thinking one in mem_cgroup_flush_stats() and one in > > do_flush_stats(), e.g. trace_mem_cgroup_flush_stats() and > > trace_do_flush_stats(). Although the name of the latter is too > > generic, maybe we should rename the function first to add mem_cgroup_* > > or memcg_*. > > > > WDYT? > > Hmmm, I think if we did that we wouldn't get accurate info on when the > flush was skipped. Comparing the number of hits between > mem_cgroup_flush_stats() and do_flush_stats() to determine the number of > skips doesn't seem reliable because of the places where do_flush_stats() > is called outside of mem_cgroup_flush_stats(). There would be situations > where a skip occurs, but meanwhile each call to do_flush_stats() outside > of mem_cgroup_flush_stats() would effectively subtract that skip, making > it appear that a skip did not occur. You're underestimating the power of BPF, my friend :) We can count the number of flushes in task local storages, in which case we can get a very accurate representation because the counters are per-task, so we know exactly when we skipped, but.. > > Maybe as a middle ground we could remove the trace calls for the zswap > and periodic cases, since no skips can occur there. We could then just > leave one trace call in mem_cgroup_flush_stats() and instead of an enum > we can pass a bool saying skipped or not. Something like this: > > mem_cgroup_flush_stats() > > { > > bool needs_flush =3D memcg_vmstats_needs_flush(...); > > trace_memcg_flush_stats(memcg, needs_flush); > > if (needs_flush) > > do_flush_stats(...); > > } > > > Yosry/Shakeel, do you have any thoughts on whether we should keep the > trace calls for obj_cgroup_may_zswap() and periodic workqueue cases? ..with that being said, I do like having a single tracepoint. I think with some refactoring we can end up with a single tracepoint and more data. We can even capture the cases where we force a flush but we don't really need to flush. We can even add vmstats->stats_updates to the tracepoint to know exactly how many updates we have when we flush. What about the following: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7845c64a2c570..be0e7f52ad11a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -584,8 +584,14 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) } } -static void do_flush_stats(struct mem_cgroup *memcg) +static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force) { + bool needs_flush =3D memcg_vmstats_needs_flush(memcg->vmstats); + + trace_memcg_flush_stats(memcg, needs_flush, force, ...); + if (!force && !needs_flush) + return; + if (mem_cgroup_is_root(memcg)) WRITE_ONCE(flush_last_time, jiffies_64); @@ -609,8 +615,7 @@ void mem_cgroup_flush_stats(struct mem_cgroup *memcg) if (!memcg) memcg =3D root_mem_cgroup; - if (memcg_vmstats_needs_flush(memcg->vmstats)) - do_flush_stats(memcg); + __mem_cgroup_flush_stats(memcg, false); } void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg) @@ -626,7 +631,7 @@ static void flush_memcg_stats_dwork(struct work_struct = *w) * Deliberately ignore memcg_vmstats_needs_flush() here so that flu= shing * in latency-sensitive paths is as cheap as possible. */ - do_flush_stats(root_mem_cgroup); + __mem_cgroup_flush_stats(root_mem_cgroup, true); queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIM= E); } @@ -5272,11 +5277,8 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) break; } - /* - * mem_cgroup_flush_stats() ignores small changes. Use - * do_flush_stats() directly to get accurate stats for char= ging. - */ - do_flush_stats(memcg); + /* Force a flush to get accurate stats for charging */ + __mem_cgroup_flush_stats(memcg, true); pages =3D memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZ= E; if (pages < max) continue;