From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8283DE6ADE7 for ; Sat, 23 Nov 2024 06:47:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BAB706B0085; Sat, 23 Nov 2024 01:47:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B5AD36B0088; Sat, 23 Nov 2024 01:47:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FAFA6B0089; Sat, 23 Nov 2024 01:47:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 826F66B0085 for ; Sat, 23 Nov 2024 01:47:33 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 3AC9181AE2 for ; Sat, 23 Nov 2024 06:47:33 +0000 (UTC) X-FDA: 82816428744.12.D2576A5 Received: from mail-qv1-f46.google.com (mail-qv1-f46.google.com [209.85.219.46]) by imf10.hostedemail.com (Postfix) with ESMTP id 239B8C0005 for ; Sat, 23 Nov 2024 06:47:30 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=2XSr+Wz7; spf=pass (imf10.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732344451; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ymgflkDUVyug1wPOQFiRQ2YqU/Q7TymF7PGNQguc3+A=; b=Zrx6e/ungDNQVGHo3jb58kEwZD4g7LEq2qCZhHeMpvn6VKFdZRxxvkEkRJlSgeOhXj6bnw HWbZa67y2PPZUgp5urWFtIvbHv6a6WmXds770PUo91O8rn2jg8yWNWJAvnPphVkvWITCO3 n2lTSMP5R7VSfytRInKNwIaFiyAZfTw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732344451; a=rsa-sha256; cv=none; b=FilWK3fVpH4eI7qRwVTudyCZw5sdhobjANI8kxlirBt09mlSQpGYR/gJyIy1Wvg82hd/eA Z+TryP99kMhooHbnh2xgSeO22NlMF+YU8KWXVPOMPCHaiUAbEC63Pde23/b0dZpF1KZ7nN GztuL+4sMAGzK9yI8ciGeH/2lbAzxjQ= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=2XSr+Wz7; spf=pass (imf10.hostedemail.com: domain of yosryahmed@google.com designates 209.85.219.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-qv1-f46.google.com with SMTP id 6a1803df08f44-6d41eeca2d6so20095916d6.1 for ; Fri, 22 Nov 2024 22:47:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732344450; x=1732949250; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ymgflkDUVyug1wPOQFiRQ2YqU/Q7TymF7PGNQguc3+A=; b=2XSr+Wz7V5WmO+Q5IJBP8XKud3lvsi7YrqchVr2bk6BuDnwnn/2YpcWFVf+P23JOux cj/Qbsc79pnP4USMVtZLRA/wwUUvljwp93Mb6+RPm+FaDuy1IMiQX9LDI9QHBKy1sO/Y BXeznO8STF4aMgETZTAYbqkaFD7DUO0OIRN9YLMMWfd3Bq5YQGoaSxo0ADd3/UeoAr4d /JI2oPAOoWSMWlKMnRqc5DU350dgQBUumFmk/zVkVTjPpEHZDotKe3XI4Q4OgwCbdIBr SoCF2/4VRUVrLOWSfOPUSkxYYEm4gFDe9xFjiRZW+aStDPtwEA6YnzrKA5lkyEm7f2ur hygA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732344450; x=1732949250; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ymgflkDUVyug1wPOQFiRQ2YqU/Q7TymF7PGNQguc3+A=; b=JLa/OM4UxtjyYTWlYDWRN7jJ1BlgJiQPiW1/3PYvk9p1WXN9wo7vRjgog34X2pCmIu Pm3OFx4ZUepr0abpHbF/Usv9MnPqYOiasOwhgWctRUejrUx/BB5Fqgo1snjD91qY+BNS W5zauFy/X2xxdmRs3VmOElDslFmlLmwrQG+VCh20m5VZ19Nx5+AVR9GO4Brh06W6Qvof M7djwuT12iCwaOX+Zb8xsAHGCmz+KL+PnCK+90AYwttrTJ3/2miUezc23Yb0PrCVsPgD FVTo2zAJiVtOYnbluiGipuPARSik3vjFactEuBmXdfje0xZcpF6pukAGxsx4ZlRc+0n6 YguA== X-Forwarded-Encrypted: i=1; AJvYcCUsprFPi7OeZ5KsPThxL3BQC6uwoCj+4R5LdsAZ/we8gbcN+/bLDI92jEzh25HIiZwZR0eGHOYaYw==@kvack.org X-Gm-Message-State: AOJu0YxYQuBqGEg9tO7cMJGBJQPLgui3gann4375DA1xZ0wHtTgePydg zJUUVs1eYXWpsaCSS006NoGOqd3msAlQmJpOKn6whBYMt6olD/G/XXFijtvk3tzLh3daXAT6yqE ZuY/w+gH7378ONEK0wovn7oWV23XPtaCVN5Xk X-Gm-Gg: ASbGncv8yTEy1gbL9b32ENqs6FOJqsJdQK917unjOPUJlNPJ2R0Xg5Ejsyea7t6dLPu 76zNAXvwvcUJ6qOy2be+0Idf/XyEe X-Google-Smtp-Source: AGHT+IEkDB9TYGxxMJwZU3/JVtCFBD87cfOIXWCVX3m/odh7EtLSO0xYG3/4XTBX7kGJq+bQhEf7xk72q9otkdJWVxs= X-Received: by 2002:a05:6214:410:b0:6d4:1bad:7405 with SMTP id 6a1803df08f44-6d450e6a990mr87224516d6.3.1732344450249; Fri, 22 Nov 2024 22:47:30 -0800 (PST) MIME-Version: 1.0 References: <20241123060939.169978-1-shakeel.butt@linux.dev> In-Reply-To: <20241123060939.169978-1-shakeel.butt@linux.dev> From: Yosry Ahmed Date: Fri, 22 Nov 2024 22:46:53 -0800 Message-ID: Subject: Re: [PATCH] mm: mmap_lock: optimize mmap_lock tracepoints To: Shakeel Butt Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Vlastimil Babka , Axel Rasmussen , Steven Rostedt , Suren Baghdasaryan , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: uj11qmy87nh9j1s14etmj6iz5kwhdqpa X-Rspam-User: X-Rspamd-Queue-Id: 239B8C0005 X-Rspamd-Server: rspam02 X-HE-Tag: 1732344450-732005 X-HE-Meta: U2FsdGVkX19tuMWjBqdaCm/WxAWPCnC+3UXYYvPy38w+hYqO+NjpwOFQq9fqbXGGlINu0AuMb5DAZ2rh1NLFjUAznWmtd9Pwsh1j23kIEFCLZBBO4znDVwhp5RHOz2RG28jgtixUrsJT6lPYu2PfdFuD8uILlBCSE1sVhZ8rb74gfjXBNSHFMEKy7L/mwqf7Eq9Ox3ljHpVam1Q6paQnZsY6lKny0cyNDLz3T/hghte+fWwPDwIRpwXGyPcCXmhwKq5EsrK7kjadtxn2rOGE/mYV09qDevFXFcGLHbnp2J+1jaHGNmDBHyxRweVnanZXDlM9TfG0mPJwKXotUbW2Gl6f7VC2/0KB5jr6+ta+hMnErF+Ld6VlEB3ATJ+MOnz7ypilDD8SfNnjTYBuhk5efe+lrPLGRBGOWZs8a7HksR3x4n5BlSO6pQ0ENEiaonGM8EfCB5H6Y40nU+cE9MX1ka/DfHwKwX+hlW5wwKS9872l6/wkJu+B250+QVQR8CQofnAPa4LWz/FJwFGOtRTYv1yB712dFyL4BY9glL7h/5jzNkepp79ldyEXZdHQW9GCmKPjJ7ft/2040fR4TxakNDXIaKk+hbup2Yb3sriAe9ahv9P250FWrbTASTlhw/f7V8hbaAPyvaRR30bQXFjOgKrsl9un4Wq23Ow0bpG+mNGNcn157lDSQJY5oSBBNt9ANpu4+GAs0elbiuMCy+SPAVMLoK6I5ClojFcCFfvLZ5FrN+B0q4G9mzz/n/eY/k6d/D8m3Z/3mth7VmcATBPo9M6tvns/O2dv9QKguruqbZPy4fQYDvuWQBdLs8Hg4IP88Kwy6ecyqKvw+WlAZOnbdsmHDILvVwqejZoy4AJ/IKFl0cq74kCsyzpnfmxDB3FckCAaLIaMvCN00X328QBIlHAzh75xj6u/IISajVOHKwQtlyG/SZzQP47wBW9wZghVsuxqXz+pk8kUzU+P/1I RU6EouNN DeWAS0BxTIBPkJfOl9YjaRXHNjJgi5PlcFJLOVrisFoPbW7jZQpWO2l7toX2hvWN6ng5C2yWVuO5a1dfLvZPKuUo9TdxkLdfhk+moNpNqRlIo2FJpNPFlaHloIQhVFJmxDelFHV65eWURThBRBIT7rOg7TxRFqQW074G28atyINqkeSyXjziQHVbabutAHvsj+OVqIk7XyMViluC9roAsuvSqQ9nEhuYSYp/sGyjN433RNSsm4Gg0ngOUyA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Nov 22, 2024 at 10:10=E2=80=AFPM Shakeel Butt wrote: > > We are starting to deploy mmap_lock tracepoint monitoring across our > fleet and the early results showed that these tracepoints are consuming > significant amount of CPUs in kernfs_path_from_node when enabled. > > It seems like the kernel is trying to resolved the cgroup path in the s/resolved/resolve > fast path of the locking code path when the tracepoints are enabled. In > addition for some application their metrics are regressing when > monitoring is enabled. > > The cgroup path resolution can be slow and should not be done in the > fast path. Most userspace tools, like bpftrace, provides functionality > to get the cgroup path from cgroup id, so let's just trace the cgroup > id and the users can use better tools to get the path in the slow path. > > Signed-off-by: Shakeel Butt > --- > include/linux/memcontrol.h | 18 ++++++++++++ > include/trace/events/mmap_lock.h | 32 ++++++++++---------- > mm/mmap_lock.c | 50 ++------------------------------ > 3 files changed, 36 insertions(+), 64 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 5502aa8e138e..d82f08cd70cd 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -1046,6 +1046,19 @@ static inline void memcg_memory_event_mm(struct mm= _struct *mm, > > void split_page_memcg(struct page *head, int old_order, int new_order); > > +static inline u64 memcg_id_from_mm(struct mm_struct *mm) The usage of memcg_id here and throughout the patch is a bit confusing because we have a member called 'id' in struct mem_cgroup, but this isn't it. This is the cgroup_id of the memcg. I admit it's hard to distinguish them during naming, but when I first saw the function I thought it was returning memcg->id. Maybe just cgroup_id_from_mm()? In cgroup v2, the cgroup id is the same regardless of the controller anyway, in cgroup v1, it's kinda natural that we return the cgroup id of the memcg. I don't feel strongly, but I prefer that we use clearer naming, and either way a comment may help clarify things. > +{ > + struct mem_cgroup *memcg; > + u64 id =3D 0; > + > + rcu_read_lock(); > + memcg =3D mem_cgroup_from_task(rcu_dereference(mm->owner)); > + if (likely(memcg)) > + id =3D cgroup_id(memcg->css.cgroup); We return 0 if the memcg is NULL here, shouldn't we return the cgroup id of the root memcg instead? This is more consistent with get_mem_cgroup_from_mm(), and makes sure we always return the id of a valid cgroup. > + rcu_read_unlock(); > + return id; > +} > + > #else /* CONFIG_MEMCG */ > > #define MEM_CGROUP_ID_SHIFT 0 > @@ -1466,6 +1479,11 @@ void count_memcg_event_mm(struct mm_struct *mm, en= um vm_event_item idx) > static inline void split_page_memcg(struct page *head, int old_order, in= t new_order) > { > } > + > +static inline u64 memcg_id_from_mm(struct mm_struct *mm) > +{ > + return 0; > +} > #endif /* CONFIG_MEMCG */ > > /* > diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap= _lock.h > index bc2e3ad787b3..5529933d19c5 100644 > --- a/include/trace/events/mmap_lock.h > +++ b/include/trace/events/mmap_lock.h > @@ -5,6 +5,7 @@ > #if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ) > #define _TRACE_MMAP_LOCK_H > > +#include > #include > #include > > @@ -12,64 +13,61 @@ struct mm_struct; > > DECLARE_EVENT_CLASS(mmap_lock, > > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write= ), > + TP_PROTO(struct mm_struct *mm, bool write), > > - TP_ARGS(mm, memcg_path, write), > + TP_ARGS(mm, write), > > TP_STRUCT__entry( > __field(struct mm_struct *, mm) > - __string(memcg_path, memcg_path) > + __field(u64, memcg_id) > __field(bool, write) > ), > > TP_fast_assign( > __entry->mm =3D mm; > - __assign_str(memcg_path); > + __entry->memcg_id =3D memcg_id_from_mm(mm); > __entry->write =3D write; > ), > > TP_printk( > - "mm=3D%p memcg_path=3D%s write=3D%s", > - __entry->mm, > - __get_str(memcg_path), > + "mm=3D%p memcg_id=3D%llu write=3D%s", > + __entry->mm, __entry->memcg_id, > __entry->write ? "true" : "false" > ) > ); > > #define DEFINE_MMAP_LOCK_EVENT(name) = \ > DEFINE_EVENT(mmap_lock, name, \ > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, \ > - bool write), \ > - TP_ARGS(mm, memcg_path, write)) > + TP_PROTO(struct mm_struct *mm, bool write), \ > + TP_ARGS(mm, write)) > > DEFINE_MMAP_LOCK_EVENT(mmap_lock_start_locking); > DEFINE_MMAP_LOCK_EVENT(mmap_lock_released); > > TRACE_EVENT(mmap_lock_acquire_returned, > > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write= , > - bool success), > + TP_PROTO(struct mm_struct *mm, bool write, bool success), > > - TP_ARGS(mm, memcg_path, write, success), > + TP_ARGS(mm, write, success), > > TP_STRUCT__entry( > __field(struct mm_struct *, mm) > - __string(memcg_path, memcg_path) > + __field(u64, memcg_id) > __field(bool, write) > __field(bool, success) > ), > > TP_fast_assign( > __entry->mm =3D mm; > - __assign_str(memcg_path); > + __entry->memcg_id =3D memcg_id_from_mm(mm); > __entry->write =3D write; > __entry->success =3D success; > ), > > TP_printk( > - "mm=3D%p memcg_path=3D%s write=3D%s success=3D%s", > + "mm=3D%p memcg_id=3D%llu write=3D%s success=3D%s", > __entry->mm, > - __get_str(memcg_path), > + __entry->memcg_id, > __entry->write ? "true" : "false", > __entry->success ? "true" : "false" > ) > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index f186d57df2c6..e7dbaf96aa17 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -17,51 +17,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking); > EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned); > EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released); > > -#ifdef CONFIG_MEMCG > - > -/* > - * Size of the buffer for memcg path names. Ignoring stack trace support= , > - * trace_events_hist.c uses MAX_FILTER_STR_VAL for this, so we also use = it. > - */ > -#define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL > - > -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ > - do { \ > - if (trace_mmap_lock_##type##_enabled()) { \ > - char buf[MEMCG_PATH_BUF_SIZE]; \ > - get_mm_memcg_path(mm, buf, sizeof(buf)); \ > - trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__); \ > - } \ > - } while (0) > - > -#else /* !CONFIG_MEMCG */ > - > -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) = \ > - trace_mmap_lock_##type(mm, "", ##__VA_ARGS__) > - > -#endif /* CONFIG_MEMCG */ > - > #ifdef CONFIG_TRACING > -#ifdef CONFIG_MEMCG > -/* > - * Write the given mm_struct's memcg path to a buffer. If the path canno= t be > - * determined, empty string is written. > - */ > -static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t bu= flen) > -{ > - struct mem_cgroup *memcg; > - > - buf[0] =3D '\0'; > - memcg =3D get_mem_cgroup_from_mm(mm); > - if (memcg =3D=3D NULL) > - return; > - if (memcg->css.cgroup) > - cgroup_path(memcg->css.cgroup, buf, buflen); > - css_put(&memcg->css); > -} > - > -#endif /* CONFIG_MEMCG */ > - > /* > * Trace calls must be in a separate file, as otherwise there's a circul= ar > * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h. > @@ -69,20 +25,20 @@ static void get_mm_memcg_path(struct mm_struct *mm, c= har *buf, size_t buflen) > > void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write= ) > { > - TRACE_MMAP_LOCK_EVENT(start_locking, mm, write); > + trace_mmap_lock_start_locking(mm, write); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking); > > void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool wr= ite, > bool success) > { > - TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success); > + trace_mmap_lock_acquire_returned(mm, write, success); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned); > > void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) > { > - TRACE_MMAP_LOCK_EVENT(released, mm, write); > + trace_mmap_lock_released(mm, write); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_released); > #endif /* CONFIG_TRACING */ > -- > 2.43.5 > >