From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B3B3D59D68 for ; Mon, 25 Nov 2024 19:52:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DAE1F6B008C; Mon, 25 Nov 2024 14:52:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D5DD16B0092; Mon, 25 Nov 2024 14:52:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C4D1E6B0093; Mon, 25 Nov 2024 14:52:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id A7A426B008C for ; Mon, 25 Nov 2024 14:52:21 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 4C9DF80CEB for ; Mon, 25 Nov 2024 19:52:21 +0000 (UTC) X-FDA: 82825664040.28.9182C02 Received: from mail-yw1-f170.google.com (mail-yw1-f170.google.com [209.85.128.170]) by imf12.hostedemail.com (Postfix) with ESMTP id E09304000E for ; Mon, 25 Nov 2024 19:52:17 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=WDzBdrH3; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.128.170 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1732564335; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Mne9TnxCPP2fm/4106chmuJnI2d9LCsaAMO9TCov5uc=; b=kCqVcHLDco29CR2K/2VMYuRwYgqm8ja2HONI0xqEkRYi4sIOQOBsa0dOHlTvlBzprW5xSl lYms2Z7nlJOSHcGlbriPQX4K5I2XbkgSkqox8+SlXD9HqXvuOigiDwCh7WAENftg713D3I yMBtaMmsl5WlyDZrKWk91YV/Ic2GDDI= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=WDzBdrH3; spf=pass (imf12.hostedemail.com: domain of yosryahmed@google.com designates 209.85.128.170 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1732564335; a=rsa-sha256; cv=none; b=hCc71g7UGizHaC0N7oqpr7R0a0h04ZcXz3RiMPfNEQD3p1FQLhFBzyrUuh8Ie4sn55zsPw e4rHDvowKhxAJqAMyqVJOJAyC5aCMah3ABc5PHzvN7JrjOGZSqzt6Ws1ANyGs/rIHmd6Ml SOnhbJOmq12WQh9df8+OkXIS7SnK7L0= Received: by mail-yw1-f170.google.com with SMTP id 00721157ae682-6ea053b5929so38768847b3.0 for ; Mon, 25 Nov 2024 11:52:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1732564338; x=1733169138; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Mne9TnxCPP2fm/4106chmuJnI2d9LCsaAMO9TCov5uc=; b=WDzBdrH3gnZs7e6J2Zh5VwN3QH44RPqmuJytfUvgcOOptxLVVcpZFOQAzEgyB4MRDD PEQ/RNqm662/8YkFDUvZVrWNSk2X564W0/C9tUD76V+4GEm57mC6Zsf0IpuFePqzf18s hGlzNyUKv5vgvnmesj8gqnFIFYUQ515llq7ktJcg8UG4F2cwEUfzeet8uLKDa9K8+UDy y1HjwjrbzOOD2nxTdv03xUfin1l6vtK6tXWWW2uXaeg+ByyuxqldobQnePHgp1q/yG11 7Zw3qJFUi/tTEHSjgg3PJiUWvOh39+K7j+GJ7NeBE3ugGNq5DfOWfvbh7/9Oq6OOTxsi Jhbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732564338; x=1733169138; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Mne9TnxCPP2fm/4106chmuJnI2d9LCsaAMO9TCov5uc=; b=hf4dDj1sQBmfdK7uxEADv08G3Ss/SQzPcP4oMy2dMM0XHG6ut32mIDit74sngBuuPq N5CCIzcT+4RdeUa3+WRN1fuWj+FaAWWwCRhhPc5K6xgcBG6lzk6ztM2CiWIPhK+/dxAX Oqp+fDhD7VTIeiOBofx+eL6c1lsiuQ1SA07yve2XB/XxTyW/gV5YSx4BI5cJ6sCgPIII QzdXePctzhtV230dGvzHa1w0cT8WkNj72725LyfcguIU3JuIO57Z5uvEVMfosFpBABA8 CUzPKBYkemsQP0PIBMfOcTr9zgqOi7gdJIAR2F3LX/H634o6dyj5OhvCFln2XITNv/FR RPQg== X-Forwarded-Encrypted: i=1; AJvYcCVU4sTX0GuDT+6lvetM9eT+5aDJqPoZRrizGjMvCiGdoRy1vGr2ezfhmdt4kGdENV+DZp87FoeNvQ==@kvack.org X-Gm-Message-State: AOJu0YzMcYkjIIGA6Tckti5bxuiNyzGSUC96C4Ww5NfaIErhVU0KUqt1 ww7oUPCLOmbCwV+WzpQvQOvy5VExV/vqdzUAYoUy2XfSxBLd4P4njSithznI4Lssj5RI6imjKSG b6BK5Gusxpbe8Nu/uhOUy3FarNF3OVKZQ8EYy X-Gm-Gg: ASbGnct1EDgUOXsxKh9zQeLatuw1180Gzcb90xOE/QnEzFdSQ6+TS8yn24FwhU1gAdt e2m58FSUAfwIkYmtO5l81+H4kQjWHVCLiSjZZR7kkSNirfLlRqOvpkgR82CwlNQ== X-Google-Smtp-Source: AGHT+IHmCrrVRu68GQphwJ7BbRlO6PPSA+ohYN9FNXBSxIiNptZk+ncdZDsbIU3cNZCqTtKbpWvr76w7HALzW8miY5o= X-Received: by 2002:a05:690c:7085:b0:6ea:7d88:ae3f with SMTP id 00721157ae682-6eee08c3ed1mr137907717b3.17.1732564338195; Mon, 25 Nov 2024 11:52:18 -0800 (PST) MIME-Version: 1.0 References: <20241125171617.113892-1-shakeel.butt@linux.dev> In-Reply-To: <20241125171617.113892-1-shakeel.butt@linux.dev> From: Yosry Ahmed Date: Mon, 25 Nov 2024 11:51:41 -0800 Message-ID: Subject: Re: [PATCH v2] mm: mmap_lock: optimize mmap_lock tracepoints To: Shakeel Butt Cc: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Vlastimil Babka , Axel Rasmussen , Steven Rostedt , Suren Baghdasaryan , Matthew Wilcox , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: E09304000E X-Stat-Signature: ec8qjghjhqxf5fnpcq11tzhuottrb173 X-Rspam-User: X-HE-Tag: 1732564337-652327 X-HE-Meta: U2FsdGVkX182G+9nJo9aZ8JUh7MIIeWpMCCafBLNMlI1YnW76S7LeKPTlfH5LciOYipzmc+RS23pvGZTPOYPydp8nxRARs9Jpq/SM9rTiUJ8Pzp0B8wAURRRmjH1vX3nHkBMKfbSNUn4ksiEuoQMMQEJSLE25EH1sOvxu0NHA+nQo4L95OvYD5LyxQBg8ucNhkzt10cMEMXA/hAQjxF8kSHETlAUZ5OBvJzEOS/E1MIUDFnSZ9cqjKmBlTZeRdjxdp+layMaOenkWqYIC4J8YHTPqh8qa/4lEtpjmsnHsYiC8/oPKEJy6i31pnrtyFq3XDa4uAInOlKfs9DL4XlR2MexoGAU2LQoHR+Rm2rY3QkMB5IpCRLhPhun+59lO3Nwia598KpBSWcpJVWU88W5Mc5LcmPHcQrVXNVXvzhOT4rDRAwtK2Yq7Z1pM4escDSrs8Qdx8JtwUN6fbQCPNKAaDkkDH+fXidBgQgyP6VpFtFelkaN+hsIGh5IF/hMNTD6fN0DUtGaQKh1uMuJKerbWa5s4wdG3HmxN00cb8sD8rbeZb+ZCVBCIvVmCW51BxCa45DaljEEgP5ACFD6zOXg+sAkMqo0J3BRPrDBBX10FI/8HpyNM1MyWnSuB3x4+8IthViAj7dQT6GdU19ixoob9PziRSnVpGIQn1M32SXS6DSCjbLtfjgK1p8XfoNh1ZZf3ADqjsk03T9tIxJ79yb9d/iDe5/44m+dISoxssA+Uw6NVtXAIJ7kUzQI3OmDwYm67EF4/HW29uziYZTQC5IT4GGh3LmphAAvbtAlamHoKBACKVaixUsKimiUXmA8bB7M7wRnG8PCC4Db2UosFnkxyR2ObbtFzsiDjYJ2IyM+mlOYm9TWS+w7pmBPud+kgDbJfIrMM26pXBSMhKUpv1OAuUkx41KZ8+GN4sYg0eSt6YdE6K4neng88SKarGWmYaGAqDfjdU//WJ/65lgQnAd cGxJRWTP 6Coza03FU0vuPGgL80y7dpxHiAZTXq+co74gg48kcDTJqG74H1BM1F9J2GjQZ9nZ8T8rawr06bBitARziAcHbCv2C0VH8BIL8TsXhQiawcSTpm10oxwuHEsqQEAnAXjbu+A5wz50cHJMALOAvoKQjlRDIoB+XpTNwYizoP5EvEZJQ+MrpzrNnRTfJjhr3hk6GX9eiaKFZ0/Gv/RzSW2gbpETK5bWXBpxcA5eO7miG/VkntZgmagapmYWYM3n20DCtDwjtn+sJXlMt7Q0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Nov 25, 2024 at 9:16=E2=80=AFAM Shakeel Butt wrote: > > We are starting to deploy mmap_lock tracepoint monitoring across our > fleet and the early results showed that these tracepoints are consuming > significant amount of CPUs in kernfs_path_from_node when enabled. > > It seems like the kernel is trying to resolve the cgroup path in the > fast path of the locking code path when the tracepoints are enabled. In > addition for some application their metrics are regressing when > monitoring is enabled. > > The cgroup path resolution can be slow and should not be done in the > fast path. Most userspace tools, like bpftrace, provides functionality > to get the cgroup path from cgroup id, so let's just trace the cgroup > id and the users can use better tools to get the path in the slow path. > > Signed-off-by: Shakeel Butt Reviewed-by: Yosry Ahmed > --- > > Changes since v1: > - Fixed commit message (Yosry). > - Renamed memcg_id_from_mm to cgroup_id_from_mm (Yosry). > - Fixed handling of root memcg (Yosry). > - Fixed mem_cgroup_disabled() handling. > - Kept mm pointer printing based on Steven's feedback. > > include/linux/memcontrol.h | 22 ++++++++++++++ > include/trace/events/mmap_lock.h | 32 ++++++++++---------- > mm/mmap_lock.c | 50 ++------------------------------ > 3 files changed, 40 insertions(+), 64 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 5502aa8e138e..b28180269e75 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -1046,6 +1046,23 @@ static inline void memcg_memory_event_mm(struct mm= _struct *mm, > > void split_page_memcg(struct page *head, int old_order, int new_order); > > +static inline u64 cgroup_id_from_mm(struct mm_struct *mm) > +{ > + struct mem_cgroup *memcg; > + u64 id; > + > + if (mem_cgroup_disabled()) > + return 0; > + > + rcu_read_lock(); > + memcg =3D mem_cgroup_from_task(rcu_dereference(mm->owner)); > + if (!memcg) > + memcg =3D root_mem_cgroup; > + id =3D cgroup_id(memcg->css.cgroup); > + rcu_read_unlock(); > + return id; > +} > + > #else /* CONFIG_MEMCG */ > > #define MEM_CGROUP_ID_SHIFT 0 > @@ -1466,6 +1483,11 @@ void count_memcg_event_mm(struct mm_struct *mm, en= um vm_event_item idx) > static inline void split_page_memcg(struct page *head, int old_order, in= t new_order) > { > } > + > +static inline u64 cgroup_id_from_mm(struct mm_struct *mm) > +{ > + return 0; > +} > #endif /* CONFIG_MEMCG */ > > /* > diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap= _lock.h > index bc2e3ad787b3..cf9f9faf8914 100644 > --- a/include/trace/events/mmap_lock.h > +++ b/include/trace/events/mmap_lock.h > @@ -5,6 +5,7 @@ > #if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ) > #define _TRACE_MMAP_LOCK_H > > +#include > #include > #include > > @@ -12,64 +13,61 @@ struct mm_struct; > > DECLARE_EVENT_CLASS(mmap_lock, > > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write= ), > + TP_PROTO(struct mm_struct *mm, bool write), > > - TP_ARGS(mm, memcg_path, write), > + TP_ARGS(mm, write), > > TP_STRUCT__entry( > __field(struct mm_struct *, mm) > - __string(memcg_path, memcg_path) > + __field(u64, memcg_id) > __field(bool, write) > ), > > TP_fast_assign( > __entry->mm =3D mm; > - __assign_str(memcg_path); > + __entry->memcg_id =3D cgroup_id_from_mm(mm); > __entry->write =3D write; > ), > > TP_printk( > - "mm=3D%p memcg_path=3D%s write=3D%s", > - __entry->mm, > - __get_str(memcg_path), > + "mm=3D%p memcg_id=3D%llu write=3D%s", > + __entry->mm, __entry->memcg_id, > __entry->write ? "true" : "false" > ) > ); > > #define DEFINE_MMAP_LOCK_EVENT(name) = \ > DEFINE_EVENT(mmap_lock, name, \ > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, \ > - bool write), \ > - TP_ARGS(mm, memcg_path, write)) > + TP_PROTO(struct mm_struct *mm, bool write), \ > + TP_ARGS(mm, write)) > > DEFINE_MMAP_LOCK_EVENT(mmap_lock_start_locking); > DEFINE_MMAP_LOCK_EVENT(mmap_lock_released); > > TRACE_EVENT(mmap_lock_acquire_returned, > > - TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write= , > - bool success), > + TP_PROTO(struct mm_struct *mm, bool write, bool success), > > - TP_ARGS(mm, memcg_path, write, success), > + TP_ARGS(mm, write, success), > > TP_STRUCT__entry( > __field(struct mm_struct *, mm) > - __string(memcg_path, memcg_path) > + __field(u64, memcg_id) > __field(bool, write) > __field(bool, success) > ), > > TP_fast_assign( > __entry->mm =3D mm; > - __assign_str(memcg_path); > + __entry->memcg_id =3D cgroup_id_from_mm(mm); > __entry->write =3D write; > __entry->success =3D success; > ), > > TP_printk( > - "mm=3D%p memcg_path=3D%s write=3D%s success=3D%s", > + "mm=3D%p memcg_id=3D%llu write=3D%s success=3D%s", > __entry->mm, > - __get_str(memcg_path), > + __entry->memcg_id, > __entry->write ? "true" : "false", > __entry->success ? "true" : "false" > ) > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index f186d57df2c6..e7dbaf96aa17 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -17,51 +17,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking); > EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned); > EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released); > > -#ifdef CONFIG_MEMCG > - > -/* > - * Size of the buffer for memcg path names. Ignoring stack trace support= , > - * trace_events_hist.c uses MAX_FILTER_STR_VAL for this, so we also use = it. > - */ > -#define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL > - > -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \ > - do { \ > - if (trace_mmap_lock_##type##_enabled()) { \ > - char buf[MEMCG_PATH_BUF_SIZE]; \ > - get_mm_memcg_path(mm, buf, sizeof(buf)); \ > - trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__); \ > - } \ > - } while (0) > - > -#else /* !CONFIG_MEMCG */ > - > -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) = \ > - trace_mmap_lock_##type(mm, "", ##__VA_ARGS__) > - > -#endif /* CONFIG_MEMCG */ > - > #ifdef CONFIG_TRACING > -#ifdef CONFIG_MEMCG > -/* > - * Write the given mm_struct's memcg path to a buffer. If the path canno= t be > - * determined, empty string is written. > - */ > -static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t bu= flen) > -{ > - struct mem_cgroup *memcg; > - > - buf[0] =3D '\0'; > - memcg =3D get_mem_cgroup_from_mm(mm); > - if (memcg =3D=3D NULL) > - return; > - if (memcg->css.cgroup) > - cgroup_path(memcg->css.cgroup, buf, buflen); > - css_put(&memcg->css); > -} > - > -#endif /* CONFIG_MEMCG */ > - > /* > * Trace calls must be in a separate file, as otherwise there's a circul= ar > * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h. > @@ -69,20 +25,20 @@ static void get_mm_memcg_path(struct mm_struct *mm, c= har *buf, size_t buflen) > > void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write= ) > { > - TRACE_MMAP_LOCK_EVENT(start_locking, mm, write); > + trace_mmap_lock_start_locking(mm, write); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking); > > void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool wr= ite, > bool success) > { > - TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success); > + trace_mmap_lock_acquire_returned(mm, write, success); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned); > > void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) > { > - TRACE_MMAP_LOCK_EVENT(released, mm, write); > + trace_mmap_lock_released(mm, write); > } > EXPORT_SYMBOL(__mmap_lock_do_trace_released); > #endif /* CONFIG_TRACING */ > -- > 2.43.5 > >