Date: Mon, 25 Nov 2024 09:34:11 -0800
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Andrew Morton, Yosry Ahmed
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Axel Rasmussen, Steven Rostedt, Suren Baghdasaryan,
	Matthew Wilcox, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Meta kernel team
Subject: Re: [PATCH v2] mm: mmap_lock: optimize mmap_lock tracepoints
Message-ID:
References: <20241125171617.113892-1-shakeel.butt@linux.dev>
In-Reply-To: <20241125171617.113892-1-shakeel.butt@linux.dev>
Cc Yosry.

On Mon, Nov 25, 2024 at 09:16:17AM -0800, Shakeel Butt wrote:
> We are starting to deploy mmap_lock tracepoint monitoring across our
> fleet, and early results show that these tracepoints consume a
> significant amount of CPU in kernfs_path_from_node when enabled.
>
> The kernel resolves the cgroup path in the fast path of the locking
> code when the tracepoints are enabled. In addition, some applications
> see their metrics regress when monitoring is enabled.
>
> Cgroup path resolution can be slow and should not be done in the fast
> path. Most userspace tools, like bpftrace, provide functionality to
> resolve a cgroup path from a cgroup id, so let's just trace the cgroup
> id and let users resolve the path with better tools in the slow path.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>
> Changes since v1:
> - Fixed commit message (Yosry).
> - Renamed memcg_id_from_mm to cgroup_id_from_mm (Yosry).
> - Fixed handling of root memcg (Yosry).
> - Fixed mem_cgroup_disabled() handling.
> - Kept mm pointer printing based on Steven's feedback.
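For anyone consuming the new memcg_id field: on cgroup v2 the cgroup id
reported here is the inode number of the cgroup's directory in the
cgroupfs mount, so the id can be resolved back to a path in userspace
with a plain directory walk. A minimal sketch (the helper name and the
/sys/fs/cgroup mount point are assumptions, not part of this patch):

```python
import os

def cgroup_path_from_id(cgid: int, root: str = "/sys/fs/cgroup"):
    # On cgroup v2, a cgroup's 64-bit id is the inode number of its
    # directory in the cgroupfs mount, so id -> path resolution is a
    # walk over the hierarchy comparing st_ino.
    for dirpath, _dirnames, _files in os.walk(root):
        if os.stat(dirpath).st_ino == cgid:
            return os.path.relpath(dirpath, root)
    return None  # not found (the cgroup may already be gone)
```

This is the slow-path resolution the commit message refers to; tools
like bpftrace can do the equivalent without leaving the tracing tool.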
>
>  include/linux/memcontrol.h       | 22 ++++++++++++++
>  include/trace/events/mmap_lock.h | 32 ++++++++++----------
>  mm/mmap_lock.c                   | 50 ++------------------------------
>  3 files changed, 40 insertions(+), 64 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 5502aa8e138e..b28180269e75 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1046,6 +1046,23 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
>
>  void split_page_memcg(struct page *head, int old_order, int new_order);
>
> +static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
> +{
> +	struct mem_cgroup *memcg;
> +	u64 id;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +
> +	rcu_read_lock();
> +	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +	if (!memcg)
> +		memcg = root_mem_cgroup;
> +	id = cgroup_id(memcg->css.cgroup);
> +	rcu_read_unlock();
> +	return id;
> +}
> +
>  #else /* CONFIG_MEMCG */
>
>  #define MEM_CGROUP_ID_SHIFT	0
> @@ -1466,6 +1483,11 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
>  static inline void split_page_memcg(struct page *head, int old_order, int new_order)
>  {
>  }
> +
> +static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  /*
> diff --git a/include/trace/events/mmap_lock.h b/include/trace/events/mmap_lock.h
> index bc2e3ad787b3..cf9f9faf8914 100644
> --- a/include/trace/events/mmap_lock.h
> +++ b/include/trace/events/mmap_lock.h
> @@ -5,6 +5,7 @@
>  #if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
>  #define _TRACE_MMAP_LOCK_H
>
> +#include
>  #include
>  #include
>
> @@ -12,64 +13,61 @@ struct mm_struct;
>
>  DECLARE_EVENT_CLASS(mmap_lock,
>
> -	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),
> +	TP_PROTO(struct mm_struct *mm, bool write),
>
> -	TP_ARGS(mm, memcg_path, write),
> +	TP_ARGS(mm, write),
>
>  	TP_STRUCT__entry(
>  		__field(struct mm_struct *, mm)
> -		__string(memcg_path, memcg_path)
> +		__field(u64, memcg_id)
>  		__field(bool, write)
>  	),
>
>  	TP_fast_assign(
>  		__entry->mm = mm;
> -		__assign_str(memcg_path);
> +		__entry->memcg_id = cgroup_id_from_mm(mm);
>  		__entry->write = write;
>  	),
>
>  	TP_printk(
> -		"mm=%p memcg_path=%s write=%s",
> -		__entry->mm,
> -		__get_str(memcg_path),
> +		"mm=%p memcg_id=%llu write=%s",
> +		__entry->mm, __entry->memcg_id,
>  		__entry->write ? "true" : "false"
>  	)
>  );
>
>  #define DEFINE_MMAP_LOCK_EVENT(name)					\
>  	DEFINE_EVENT(mmap_lock, name,					\
> -		TP_PROTO(struct mm_struct *mm, const char *memcg_path,	\
> -			bool write),					\
> -		TP_ARGS(mm, memcg_path, write))
> +		TP_PROTO(struct mm_struct *mm, bool write),		\
> +		TP_ARGS(mm, write))
>
>  DEFINE_MMAP_LOCK_EVENT(mmap_lock_start_locking);
>  DEFINE_MMAP_LOCK_EVENT(mmap_lock_released);
>
>  TRACE_EVENT(mmap_lock_acquire_returned,
>
> -	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write,
> -		bool success),
> +	TP_PROTO(struct mm_struct *mm, bool write, bool success),
>
> -	TP_ARGS(mm, memcg_path, write, success),
> +	TP_ARGS(mm, write, success),
>
>  	TP_STRUCT__entry(
>  		__field(struct mm_struct *, mm)
> -		__string(memcg_path, memcg_path)
> +		__field(u64, memcg_id)
>  		__field(bool, write)
>  		__field(bool, success)
>  	),
>
>  	TP_fast_assign(
>  		__entry->mm = mm;
> -		__assign_str(memcg_path);
> +		__entry->memcg_id = cgroup_id_from_mm(mm);
>  		__entry->write = write;
>  		__entry->success = success;
>  	),
>
>  	TP_printk(
> -		"mm=%p memcg_path=%s write=%s success=%s",
> +		"mm=%p memcg_id=%llu write=%s success=%s",
>  		__entry->mm,
> -		__get_str(memcg_path),
> +		__entry->memcg_id,
>  		__entry->write ? "true" : "false",
>  		__entry->success ? "true" : "false"
>  	)
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index f186d57df2c6..e7dbaf96aa17 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -17,51 +17,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking);
>  EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned);
>  EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released);
>
> -#ifdef CONFIG_MEMCG
> -
> -/*
> - * Size of the buffer for memcg path names. Ignoring stack trace support,
> - * trace_events_hist.c uses MAX_FILTER_STR_VAL for this, so we also use it.
> - */
> -#define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL
> -
> -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)				\
> -	do {								\
> -		if (trace_mmap_lock_##type##_enabled()) {		\
> -			char buf[MEMCG_PATH_BUF_SIZE];			\
> -			get_mm_memcg_path(mm, buf, sizeof(buf));	\
> -			trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__);	\
> -		}							\
> -	} while (0)
> -
> -#else /* !CONFIG_MEMCG */
> -
> -#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)				\
> -	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> -
> -#endif /* CONFIG_MEMCG */
> -
>  #ifdef CONFIG_TRACING
> -#ifdef CONFIG_MEMCG
> -/*
> - * Write the given mm_struct's memcg path to a buffer. If the path cannot be
> - * determined, empty string is written.
> - */
> -static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
> -{
> -	struct mem_cgroup *memcg;
> -
> -	buf[0] = '\0';
> -	memcg = get_mem_cgroup_from_mm(mm);
> -	if (memcg == NULL)
> -		return;
> -	if (memcg->css.cgroup)
> -		cgroup_path(memcg->css.cgroup, buf, buflen);
> -	css_put(&memcg->css);
> -}
> -
> -#endif /* CONFIG_MEMCG */
> -
>  /*
>   * Trace calls must be in a separate file, as otherwise there's a circular
>   * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> @@ -69,20 +25,20 @@ static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
>
>  void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
>  {
> -	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write);
> +	trace_mmap_lock_start_locking(mm, write);
>  }
>  EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
>
>  void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
>  					   bool success)
>  {
> -	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> +	trace_mmap_lock_acquire_returned(mm, write, success);
>  }
>  EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
>
>  void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
>  {
> -	TRACE_MMAP_LOCK_EVENT(released, mm, write);
> +	trace_mmap_lock_released(mm, write);
>  }
>  EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #endif /* CONFIG_TRACING */
> --
> 2.43.5
>