From: Vlastimil Babka
To: Axel Rasmussen, Steven Rostedt, Ingo Molnar, Andrew Morton, Michel Lespinasse, Daniel Jordan, Laurent Dufour, Jann Horn, Chinwen Chang
Cc: Yafang Shao, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 2/2] mmap_lock: add tracepoints around lock acquisition
Date: Tue, 20 Oct 2020 16:50:30 +0200
Message-ID: <1b9238b7-17f2-6c1e-b37e-cf65424f504b@suse.cz>
In-Reply-To: <20201009220524.485102-3-axelrasmussen@google.com>
References: <20201009220524.485102-1-axelrasmussen@google.com> <20201009220524.485102-3-axelrasmussen@google.com>

On 10/10/20 12:05 AM, Axel Rasmussen wrote:
> The goal of these tracepoints is to be able to debug lock contention
> issues. This lock is acquired on most (all?) mmap / munmap / page fault
> operations, so a multi-threaded process which does a lot of these can
> experience significant contention.
>
> We trace just before we start acquisition, when the acquisition returns
> (whether it succeeded or not), and when the lock is released (or
> downgraded). The events are broken out by lock type (read / write).
>
> The events are also broken out by memcg path. For container-based
> workloads, users often think of several processes in a memcg as a single
> logical "task", so collecting statistics at this level is useful.
>
> The end goal is to get latency information. This isn't directly included
> in the trace events. Instead, users are expected to compute the time
> between "start locking" and "acquire returned", using e.g. synthetic
> events or BPF. The benefit we get from this is simpler code.
>
> Because we use tracepoint_enabled() to decide whether or not to trace,
> this patch has effectively no overhead unless tracepoints are enabled at
> runtime. If tracepoints are enabled, there is a performance impact, but
> how much depends on exactly what e.g. the BPF program does.
>
> Signed-off-by: Axel Rasmussen

Yeah I agree with this approach that follows the page ref one.
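For anyone reading along, the tracepoint_enabled() pattern being referenced
boils down to roughly this on the header side. This is only a simplified
sketch, not the actual hunk from the patch: it uses the DECLARE_TRACEPOINT()
and tracepoint_enabled() helpers from patch 1/2, and the inline wrapper name
is made up for illustration:

#include <linux/tracepoint-defs.h>
#include <linux/types.h>

struct mm_struct;

DECLARE_TRACEPOINT(mmap_lock_start_locking);

/* Out-of-line helper, defined in mm/mmap_lock.c (quoted below). */
void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write);

static inline void mmap_lock_trace_start_locking(struct mm_struct *mm,
						 bool write)
{
	/*
	 * The fast path is just a static-key test; the out-of-line trace
	 * call is only taken when the tracepoint is actually enabled.
	 */
	if (tracepoint_enabled(mmap_lock_start_locking))
		__mmap_lock_do_trace_start_locking(mm, write);
}

I.e. with tracing compiled in but no tracepoint enabled, the only cost on
these paths is the static branch.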
...

> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> new file mode 100644
> index 000000000000..b849287bd12a
> --- /dev/null
> +++ b/mm/mmap_lock.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define CREATE_TRACE_POINTS
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/*
> + * We have to export these, as drivers use mmap_lock, and our inline functions
> + * in the header check if the tracepoint is enabled. They can't be GPL, as e.g.
> + * the nvidia driver is an existing caller of this code.

I don't think this argument works in the kernel community. I would just remove
this comment.

> + */
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_start_locking);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_acquire_returned);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_released);

You can use EXPORT_TRACEPOINT_SYMBOL() here.

> +#ifdef CONFIG_MEMCG
> +
> +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> +
> +/*
> + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> + * pointer to it. If the path cannot be determined, the buffer will contain the
> + * empty string.
> + *
> + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> + * disabled by the caller before calling us, and re-enabled only after the
> + * caller is done with the pointer.
> + */
> +static const char *get_mm_memcg_path(struct mm_struct *mm)
> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +
> +	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> +		char *buf = this_cpu_ptr(trace_memcg_path);
> +
> +		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> +		return buf;
> +	}
> +	return "";
> +}
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                  \
> +	do {                                                                   \
> +		if (trace_mmap_lock_##type##_enabled()) {                      \

Is this check really needed? We only got called from the functions inlined in
the .h file because tracepoint_enabled() was true in the first place, so this
seems redundant.
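IOW, with the redundant check dropped I'd expect something like this to be
enough. Untested sketch only, otherwise relying on get_mm_memcg_path() and
the trace events exactly as defined in this patch:

#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                   \
	do {                                                                   \
		/* get_cpu() disables preemption, keeping the percpu */        \
		/* buffer from get_mm_memcg_path() valid for the call. */      \
		get_cpu();                                                     \
		trace_mmap_lock_##type(mm, get_mm_memcg_path(mm),              \
				       ##__VA_ARGS__);                         \
		put_cpu();                                                     \
	} while (0)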
> +			get_cpu();                                             \
> +			trace_mmap_lock_##type(mm, get_mm_memcg_path(mm),      \
> +					       ##__VA_ARGS__);                 \
> +			put_cpu();                                             \
> +		}                                                              \
> +	} while (0)
> +
> +#else /* !CONFIG_MEMCG */
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                  \
> +	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> +
> +#endif /* CONFIG_MEMCG */
> +
> +/*
> + * Trace calls must be in a separate file, as otherwise there's a circular
> + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> + */
> +
> +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write, true);

Seems wasteful to have an always-true success field here. Yeah, not reusing the
same event class for all three tracepoints means more code, but for tracing
efficiency it's worth it, IMHO.

> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> +
> +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> +					   bool success)
> +{
> +	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> +
> +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(released, mm, write, true);

Ditto.

> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
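Re the event class split above: to illustrate what I mean, a rough sketch of
the trace header. The field set (mm pointer, memcg path, write flag) and the
printk format are my assumptions, not taken from the trace/events/mmap_lock.h
hunk in this patch; the point is just that only acquire_returned carries a
success field:

#undef TRACE_SYSTEM
#define TRACE_SYSTEM mmap_lock

#if !defined(_TRACE_MMAP_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_MMAP_LOCK_H

#include <linux/tracepoint.h>
#include <linux/types.h>

struct mm_struct;

/* Shared class for the two events that have no success field. */
DECLARE_EVENT_CLASS(mmap_lock_class,

	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),

	TP_ARGS(mm, memcg_path, write),

	TP_STRUCT__entry(
		__field(struct mm_struct *, mm)
		__string(memcg_path, memcg_path)
		__field(bool, write)
	),

	TP_fast_assign(
		__entry->mm = mm;
		__assign_str(memcg_path, memcg_path);
		__entry->write = write;
	),

	TP_printk("mm=%p memcg_path=%s write=%s",
		  __entry->mm, __get_str(memcg_path),
		  __entry->write ? "true" : "false")
);

DEFINE_EVENT(mmap_lock_class, mmap_lock_start_locking,
	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),
	TP_ARGS(mm, memcg_path, write)
);

DEFINE_EVENT(mmap_lock_class, mmap_lock_released,
	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write),
	TP_ARGS(mm, memcg_path, write)
);

/* Only acquire_returned records whether the acquisition succeeded. */
TRACE_EVENT(mmap_lock_acquire_returned,

	TP_PROTO(struct mm_struct *mm, const char *memcg_path, bool write,
		 bool success),

	TP_ARGS(mm, memcg_path, write, success),

	TP_STRUCT__entry(
		__field(struct mm_struct *, mm)
		__string(memcg_path, memcg_path)
		__field(bool, write)
		__field(bool, success)
	),

	TP_fast_assign(
		__entry->mm = mm;
		__assign_str(memcg_path, memcg_path);
		__entry->write = write;
		__entry->success = success;
	),

	TP_printk("mm=%p memcg_path=%s write=%s success=%s",
		  __entry->mm, __get_str(memcg_path),
		  __entry->write ? "true" : "false",
		  __entry->success ? "true" : "false")
);

#endif /* _TRACE_MMAP_LOCK_H */

/* This part must be outside protection */
#include <trace/define_trace.h>

The extra DEFINE_EVENT/TRACE_EVENT boilerplate is the "more code" I mentioned,
but the start_locking and released events then don't carry a field that is
always true.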